On 01/07/2015 05:42 PM, Erick Erickson wrote:
True, and you can do this if you take explicit control of the document
routing, but...
that's quite tricky. You forever after have to send any _updates_ to the same
shard you did the first time, whereas SPLITSHARD will "do the right thing".

Hmm. That is a good point. I wonder if there's some kind of middle ground here? Something that lets me send an update (or new document) to an arbitrary node/shard but which is still routed according to my specific requirements? Maybe this can already be achieved by messing with the routing?
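For the record, a toy sketch of the kind of prefix routing I mean. This is not Solr's actual hash (the compositeId router uses a MurmurHash over the collection's hash range); md5 here is just a stand-in to show the idea that the shard is a pure function of the route key, so any update carrying the same key lands on the same shard without the client having to remember shard assignments:

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    # Text before "!" is the route key (compositeId-style, e.g. "tenant!doc").
    route_key = doc_id.split("!", 1)[0]
    # md5 stands in for Solr's real hash; only determinism matters here.
    digest = hashlib.md5(route_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Updates to the same document always route identically:
assert shard_for("customer42!doc1", 4) == shard_for("customer42!doc1", 4)
# All documents sharing a prefix co-locate on the same shard:
assert shard_for("customer42!doc1", 4) == shard_for("customer42!doc2", 4)
```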

<snip> there are some components that don't do the right thing in
distributed mode, joins for instance. The list is actually quite small and
is getting smaller all the time.


That's fine. We have a lot of query (pre-)processing outside of Solr. It's no problem for us to send a couple of queries to a couple of shards and aggregate the result ourselves. It would, of course, be nice if everything worked in distributed mode, but at least for us it's not an issue. This is a side effect of our complex reporting requirements -- we do aggregation, filtering and other magic on data that is partially in Solr and partially elsewhere.
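To make the "aggregate the results ourselves" part concrete, here is a minimal sketch of merging per-shard facet counts client-side. The shard responses are mocked as dicts in the shape of Solr's facet_counts section (a flat [term, count, term, count, ...] list); the field name "category" is made up:

```python
from collections import Counter

def merge_facets(shard_responses):
    # Sum facet counts for one field across independent shard responses.
    totals = Counter()
    for resp in shard_responses:
        flat = resp["facet_counts"]["facet_fields"]["category"]
        # Solr flattens facets as [term, count, term, count, ...]
        for term, count in zip(flat[::2], flat[1::2]):
            totals[term] += count
    return dict(totals)

shard1 = {"facet_counts": {"facet_fields": {"category": ["books", 10, "music", 3]}}}
shard2 = {"facet_counts": {"facet_fields": {"category": ["books", 5, "video", 7]}}}
print(merge_facets([shard1, shard2]))  # {'books': 15, 'music': 3, 'video': 7}
```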

Not true if the other shards have had any indexing activity. The commit is
usually forwarded to all shards. If the individual index on a
particular shard is
unchanged then it should be a no-op though.

I think a no-op commit no longer clears the caches either, so that's great.

But the usage pattern here is its own bit of a trap. If all your
indexing is going
to a single shard, then also the entire indexing _load_ is happening on that
shard. So the CPU utilization will be higher on that shard than the older ones.
Since distributed requests need to get a response from every shard before
returning to the client, the response time will be bounded by the response from
the slowest shard and this may actually be slower. Probably only noticeable
when the CPU is maxed anyway though.
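To put that latency point in (made-up) numbers: a fan-out request can only return once the slowest shard answers, so it is max(), not mean(), of the per-shard times, and one busy shard dominates:

```python
# Hypothetical per-shard response times; the last shard is the one
# taking all the indexing load.
shard_latencies_ms = [40, 45, 42, 180]

# The distributed request waits for every shard, so:
fan_out_latency = max(shard_latencies_ms)           # 180 ms
average = sum(shard_latencies_ms) / len(shard_latencies_ms)  # ~76.8 ms

print(fan_out_latency)  # the busy shard sets the overall response time
print(average)          # misleadingly optimistic by comparison
```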

This is a very good point. But I don't think SPLITSHARD is the magical answer here. If you have N shards on N boxes, and they are all getting nearly "full" and you decide to split one and move half to a new box, you'll end up with N-1 nearly full boxes and 2 half-full boxes. What happens when the disks fill up further? Do I have to split each shard in turn? That sounds pretty nightmarish!
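The capacity math, with made-up numbers: N boxes each ~90% full, and SPLITSHARD on one shard (moving half of it to a new box) only relieves that one shard:

```python
def after_split(fullness_pct, split_index):
    # Split one shard in two; the old box and a new box each hold half.
    half = fullness_pct[split_index] // 2
    rest = fullness_pct[:split_index] + fullness_pct[split_index + 1:]
    return rest + [half, half]

boxes = [90, 90, 90, 90]        # N = 4 boxes, all nearly full
print(after_split(boxes, 0))    # [90, 90, 90, 45, 45]
# N-1 = 3 boxes are still nearly full; relieving them all means
# running SPLITSHARD on every shard, one at a time.
```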

 - Bram
