On 01/07/2015 05:42 PM, Erick Erickson wrote:
True, and you can do this if you take explicit control of the document
routing, but...
that's quite tricky. You forever after have to send any _updates_ to the same
shard you did the first time, whereas SPLITSHARD will "do the right thing".

Hmm. That is a good point. I wonder if there's some kind of middle ground here? Something that lets me send an update (or new document) to an arbitrary node/shard but which is still routed according to my specific requirements? Maybe this can already be achieved by messing with the routing?
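For the record, a toy sketch of the kind of prefix routing I mean. This is not Solr's actual hash (the compositeId router uses a MurmurHash over the collection's hash range); md5 here is just a stand-in to show the idea that the shard is a pure function of the route key, so any update carrying the same key lands on the same shard without the client having to remember shard assignments:

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    # Text before "!" is the route key (compositeId-style, e.g. "tenant!doc").
    route_key = doc_id.split("!", 1)[0]
    # md5 stands in for Solr's real hash; only determinism matters here.
    digest = hashlib.md5(route_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Updates to the same document always route identically:
assert shard_for("customer42!doc1", 4) == shard_for("customer42!doc1", 4)
# All documents sharing a prefix co-locate on the same shard:
assert shard_for("customer42!doc1", 4) == shard_for("customer42!doc2", 4)
```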

<snip> there are some components that don't do the right thing in
distributed mode, joins for instance. The list is actually quite small and
is getting smaller all the time.


That's fine. We have a lot of query (pre-)processing outside of Solr. It's no problem for us to send a couple of queries to a couple of shards and aggregate the result ourselves. It would, of course, be nice if everything worked in distributed mode, but at least for us it's not an issue. This is a side effect of our complex reporting requirements -- we do aggregation, filtering and other magic on data that is partially in Solr and partially elsewhere.
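To make the "aggregate the results ourselves" part concrete, here is a minimal sketch of merging per-shard facet counts client-side. The shard responses are mocked as dicts in the shape of Solr's facet_counts section (a flat [term, count, term, count, ...] list); the field name "category" is made up:

```python
from collections import Counter

def merge_facets(shard_responses):
    # Sum facet counts for one field across independent shard responses.
    totals = Counter()
    for resp in shard_responses:
        flat = resp["facet_counts"]["facet_fields"]["category"]
        # Solr flattens facets as [term, count, term, count, ...]
        for term, count in zip(flat[::2], flat[1::2]):
            totals[term] += count
    return dict(totals)

shard1 = {"facet_counts": {"facet_fields": {"category": ["books", 10, "music", 3]}}}
shard2 = {"facet_counts": {"facet_fields": {"category": ["books", 5, "video", 7]}}}
print(merge_facets([shard1, shard2]))  # {'books': 15, 'music': 3, 'video': 7}
```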

Not true if the other shards have had any indexing activity. The commit is
usually forwarded to all shards. If the individual index on a
particular shard is
unchanged then it should be a no-op though.

I think a no-op commit no longer clears the caches either, so that's great.

But the usage pattern here is its own bit of a trap. If all your
indexing is going
to a single shard, then also the entire indexing _load_ is happening on that
shard. So the CPU utilization will be higher on that shard than the older ones.
Since distributed requests need to get a response from every shard before
returning to the client, the response time will be bounded by the response from
the slowest shard and this may actually be slower. Probably only noticeable
when the CPU is maxed anyway though.
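To put that latency point in (made-up) numbers: a fan-out request can only return once the slowest shard answers, so it is max(), not mean(), of the per-shard times, and one busy shard dominates:

```python
# Hypothetical per-shard response times; the last shard is the one
# taking all the indexing load.
shard_latencies_ms = [40, 45, 42, 180]

# The distributed request waits for every shard, so:
fan_out_latency = max(shard_latencies_ms)           # 180 ms
average = sum(shard_latencies_ms) / len(shard_latencies_ms)  # ~76.8 ms

print(fan_out_latency)  # the busy shard sets the overall response time
print(average)          # misleadingly optimistic by comparison
```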

This is a very good point. But I don't think SPLITSHARD is the magical answer here. If you have N shards on N boxes, and they are all getting nearly "full" and you decide to split one and move half to a new box, you'll end up with N-1 nearly full boxes and 2 half-full boxes. What happens when the disks fill up further? Do I have to split each shard in turn? That sounds pretty nightmarish!
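The capacity math, with made-up numbers: N boxes each ~90% full, and SPLITSHARD on one shard (moving half of it to a new box) only relieves that one shard:

```python
def after_split(fullness_pct, split_index):
    # Split one shard in two; the old box and a new box each hold half.
    half = fullness_pct[split_index] // 2
    rest = fullness_pct[:split_index] + fullness_pct[split_index + 1:]
    return rest + [half, half]

boxes = [90, 90, 90, 90]        # N = 4 boxes, all nearly full
print(after_split(boxes, 0))    # [90, 90, 90, 45, 45]
# N-1 = 3 boxes are still nearly full; relieving them all means
# running SPLITSHARD on every shard, one at a time.
```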

 - Bram
