As an update, it looks like the heavy load is in part because the node
never "catches back up" with the other nodes. In the SolrCloud UI it was
yellow for a long time, then eventually grey, then back to yellow and
orange. It never recovers to green.

I should note that this collection is very busy, indexing 5k+ small
documents per second, but the nodes were all fine until I had to restart
them and they had to re-sync.

Here is the log since reboot: https://gist.github.com/396af4b217ce8f536db6
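
In case it helps narrow things down, here's the rough script I've been
using to see which replicas are stuck outside the "active" state. It's
just a sketch: it assumes the 4.x clusterstate.json layout, dumped
locally first with something like

  cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd get /clusterstate.json > clusterstate.json

(zk1:2181 is a placeholder for your ZooKeeper host):

#!/usr/bin/env python
# Print every replica that isn't "active", to see which cores are stuck
# in "recovering"/"down". Reads a local clusterstate.json dump.
import json

with open('clusterstate.json') as f:
    state = json.load(f)

for collection, coll in state.items():
    for shard_name, shard in coll.get('shards', {}).items():
        for core_node, replica in shard.get('replicas', {}).items():
            if replica.get('state') != 'active':
                print('%s %s %s state=%s' % (collection, shard_name,
                      replica.get('node_name'), replica.get('state')))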

Any ideas?


On Sat, Feb 2, 2013 at 10:27 AM, Brett Hoerner <br...@bretthoerner.com> wrote:

> Hi,
>
> I have a 5-server cluster running 1 collection with 20 shards and a
> replication factor of 2 (so 40 cores total, 8 per node).
>
> Earlier this week I had to do a rolling restart across the cluster;
> this worked great and the cluster stayed up the whole time. The problem
> is that the last node I restarted is now the leader of 0 shards, and is
> just holding replicas.
>
> I've noticed that this node has an abnormally high load average, while
> the other nodes (which have the same number of shards, but more leaders
> on average) are fine.
>
> First, I'm wondering if that load could be related to this node being a
> 5x replica and 0x leader?
>
> Second, I was wondering if I could somehow flag individual shards to
> re-elect a leader (or force a particular leader) so that I could more
> evenly distribute how many leader shards each physical server is
> running.
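>
> For reference, here's the rough script I'm using to count leaders per
> node from a clusterstate.json dump (just a sketch: it assumes the 4.x
> state layout, where the leader replica is marked with "leader":"true",
> and that the dump is saved locally):
>
> #!/usr/bin/env python
> # Count leader cores vs. total cores per node from clusterstate.json.
> import json
> from collections import defaultdict
>
> with open('clusterstate.json') as f:
>     state = json.load(f)
>
> leaders = defaultdict(int)
> cores = defaultdict(int)
>
> for collection, coll in state.items():
>     for shard_name, shard in coll.get('shards', {}).items():
>         for core_node, replica in shard.get('replicas', {}).items():
>             node = replica.get('node_name', 'unknown')
>             cores[node] += 1
>             if replica.get('leader') == 'true':
>                 leaders[node] += 1
>
> for node in sorted(cores):
>     print('%-30s leaders=%d cores=%d' % (node, leaders[node], cores[node]))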
>
> Thanks.
>
