As an update, it looks like the heavy load is partly because the node never "catches back up" with the other nodes. In the SolrCloud UI it was yellow for a long time, then eventually grey, then back to yellow and orange. It never returns to green.
I should note this collection is very busy, indexing 5k+ small documents per second, but the nodes were all fine until I had to restart them and they had to re-sync.

Here is the log since the restart: https://gist.github.com/396af4b217ce8f536db6

Any ideas?

On Sat, Feb 2, 2013 at 10:27 AM, Brett Hoerner <br...@bretthoerner.com> wrote:

> Hi,
>
> I have a 5-server cluster running 1 collection with 20 shards and a
> replication factor of 2.
>
> Earlier this week I had to do a rolling restart across the cluster; this
> worked great and the cluster stayed up the whole time. The problem is that
> the last node I restarted is now the leader of 0 shards and is just
> holding replicas.
>
> I've noticed this node has an abnormally high load average, while the
> other nodes (which have the same number of shards, but more leaders on
> average) are fine.
>
> First, I'm wondering if that load could be related to being a 5x replica
> and 0x leader?
>
> Second, I was wondering if I could somehow flag individual shards to
> re-elect a leader (or force a leader) so that I could more evenly
> distribute how many leader shards each physical server runs.
>
> Thanks.
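For the leader-distribution question, Solr releases later than this thread (5.x onward) added Collections API support for exactly this: mark a replica as `preferredLeader`, then ask Solr to rebalance leadership. A sketch of the two calls, assuming a collection named `mycollection` and placeholder shard/replica names (not taken from this cluster):

```shell
# Base URL of any node in the cluster (placeholder host/port).
SOLR=http://localhost:8983/solr

# Step 1: mark a specific replica as the preferred leader for its shard.
# shard1 / core_node3 are hypothetical names; check /admin/collections?action=CLUSTERSTATUS for real ones.
ADDPROP_URL="$SOLR/admin/collections?action=ADDREPLICAPROP&collection=mycollection&shard=shard1&replica=core_node3&property=preferredLeader&property.value=true"

# Step 2: trigger leader re-election so preferredLeader replicas take over where possible.
REBALANCE_URL="$SOLR/admin/collections?action=REBALANCELEADERS&collection=mycollection"

# Against a live cluster you would run, e.g.:
#   curl "$ADDPROP_URL"
#   curl "$REBALANCE_URL"
echo "$ADDPROP_URL"
echo "$REBALANCE_URL"
```

REBALANCELEADERS is best-effort: a replica that is too far behind in recovery will not be promoted, so on a cluster this busy the node generally has to go green before it can take leadership back.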