I recall having some luck fixing a leaderless shard (after a ZK quorum failure) 
by forcibly removing the entries for the down-state replicas from the leader 
election list, and then forcing an election. 
The ZK path looks like collections/<collection>/leader_elect/shardX/election. 
Usually the down-state replica that keeps getting elected is the first entry 
in that list. Delete it, then retry the force-election Collections API command 
(FORCELEADER).
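
In case it helps, here is a rough Python sketch of the two steps: picking out 
the entry at the head of the election list, and building the force-election 
request. It assumes the election znode names end in "-n_<sequence>" (that is 
what I remember seeing, but check your own ZK tree), and the function names, 
host, collection and shard values are made up for illustration. It does not 
talk to ZooKeeper or Solr; you would still delete the znode with your usual 
ZK client and issue the URL yourself.

```python
from urllib.parse import urlencode


def election_sequence(znode_name):
    """Return the trailing sequence number of an election znode name."""
    # Assumption: names end in "-n_<digits>", e.g. "...-core_node3-n_0000000004".
    return int(znode_name.rsplit("-n_", 1)[1])


def head_of_election(znode_names):
    """Return the znode with the lowest sequence number (first in line), or None."""
    return min(znode_names, key=election_sequence, default=None)


def force_leader_url(solr_base, collection, shard):
    """Build a Collections API FORCELEADER request URL for one shard."""
    params = urlencode(
        {"action": "FORCELEADER", "collection": collection, "shard": shard}
    )
    return f"{solr_base}/admin/collections?{params}"
```

Given the children of the election path, head_of_election() tells you which 
entry is first in line. If that entry belongs to a down-state replica, it is 
the one to delete before requesting the URL from force_leader_url().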



On 4/5/16, 3:15 AM, "Tom Evans" <tevans...@googlemail.com> wrote:

>Hi all, I have an 8 node SolrCloud 5.5 cluster with 11 collections,
>most of them in a 1 shard x 8 replicas configuration. We have 5 ZK
>nodes.
>
>During the night, we attempted to reindex one of the larger
>collections. We reindex by pushing json docs to the update handler
>from a number of processes. It seemed this overwhelmed the servers,
>and caused all of the collections to fail and end up in either a down
>or a recovering state, often with no leader.
>
>Restarting and rebooting the servers brought a lot of the collections
>back online, but we are left with a few collections for which all the
>nodes hosting those replicas are up, but the replica reports as either
>"active" or "down", and with no leader.
>
>Trying to force a leader election has no effect; it keeps choosing a
>leader that is in "down" state. Removing all the nodes that are in
>"down" state and forcing a leader election also has no effect.
>
>
>Any ideas? The only viable option I see is to create a new collection,
>index it and then remove the old collection and alias it in.
>
>Cheers
>
>Tom
