First of all, unless you have a lot of shards, worrying about which one is the leader is not worth the effort. That code was put in there to deal with a situation where there were hundreds of shards and, when the system was cold-started, they could all wind up with their leaders on the same node.
The extra work a leader does is actually quite minimal, so I wouldn't start to worry unless you have a lot of leaders, on the order of 20-30, and then I'd measure to be sure. The extra work happens during indexing, when the leader has to distribute the updates to followers, FWIW.

But to your question, I have no idea. I'd say "look at the logs", but you've already done that. What happens is that the preferred leader gets inserted into the overseer_election queue watching the current leader, then the current leader is moved to the end of the election queue. This _should_ trigger the watch on the preferred leader to take over. I wouldn't necessarily expect error messages in the logs, BTW; you'd need to look at the INFO-level messages for the preferred leader, the Overseer, and the current leader, in that order.

The other place that'd be interesting to look is where the preferred leader sits in the leader election queue for that shard after it's all done. On success it actually shouldn't be in the election queue at all.

Not much help, I know. The code is in RebalanceLeaders.java along with some explanatory notes.

Best,
Erick

> On Jun 23, 2020, at 3:43 AM, Karl Stoney
> <karl.sto...@autotrader.co.uk.INVALID> wrote:
>
> Hey,
> We have a SolrCloud collection with 8 replicas, and one of those replicas has
> the `property.preferredleader: true` set. However when we perform a
> `REBALANCELEADERS` we get:
>
> ```
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 62268
>   },
>   "Summary": {
>     "Failure": "Not all active replicas with preferredLeader property are leaders"
>   },
>   "failures": {
>     "shard1": {
>       "status": "failed",
>       "msg": "Could not change leder for slice shard1 to core_node9"
>     }
>   }
> }
> ```
>
> There is nothing in the Solr logs on any of the nodes to indicate the reason
> for the failure.
>
> What I have noticed is that 4 of the nodes briefly go orange in the GUI (e.g.
> "down"), and for a moment 9 of them go into yellow (e.g. "recovering"), before
> all becoming active again with the same (incorrect) leader.
>
> We use the same model on 4 other collections to set the preferred leader to a
> particular replica, and they all work fine.
>
> Does anyone have any ideas?
>
> Thanks
> Karl
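For anyone following along, the election-queue handoff Erick describes can be sketched as a toy model. This is only an illustration, not Solr's actual code: the real logic lives in RebalanceLeaders.java and operates on ephemeral sequential znodes in ZooKeeper (each candidate watching the node ahead of it), not on a Python list, and all names below are made up for the example.

```python
# Toy model of the election-queue shuffle behind REBALANCELEADERS.
# Illustrative only -- the real implementation is RebalanceLeaders.java,
# which manipulates ZooKeeper election znodes, not an in-memory list.

class ElectionQueue:
    """Head of the queue is the current leader; every other replica
    conceptually 'watches' the entry immediately ahead of it."""

    def __init__(self, replicas):
        self.queue = list(replicas)  # index 0 == current leader

    @property
    def leader(self):
        return self.queue[0]

    def rebalance_to(self, preferred):
        """Sketch of the handoff:
        1. re-queue the preferred replica directly behind the current
           leader, so its watch fires when the leader goes away;
        2. move the current leader to the end of the queue;
        3. the preferred replica is now at the head and takes over."""
        old_leader = self.queue[0]
        self.queue.remove(preferred)
        self.queue.insert(1, preferred)   # step 1: watch the leader
        self.queue.remove(old_leader)
        self.queue.append(old_leader)     # step 2: demote old leader
        return self.leader                # step 3: new leader at head


q = ElectionQueue(["core_node3", "core_node7", "core_node9"])
q.rebalance_to("core_node9")
print(q.leader)  # core_node9
print(q.queue)   # ['core_node9', 'core_node7', 'core_node3']
```

To inspect the real queue after a rebalance attempt, the election znodes can be listed under `/collections/<collection>/leader_elect/<shard>/election` in ZooKeeper (for example with `bin/solr zk ls`); per Erick's note, on success the preferred leader should no longer appear there.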