Daniel Collins wrote
> Is it important where your leader is?  If you just want to minimize
> leadership changes during rolling re-start, then you could restart in the
> opposite order (S3, S2, S1).  That would give only 1 transition, but the
> end result would be a leader on S2 instead of S1 (not sure if that
> important to you or not).  I know it's not a "fix", but it might be a
> workaround until the whole leadership moving is done?

I think that rolling-restarting the machines in the opposite order
(S3, S2, S1) will result in S3 being the leader. It's a valid approach, but
wouldn't I then have to revert to the original order (S1, S2, S3) in the
following rolling restart to achieve the same result? That adds operational
cost and complexity that I want to avoid.


Erick Erickson wrote
>> Just skimming, but the problem here that I ran into was with the
>> listeners. Each _Solr_ instance out there is listening to one of the
>> ephemeral nodes (the "one in front"). So deleting a node does _not_
>> change which ephemeral node the associated Solr instance is listening
>> to.
>>
>> So, for instance, when you delete S2..n-000001 and re-add it, S2 is
>> still looking at S1....n-000000 and will continue looking at
>> S1...n-000000 until S1....n-000000 is deleted.
>>
>> Deleting S2..n-000001 will wake up S3 though, which should now be
>> looking at S1....n-0000000. Now you have two Solr listeners looking at
>> the same ephemeral node. The key is that deleting S2...n-000001 does
>> _not_ wake up S2, just any solr instance that has a watch on the
>> associated ephemeral node.

Thanks for the info, Erick. I wasn't aware of this "linked-list" listener
structure between the zk nodes. Based on what you've said, though, I've
changed my implementation a bit and it seems to be working at first glance.
Of course it's not proven reliable yet, but it looks promising.
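
If I've understood the pattern correctly, each participant ends up doing
something roughly like the following. This is just a sketch using the plain
ZooKeeper Java API; the election path and helper names are mine for
illustration, not the actual SolrCloud election code:

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ElectionSketch {
    private final ZooKeeper zk;
    // placeholder path, not the real SolrCloud layout
    private final String electionPath = "/collection1/leader_elect/shard1/election";
    private String myNode; // e.g. ".../n_0000000002"

    ElectionSketch(ZooKeeper zk) { this.zk = zk; }

    void join() throws Exception {
        // each participant registers one ephemeral sequential node
        myNode = zk.create(electionPath + "/n_", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        check();
    }

    void check() throws Exception {
        List<String> children = zk.getChildren(electionPath, false);
        Collections.sort(children);
        String mine = myNode.substring(myNode.lastIndexOf('/') + 1);
        int idx = children.indexOf(mine);
        if (idx == 0) {
            becomeLeader();                  // lowest sequence number wins
            return;
        }
        // watch ONLY the node immediately in front of me -- this is the
        // "linked list": deleting my predecessor wakes me, nobody else
        String predecessor = electionPath + "/" + children.get(idx - 1);
        if (zk.exists(predecessor, event -> {
                if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                    try { check(); } catch (Exception e) { /* handle */ }
                }
            }) == null) {
            check(); // predecessor vanished between getChildren and exists
        }
    }

    void becomeLeader() { /* publish leader state */ }
}

So deleting S2's node only wakes whoever is watching it (S3), which then
starts watching S1 directly - exactly the behaviour you described.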

My original attempt
> S1:-n_0000000000 (no code running here)
> S2:-n_0000000004 (code deleting zknode -n_0000000001 and creating -n_0000000004)
> S3:-n_0000000003 (code deleting zknode -n_0000000002 and creating -n_0000000003)

has been changed to 
S1:-n_0000000000 (no code running here)
S2:-n_0000000003 (code deleting zknode -n_0000000001 and creating -n_0000000003 using EPHEMERAL_SEQUENTIAL)
S3:-n_0000000002 (no code running here)

Once S1 is shut down, S3 becomes the leader, since it now watches S1,
according to what you've said.
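
For reference, the code running against S2's node does roughly this. Again
just a sketch with the plain ZooKeeper Java client, assumed to run from a
session that can own the new ephemeral node; the election path and node
names are placeholders:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class RequeueSketch {
    // electionPath: placeholder election dir, s2Node: e.g. "n_0000000001"
    public static void requeue(ZooKeeper zk, String electionPath, String s2Node)
            throws Exception {
        // read S2's current election node so its data can be carried over
        Stat stat = new Stat();
        byte[] data = zk.getData(electionPath + "/" + s2Node, false, stat);

        // delete it: this wakes S3 (which was watching S2), and S3 now
        // watches S1 directly
        zk.delete(electionPath + "/" + s2Node, stat.getVersion());

        // re-create it as EPHEMERAL_SEQUENTIAL so it gets a *higher*
        // sequence number than S3's node and queues up behind S3
        String newNode = zk.create(electionPath + "/n_", data,
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        System.out.println("S2 re-queued as " + newNode);
    }
}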

The original reason I pursued this "minimize leadership changes" quest was
that frequent leadership changes _could_ lead to "data loss" in some
scenarios. I'm not entirely sure about this, and you could correct me on it,
but let me explain my reasoning.

If you have incoming indexing requests during a rolling restart, could there
be a case, while the current leader is shutting down, where the
leader-to-be node does not have time to sync with the leader that is
shutting down? In that case everyone would sync to the new leader and thus
miss some updates. I've seen an installation where the replicas had
different index sizes, and the discrepancy deteriorated over time.



