Vishal K commented on ZOOKEEPER-917:
Sorry for not making much progress on
(http://wiki.apache.org/hadoop/ZooKeeper/ClusterMembership). I have spent some
time understanding the code, but it is difficult to focus on development
without dedicated development time. I am pushing to get dedicated development
time at work for this so that I don't have to rely on my spare time.
A few questions related to your comments:
1. Can you please elaborate on: "At the same time, a server A decides to
follow another server B if it receives a message from B saying that B is
leading and from a quorum saying that they are following, even if A is in a
later election epoch. This mechanism is there to avoid A being locked out of
the ensemble in the case it partitions away and comes back later."
2. Why is it not OK for B to give up leadership when it sees that its
<epoch,zxid> is lower than others?
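
To make the two questions concrete, here is a minimal Java sketch of the rule quoted in question 1 and of the <epoch,zxid> comparison behind question 2. It is not the ZooKeeper implementation: the class FollowRuleSketch, the Notification shape, and the choice to count B itself toward the majority are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and types are illustrative, not ZooKeeper's API.
class FollowRuleSketch {
    enum State { LOOKING, FOLLOWING, LEADING }

    /** One notification as received by server A (assumed shape). */
    static final class Notification {
        final long sender;   // id of the server that sent the message
        final State state;   // state the sender reports for itself
        final long leader;   // id of the server the sender considers leader
        final long epoch;    // epoch of the sender's last transaction
        final long zxid;     // last transaction id seen by the sender

        Notification(long sender, State state, long leader, long epoch, long zxid) {
            this.sender = sender;
            this.state = state;
            this.leader = leader;
            this.epoch = epoch;
            this.zxid = zxid;
        }
    }

    /**
     * Quoted rule: A follows the candidate if the candidate itself reports
     * LEADING and a majority of the ensemble backs it. A's own election epoch
     * is deliberately not consulted. Counting the candidate toward the
     * majority is an assumption of this sketch.
     */
    static boolean shouldFollow(long candidate,
                                Map<Long, Notification> received,
                                int ensembleSize) {
        Notification fromCandidate = received.get(candidate);
        if (fromCandidate == null
                || fromCandidate.state != State.LEADING
                || fromCandidate.leader != candidate) {
            return false;
        }
        int supporters = 1; // the candidate itself
        for (Notification n : received.values()) {
            if (n.state == State.FOLLOWING && n.leader == candidate) {
                supporters++;
            }
        }
        return supporters > ensembleSize / 2; // strict majority
    }

    /** True if the pair (epochA, zxidA) is strictly older than (epochB, zxidB). */
    static boolean isBehind(long epochA, long zxidA, long epochB, long zxidB) {
        return epochA < epochB || (epochA == epochB && zxidA < zxidB);
    }
}
```

Note that shouldFollow never calls isBehind. Under the quoted rule, A joins B even when B's <epoch,zxid> is lower than A's, which is the situation question 2 asks about.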
> Leader election selected incorrect leader
> Key: ZOOKEEPER-917
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
> Project: Zookeeper
> Issue Type: Bug
> Components: leaderElection, server
> Affects Versions: 3.2.2
> Environment: Cloudera distribution of zookeeper (patched to never
> cache DNS entries)
> Debian lenny
> Reporter: Alexandre Hardy
> Priority: Critical
> Fix For: 3.3.3, 3.4.0
> Attachments: zklogs-20101102144159SAST.tar.gz
> We had three nodes running zookeeper:
> * 192.168.130.10
> * 192.168.130.11
> * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13
> (automated startup). The new node had not participated in any zookeeper
> quorum previously. The node 192.168.130.11 was permanently removed from
> service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers
> to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that
> it had not seen the latest zxid.
> This particular problem has not been verified with later versions of
> zookeeper, and no attempt has been made to reproduce this problem as yet.