[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928605#action_12928605 ]
Vishal K commented on ZOOKEEPER-917: ------------------------------------ Hi Flavio, Sorry for not making much progress on (http://wiki.apache.org/hadoop/ZooKeeper/ClusterMembership). I have spent some time to understand the code. But It is a bit difficult to focus on development without dedicated development time. I am pushing to get dedicated development time at work for this so that I don't have to rely on my spare time. Few questions related to your comments: 1. Can you please elaborate on : "At the same time, a server A decides to follow another server B if it receives a message from B saying that B is leading and from a quorum saying that they are following, even if A is in a later election epoch. This mechanism is there to avoid A being locked out of the ensemble in the case it partitions away and comes back later." 2. Why is it not OK for B to give up leadership when it sees that its <epoch,zxid> is lower than others? Thanks. > Leader election selected incorrect leader > ----------------------------------------- > > Key: ZOOKEEPER-917 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 > Project: Zookeeper > Issue Type: Bug > Components: leaderElection, server > Affects Versions: 3.2.2 > Environment: Cloudera distribution of zookeeper (patched to never > cache DNS entries) > Debian lenny > Reporter: Alexandre Hardy > Priority: Critical > Fix For: 3.3.3, 3.4.0 > > Attachments: zklogs-20101102144159SAST.tar.gz > > > We had three nodes running zookeeper: > * 192.168.130.10 > * 192.168.130.11 > * 192.168.130.14 > 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 > (automated startup). The new node had not participated in any zookeeper > quorum previously. The node 192.148.130.11 was permanently removed from > service and could not contribute to the quorum any further (powered off). > DNS entries were updated for the new node to allow all the zookeeper servers > to find the new node. > The new node 192.168.130.13 was selected as the LEADER, despite the fact that > it had not seen the latest zxid. > This particular problem has not been verified with later versions of > zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.