[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927949#action_12927949 ]
Flavio Junqueira commented on ZOOKEEPER-917:
--------------------------------------------

Even though the logs do not make a lot of sense to me at this point, I was thinking that your scenario is not supposed to work given our guarantees. Let's look at an example. Suppose we have 3 servers: A, B, and C. Suppose that C is initially the leader and proposes operations that B is able to ack, but A is not. Now, suppose that I come and replace C with a fresh server, same id but empty state, and I do it before A and B are able to elect a new leader and recover. In this case, A and C may form a quorum and the state of the ZooKeeper ensemble would be empty.

The replacement of server C with a fresh server violates our assumptions. It should work, though, if you add a fresh server to a working ensemble. That is, you let A and B elect a new leader, and then you start the new C server.

In your case, I'm still not sure why it happens because the initial zxid of node 1 is 4294967742 according to your excerpt.

> Leader election selected incorrect leader
> -----------------------------------------
>
> Key: ZOOKEEPER-917
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
> Project: Zookeeper
> Issue Type: Bug
> Components: leaderElection, server
> Affects Versions: 3.2.2
> Environment: Cloudera distribution of zookeeper (patched to never
> cache DNS entries)
> Debian lenny
> Reporter: Alexandre Hardy
> Priority: Critical
> Fix For: 3.3.3, 3.4.0
>
> Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
> * 192.168.130.10
> * 192.168.130.11
> * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13
> (automated startup). The new node had not participated in any zookeeper
> quorum previously. The node 192.168.130.11 was permanently removed from
> service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers
> to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that
> it had not seen the latest zxid.
> This particular problem has not been verified with later versions of
> zookeeper, and no attempt has been made to reproduce this problem as yet.
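
As a rough illustration of the three-server example in the comment (A, B, and a C that is replaced by a fresh server), here is a minimal Python sketch. It is not ZooKeeper's election code: the `Server` class, the `elect` function, and the rule that the highest (last zxid, server id) pair wins among a majority are simplifying assumptions made for this example, and the zxid values given to A and B are invented. The one detail taken from ZooKeeper itself is that a zxid is a 64-bit number with the epoch in the high 32 bits and a counter in the low 32 bits, which is how the zxid 4294967742 quoted in the comment is split.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter): high 32 bits, low 32 bits."""
    return zxid >> 32, zxid & 0xFFFFFFFF


@dataclass(frozen=True)
class Server:
    """Hypothetical stand-in for an ensemble member (not a ZooKeeper class)."""
    sid: int        # server id from the ensemble configuration
    last_zxid: int  # last zxid this server has in its log


def elect(voters: List[Server], ensemble_size: int) -> Optional[Server]:
    """Assumed, simplified rule: a strict majority must take part,
    and the voter with the highest (last_zxid, sid) becomes leader."""
    if len(voters) <= ensemble_size // 2:
        return None  # no quorum, no leader
    return max(voters, key=lambda s: (s.last_zxid, s.sid))


# Invented state for the example: A never acked anything, B acked
# proposals up to epoch 1 / counter 5, and C was wiped and restarted.
a = Server(sid=1, last_zxid=0)
b = Server(sid=2, last_zxid=(1 << 32) | 5)
fresh_c = Server(sid=3, last_zxid=0)

epoch, counter = split_zxid(4294967742)   # zxid quoted in the comment
bad_leader = elect([a, fresh_c], 3)       # A and fresh C form a quorum first
good_leader = elect([a, b], 3)            # A and B elect before C is started

print(epoch, counter)
print(bad_leader.sid, good_leader.sid)
```

In this sketch, A and the wiped C are two out of three, so they count as a quorum, and the tie on zxid goes to the higher server id, which is the fresh C; the operations that only B logged are then not part of the elected state. If A and B elect first, B wins and its log is kept, which is the ordering recommended in the comment.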