[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927949#action_12927949
 ] 

Flavio Junqueira commented on ZOOKEEPER-917:
--------------------------------------------

Even though the logs do not make a lot of sense to me at this point, I think 
your scenario is not supposed to work given our guarantees. Let's look at an 
example.

Suppose we have 3 servers: A, B, and C. Suppose that C is initially the leader 
and proposes operations that B is able to ack, but A is not. Now, suppose that 
I come and replace C with a fresh server, same id but empty state, and I do it 
before A and B are able to elect a new leader and recover. In this case, A and 
C may form a quorum and the state of the ZooKeeper ensemble would be empty. 
Replacing server C with a fresh server violates our assumptions. 
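
To make the example concrete, here is a rough sketch of it in Python. It is a 
toy model and not ZooKeeper's election code: the Server class, the elect and 
is_quorum helpers, and the zxid values are all made up for illustration, and 
the only rule assumed is that the voter with the highest (zxid, id) pair wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    sid: int        # server id (hypothetical values)
    last_zxid: int  # highest zxid this server has logged


def is_quorum(voters, ensemble_size):
    """A quorum is a strict majority of the ensemble."""
    return len(voters) > ensemble_size // 2


def elect(voters):
    """Toy rule: the voter with the highest (last_zxid, sid) pair wins."""
    return max(voters, key=lambda s: (s.last_zxid, s.sid))


# C was the leader and proposed operations 1 and 2.
# B acked them, A did not, so A still has nothing.
a = Server("A", sid=1, last_zxid=0)
b = Server("B", sid=2, last_zxid=2)

# C is replaced by a fresh server: same id, empty state.
c_fresh = Server("C", sid=3, last_zxid=0)

# A and the fresh C are a majority of 3, so they can elect without B.
voters = [a, c_fresh]
leader = elect(voters)
print(is_quorum(voters, 3), leader.name, leader.last_zxid)  # True C 0
```

In this model the quorum of A and the fresh C settles on an empty state, and 
the operations that B acked are not part of it. If A and B elect first, B 
wins with its zxid of 2 and a fresh C can then sync from the leader.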

It should work, though, if you add a fresh server to a working ensemble. That 
is, you let A and B elect a new leader, and then you start the new C server. In 
your case, I'm still not sure why it happened, because the initial zxid of node 
1 is 4294967742 according to your excerpt. 
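
For reference, a zxid is a 64-bit number whose high 32 bits hold the epoch and 
whose low 32 bits hold a counter within that epoch. A small sketch (the helper 
name split_zxid is made up for illustration) shows how the value from the 
excerpt splits:

```python
def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter)."""
    epoch = zxid >> 32            # high 32 bits
    counter = zxid & 0xFFFFFFFF   # low 32 bits
    return epoch, counter


print(split_zxid(4294967742))  # (1, 446)
```

So 4294967742 reads as epoch 1, counter 446, which is not what a server with 
empty state would report.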

> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never 
> cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 
> (automated startup). The new node had not participated in any zookeeper 
> quorum previously. The node 192.168.130.11 was permanently removed from 
> service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers 
> to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that 
> it had not seen the latest zxid.
> This particular problem has not been verified with later versions of 
> zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
