[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Alexandre Hardy (JIRA) Wed, 03 Nov 2010 08:59:49 -0700

    [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927888#action_12927888
 ]


Alexandre Hardy commented on ZOOKEEPER-917:
-------------------------------------------

Hi Flavio,

The three zookeeper servers are zookeeper1, zookeeper2 and zookeeper3.
Initially the servers were:
    * 192.168.130.10: zookeeper1
    * 192.168.130.11: zookeeper3
    * 192.168.130.14: zookeeper2

After .11 was removed the servers were:
    * 192.168.130.10: zookeeper1
    * 192.168.130.13: zookeeper3
    * 192.168.130.14: zookeeper2

All other settings were set by hbase:
    * tickTime=2000
    * initLimit=10
    * syncLimit=5  
    * peerport=2888
    * leaderport=3888

zookeeper1 would have node id 0
zookeeper2 would have node id 1
zookeeper3 would have node id 2
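
For reference, the settings above should correspond to roughly the following zoo.cfg-style entries. This is only a sketch: the QuorumConfigSketch class and its render method are hypothetical, and it assumes that hbase maps peerport and leaderport onto the standard server.N=host:peerPort:electionPort lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QuorumConfigSketch {
    // Hypothetical helper: renders the settings listed above as zoo.cfg-style text.
    static String render(Map<Integer, String> servers, int peerPort, int leaderPort) {
        StringBuilder sb = new StringBuilder();
        sb.append("tickTime=2000\n");
        sb.append("initLimit=10\n");
        sb.append("syncLimit=5\n");
        // One server.N line per quorum member, keyed by node id.
        for (Map.Entry<Integer, String> e : servers.entrySet()) {
            sb.append("server.").append(e.getKey()).append('=')
              .append(e.getValue()).append(':').append(peerPort)
              .append(':').append(leaderPort).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> servers = new LinkedHashMap<>();
        servers.put(0, "zookeeper1");
        servers.put(1, "zookeeper2");
        servers.put(2, "zookeeper3");
        System.out.print(render(servers, 2888, 3888));
    }
}
```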

I'm not sure what else I can give you concerning the configuration.

I note that on 192.168.130.14 (node id 1) the log shows:
{noformat}
2010-11-02 09:36:27,988 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: New election: 4294967742
2010-11-02 09:36:27,988 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 1, 4294967742, 2, 1, LOOKING, LOOKING, 1
2010-11-02 09:36:27,988 INFO org.apache.zookeeper.server.quorum.QuorumCnxManager: Have smaller server identifier, so dropping the connection: (2, 1)
2010-11-02 09:36:27,988 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Adding vote
2010-11-02 09:36:27,989 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 1, LOOKING, FOLLOWING, 0
{noformat}
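
If the number after "New election:" is the proposed zxid (that is my reading of the log, not something I have confirmed), it can be split into an epoch and a counter, since a zxid keeps the epoch in its high 32 bits and a counter in its low 32 bits. A small sketch, with a hypothetical class name:

```java
public class ZxidDecoder {
    // High 32 bits of a zxid: the leader epoch.
    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits of a zxid: the transaction counter within that epoch.
    static long counter(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long zxid = 4294967742L; // value taken from the log excerpt above
        System.out.println("epoch=" + epoch(zxid) + " counter=" + counter(zxid));
        // prints "epoch=1 counter=446"
    }
}
```

Read this way, node id 1 had seen transaction 446 of epoch 1 when the election started.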
 
I don't think a network configuration problem is very likely, 
but could one explain what we are seeing?



> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never 
> cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 
> (automated startup). The new node had not participated in any zookeeper 
> quorum previously. The node 192.168.130.11 was permanently removed from 
> service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers 
> to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that 
> it had not seen the latest zxid.
> This particular problem has not been verified with later versions of 
> zookeeper, and no attempt has been made to reproduce this problem as yet.
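
To make the expectation in the description concrete: as far as I understand fast leader election in this version, a proposal replaces the current one only if it carries a higher zxid, with the higher server id breaking ties. The sketch below is a simplified illustration of that ordering, not the actual FastLeaderElection code. The class and method names are hypothetical, and the zxid of -1 for the fresh node is an assumed placeholder for "no history".

```java
public class VoteOrderSketch {
    // Returns true if the proposal (newId, newZxid) should replace (curId, curZxid):
    // a higher zxid wins, and the higher server id breaks ties.
    static boolean supersedes(long newId, long newZxid, long curId, long curZxid) {
        return newZxid > curZxid || (newZxid == curZxid && newId > curId);
    }

    public static void main(String[] args) {
        long latest = 4294967742L; // zxid reported in the log on node id 1
        long fresh = -1L;          // assumed placeholder for a node with no history

        // A fresh node id 2 should not displace node id 1, which has the latest zxid.
        System.out.println(supersedes(2, fresh, 1, latest));  // prints "false"
        // With equal zxids, the higher server id would win the tie.
        System.out.println(supersedes(2, latest, 1, latest)); // prints "true"
    }
}
```

Under this ordering the new node should only win if the other servers believe its zxid is at least as recent as their own.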

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
