Vishal K commented on ZOOKEEPER-822:

I have attached new logs. I don't use NTP, but the clocks on all the nodes 
should be at most a few seconds apart. 

I have marked the start and end of the faulty election. Look at 
zookeeper- and search for "vishal".

Note: it is very easy to reproduce the bug. Create a 3-node cluster and 
reboot the leader (or shut down its network interface). You may need to repeat 
the test several times. 

If you do a clean shutdown of the leader (zkServer.sh stop), then you won't see 
this bug. I suspect that something related to TCP timeouts or session 
management for the failed node is causing this problem.
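
To put a number on how long the election takes when repeating the test above, 
something like the following could be used. This is only a rough sketch, not 
part of the attached logs or of ZooKeeper itself: it assumes the "stat" 
four-letter command is answered on the clientPort (2181 in the zoo.cfg quoted 
below), and the helper names and host names are made up for illustration. 
Start it at the moment the leader is failed and pass only the surviving nodes.

```python
import socket
import time


def four_letter_word(host, port=2181, cmd=b"stat", timeout=2.0):
    """Send a ZooKeeper four-letter command; return the reply text, or "" on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(cmd)
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # server closes the connection after replying
                    break
                chunks.append(data)
        return b"".join(chunks).decode("utf-8", errors="replace")
    except OSError:
        # Connection refused, unreachable host, or timeout: treat as "no answer".
        return ""


def parse_mode(stat_output):
    """Return the value of the 'Mode:' line (e.g. 'leader'), or None if absent."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def wait_for_leader(hosts, poll_interval=0.5, max_wait=120.0):
    """Poll the hosts until one reports 'Mode: leader'.

    Returns the elapsed seconds since this function was called, or None
    if no leader was seen within max_wait seconds.
    """
    start = time.monotonic()
    while time.monotonic() - start < max_wait:
        for host in hosts:
            if parse_mode(four_letter_word(host)) == "leader":
                return time.monotonic() - start
        time.sleep(poll_interval)
    return None


if __name__ == "__main__":
    # Hypothetical host names for the two surviving nodes.
    elapsed = wait_for_leader(["zk-node-1", "zk-node-2"])
    print("no leader seen" if elapsed is None else "leader after %.1f s" % elapsed)
```

The measured time is only as precise as the polling interval and the socket 
timeout, so it is a coarse cross-check against the timestamps in the logs.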

> Leader election taking a long time to complete
> ----------------------------------------------
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, 
> zk_leader_election.tar.gz
> Created a 3 node cluster.
> 1. Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds, leader election takes anywhere from 25 to 60 seconds to 
> finish. Note: we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=\:2888\:3888
> server.0=\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster 
> after failing the leader. The leader was down anyway, so logs from that node 
> shouldn't matter.
> Look for "START HERE". The logs after that point are the ones of interest.
