Flavio Junqueira commented on ZOOKEEPER-822:

Hi Vishal, good catches:

1- It sounds right that blocking on connection establishment might increase 
the time to election unnecessarily when the other party is not up. Here is my 
interpretation. If the machine is up but the zk server is not running, then 
we simply get a connection failure and move on. The same doesn't happen when 
the machine is down, since we need to wait for the connection establishment 
to time out;
2- It sounds right that a connection can be dropped erroneously due to a race, 
but I don't see in which case it can cause the election time to increase 
substantially, unless the race is triggered multiple times in a row. A server 
will try to connect upon every new notification, and a server only calls 
SendWorker.finish() in receiveNotification if it has a higher identifier. In 
this case, it creates a new connection immediately after, so it would need a 
previous connection being dropped right before to have the case you're 
3- Servers with higher identifiers decline connection requests from servers 
with lower identifiers; it is part of the protocol. Is this what you're 
referring to?
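
To make points 1 and 3 concrete, here is a minimal sketch in Java. It is not 
the QuorumCnxManager code; the class name, the method names (probe, 
declinesIncoming) and the Outcome enum are hypothetical and only illustrate 
the behavior described above. It assumes a plain TCP connect with an explicit 
timeout. When the machine is up but nothing is listening, the connect 
typically fails right away with a ConnectException; when the machine is down, 
the call blocks until the timeout expires.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration only; none of these names come from ZooKeeper.
class ElectionConnectSketch {
    enum Outcome { CONNECTED, FAILED_FAST, TIMED_OUT, OTHER_ERROR }

    // Point 1: try to reach a peer's election port with a bounded wait.
    static Outcome probe(InetSocketAddress peer, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(peer, timeoutMs);
            return Outcome.CONNECTED;
        } catch (ConnectException e) {
            // Typically "connection refused": host is up, no server listening.
            return Outcome.FAILED_FAST;
        } catch (SocketTimeoutException e) {
            // No answer within timeoutMs, e.g. the machine is down.
            return Outcome.TIMED_OUT;
        } catch (IOException e) {
            return Outcome.OTHER_ERROR;
        }
    }

    // Point 3: a server declines an incoming connection when the initiator
    // has a lower identifier than its own.
    static boolean declinesIncoming(long myId, long remoteId) {
        return remoteId < myId;
    }

    public static void main(String[] args) {
        long myId = 1;
        for (long remoteId = 0; remoteId <= 2; remoteId++) {
            if (remoteId == myId) continue;
            System.out.println("server " + myId + " declines server "
                    + remoteId + ": " + declinesIncoming(myId, remoteId));
        }
    }
}
```

With three servers (ids 0, 1, 2), server 1 would decline a connection from 
server 0 and keep one from server 2. The probe method is only meant to show 
why a down machine costs a full timeout per attempt while a stopped server 
does not.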

> Leader election taking a long time to complete
> -----------------------------------------------
>                 Key: ZOOKEEPER-822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0
>            Reporter: Vishal K
>            Priority: Blocker
>         Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, 
> test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz
> Created a 3 node cluster.
> 1. Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the 
> 3. Repeat 
> After a few rounds, leader election takes anywhere from 25 to 60 seconds to 
> finish. 
> Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=\:2888\:3888
> server.0=\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster 
> after failing the leader. The leader was down anyways so logs from that node 
> shouldn't matter.
> Look for "START HERE". Logs after that point should be of interest.
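
For reference, the limits in the quoted zoo.cfg are expressed in ticks, so 
they can be converted to wall-clock time by multiplying by tickTime. The 
sketch below does that arithmetic; the class and method names are 
hypothetical, and it reads only the three numeric keys shown in the report. 
As far as I can tell, initLimit and syncLimit bound how long followers may 
take to connect to and sync with a leader, so they would not by themselves 
account for the 25 to 60 seconds reported.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; it only converts tick-based limits to milliseconds.
class ZooCfgTimingSketch {
    // Parse zoo.cfg-style text with java.util.Properties.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // limitKey is "initLimit" or "syncLimit"; the result is in milliseconds.
    static long limitMillis(Properties cfg, String limitKey) {
        long tickMs = Long.parseLong(cfg.getProperty("tickTime"));
        long ticks = Long.parseLong(cfg.getProperty(limitKey));
        return tickMs * ticks;
    }

    public static void main(String[] args) {
        // Values copied from the configuration quoted in the report.
        Properties cfg = parse("syncLimit=2\ninitLimit=5\ntickTime=2000\n");
        System.out.println("initLimit: " + limitMillis(cfg, "initLimit") + " ms");
        System.out.println("syncLimit: " + limitMillis(cfg, "syncLimit") + " ms");
    }
}
```

With the quoted values this gives 10000 ms for initLimit and 4000 ms for 
syncLimit.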

