Flavio Paiva Junqueira commented on ZOOKEEPER-684:

Hi Henry, if I understand your patch correctly, then for an execution like the 
one in the log attached to this issue, the test would fail because the latch 
would never count down to zero. Is this correct? If so, I don't understand how 
it improves the test.

My understanding of the race is that in the first round, server 0 receives a 
vote from mock server 2, but server 1 does not receive a vote from 2 (the UDP 
socket times out while waiting to receive). In the second round, server 1 
receives votes from 0 and from 2, both voting for 2, and consequently server 1 
elects 2. I think this is what you observed too in your last comment. If the 
receive call is timing out too soon, don't we have to increase the timeout 
value? I understand that this is not desirable because it increases election 
time, but if the current value is not sufficient, then I don't see a better 
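
To make the timeout behaviour discussed above concrete, here is a minimal 
sketch of a UDP receive that gives up after a fixed wait. It is not taken from 
the ZooKeeper sources: the class name ReceiveTimeoutSketch, the method 
receiveOnce, the 100 ms timeout, and the one-byte "vote" payload are all 
assumptions made for illustration. Only the java.net calls are real API.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;

// Hypothetical illustration only; not the ZooKeeper LeaderElection code.
public class ReceiveTimeoutSketch {

    /**
     * Waits up to timeoutMs (must be > 0; 0 would mean "block forever")
     * for one datagram. Returns the payload length, or -1 if the wait
     * timed out before anything arrived.
     */
    static int receiveOnce(DatagramSocket socket, int timeoutMs) throws IOException {
        socket.setSoTimeout(timeoutMs);
        DatagramPacket packet = new DatagramPacket(new byte[64], 64);
        try {
            socket.receive(packet);
            return packet.getLength();
        } catch (SocketTimeoutException e) {
            // No vote arrived in time: the caller sees a missing vote,
            // which is the situation server 1 is in during the first round.
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket(0, loopback)) {
            // Nothing has been sent yet, so this receive should time out.
            System.out.println(receiveOnce(receiver, 100));

            // Send a one-byte stand-in for a vote, then receive it.
            byte[] vote = {2};
            sender.send(new DatagramPacket(vote, vote.length, loopback,
                    receiver.getLocalPort()));
            System.out.println(receiveOnce(receiver, 100));
        }
    }
}
```

A longer timeout makes a late vote less likely to be missed, at the cost of a 
slower round whenever a peer is really silent, which is the trade-off raised 
above.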

> Race in LENonTerminateTest
> --------------------------
>                 Key: ZOOKEEPER-684
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-684
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>            Reporter: Flavio Paiva Junqueira
>            Assignee: Henry Robinson
>            Priority: Critical
>             Fix For: 3.3.0
>         Attachments: zookeeper-684-test-failure.rtf, ZOOKEEPER-684.patch
> testNonTermination failed during a Hudson run for ZOOKEEPER-59. After 
> inspecting the output, it looks like server is electing 2 as a leader and 
> leaving. Given that 2 is just a mock server, server 0 remains alone in leader 
> election.

