[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843295#action_12843295
 ] 

Henry Robinson commented on ZOOKEEPER-684:
------------------------------------------

Flavio - 

Looking at this excerpt from round one: 

2010-03-03 04:56:54,630 - INFO  [Thread-0:leaderelect...@155] - Server address: 
0.0.0.0/0.0.0.0:11223
2010-03-03 04:56:54,631 - INFO  [Thread-0:leaderelect...@155] - Server address: 
0.0.0.0/0.0.0.0:11225
2010-03-03 04:56:54,630 - INFO  [Thread-1:leaderelect...@155] - Server address: 
0.0.0.0/0.0.0.0:11221
2010-03-03 04:56:54,632 - INFO  [Thread-0:leaderelect...@103] - Election tally: 
2010-03-03 04:56:54,632 - INFO  [Thread-0:leaderelect...@109] - 0       -> 1
2010-03-03 04:56:54,632 - INFO  [Thread-0:leaderelect...@109] - 2       -> 1
2010-03-03 04:56:54,632 - INFO  [Thread-0:leaderelect...@109] - 1       -> 1
2010-03-03 04:56:54,633 - INFO  [Thread-1:leaderelect...@155] - Server address: 
0.0.0.0/0.0.0.0:11223
2010-03-03 04:56:54,633 - INFO  [Thread-1:leaderelect...@155] - Server address: 
0.0.0.0/0.0.0.0:11225
2010-03-03 04:56:54,633 - INFO  [Thread-1:leaderelect...@103] - Election tally: 
2010-03-03 04:56:54,634 - INFO  [Thread-1:leaderelect...@109] - 2       -> 2
2010-03-03 04:56:54,634 - INFO  [Thread-1:leaderelect...@109] - 1       -> 1
2010-03-03 04:56:54,634 - INFO  [Thread-1:leaderelect...@218] - Found leader: 
my type is: PARTICIPANT

Thread 1 has received 2 votes for server 2 as the leader. It then exits, and 
this is the problem, I think. As a result, Thread 0 can never get a quorum. 
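
To make the quorum arithmetic concrete, here is a minimal sketch of the check 
applied to the two tallies in the log above. It is not the LeaderElection code 
itself: the class and method names are hypothetical, and it assumes a 
three-server ensemble where a leader needs a strict majority of votes. 

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalLong;

public class TallyCheck {
    // Hypothetical helper: returns the server id that holds a strict
    // majority of the ensemble's votes, or empty if no server does.
    static OptionalLong winnerWithQuorum(Map<Long, Integer> tally, int ensembleSize) {
        for (Map.Entry<Long, Integer> e : tally.entrySet()) {
            if (e.getValue() > ensembleSize / 2) {
                return OptionalLong.of(e.getKey());
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        // Thread 0's tally from the log: one vote each for servers 0, 2 and 1.
        Map<Long, Integer> thread0 = new LinkedHashMap<>();
        thread0.put(0L, 1);
        thread0.put(2L, 1);
        thread0.put(1L, 1);

        // Thread 1's tally from the log: two votes for server 2, one for server 1.
        Map<Long, Integer> thread1 = new LinkedHashMap<>();
        thread1.put(2L, 2);
        thread1.put(1L, 1);

        System.out.println("Thread 0 winner: " + winnerWithQuorum(thread0, 3));
        System.out.println("Thread 1 winner: " + winnerWithQuorum(thread1, 3));
    }
}
```

Under these assumptions Thread 1 finds a winner (server 2) and Thread 0 does 
not, which matches the tallies in the excerpt. 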

Therefore my thought was to put a barrier after all servers have completed one 
round, but before they have updated their results. Unfortunately, it looks like 
line 212 in LeaderElection.java upsets this plan, as the thread's current vote 
is set there. 

As it stands, Thread 0 gets one vote for every server, sets its own vote to 2, 
and then replies to Thread 1, which therefore sees two votes for server 2. What 
we want is for Thread 0 to get one vote for every server, but to reply to 
Thread 1 before it updates its own vote. I'm unclear on an easy way to mock up 
this interleaving - the LeaderElection code is hard to stub out. 
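
For illustration only, the barrier idea can be modelled outside of 
LeaderElection with a java.util.concurrent.CyclicBarrier. The sketch below is a 
standalone toy, not a patch: BarrierSketch, runRound and the shared vote array 
are hypothetical names, and the rule "adopt the highest server id" is an 
assumption standing in for the real vote update. Each simulated server reads 
every peer's vote, waits at the barrier, and only then changes its own vote. 

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicLongArray;

public class BarrierSketch {
    // Runs one simulated round for n servers and returns seen[i][j]:
    // the vote that server i observed from server j during that round.
    static long[][] runRound(int n) {
        AtomicLongArray currentVote = new AtomicLongArray(n);
        for (int i = 0; i < n; i++) {
            currentVote.set(i, i); // every server starts by voting for itself
        }
        long[][] seen = new long[n][n];
        CyclicBarrier barrier = new CyclicBarrier(n);
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                try {
                    // Round one: collect the vote of every server.
                    for (int peer = 0; peer < n; peer++) {
                        seen[id][peer] = currentVote.get(peer);
                    }
                    // Nobody updates a vote until everyone has finished reading.
                    barrier.await();
                    // Assumed update rule: adopt the highest server id.
                    currentVote.set(id, n - 1);
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new RuntimeException(e);
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        long[][] seen = runRound(3);
        for (int i = 0; i < seen.length; i++) {
            System.out.println("server " + i + " saw " + java.util.Arrays.toString(seen[i]));
        }
    }
}
```

In this model every server sees exactly one vote per server in round one, 
regardless of scheduling. The difficulty described above remains: in the real 
code the vote is updated inside the election loop, so there is no obvious place 
to insert such a barrier from the test. 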

> Race in LENonTerminateTest
> --------------------------
>
>                 Key: ZOOKEEPER-684
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-684
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>            Reporter: Flavio Paiva Junqueira
>            Assignee: Henry Robinson
>            Priority: Critical
>             Fix For: 3.3.0
>
>         Attachments: zookeeper-684-test-failure.rtf, ZOOKEEPER-684.patch
>
>
> testNonTermination failed during a Hudson run for ZOOKEEPER-59. After 
> inspecting the output, it looks like server 1 is electing 2 as the leader and 
> leaving. Given that 2 is just a mock server, server 0 remains alone in leader 
> election.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
