[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836356#action_12836356
 ] 

Hudson commented on ZOOKEEPER-569:
--

Integrated in ZooKeeper-trunk #703 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/703/])
. Failure of elected leader can lead to never-ending leader
election (henry via flavio)


 Failure of elected leader can lead to never-ending leader election
 --

 Key: ZOOKEEPER-569
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-569
 Project: Zookeeper
  Issue Type: Bug
Reporter: Henry Robinson
Assignee: Henry Robinson
 Fix For: 3.3.0

 Attachments: zookeeper-569.patch, ZOOKEEPER-569.patch, 
 zookeeper-569.patch, zookeeper-569.patch, zookeeper-569.patch, 
 zookeeper-569.patch


 It is possible for basic LeaderElection to enter a situation where it never 
 terminates. 
 As an example, consider a three node cluster A, B and C.
 1. In the first round, A votes for A, B votes for B and C votes for C
 2. Since C > B > A, all nodes resolve to vote for C in the second round as 
 there is no first round winner
 3. A, B vote for C, but C fails.
 4. C is not elected because neither A nor B hear from it, and so votes for it 
 are discarded
 5. A and B never reset their votes, despite not hearing from C, so continue 
 to vote for it ad infinitum. 
 Step 5 is the bug. If A and B reset their votes to themselves in the case 
 where the heard-from vote set is empty, leader election will continue.
 I do not know if this affects running ZK clusters, as it is possible that the 
 out-of-band failure detection protocols may cause leader election to be 
 restarted anyhow, but I've certainly seen this in tests. 
 I have a trivial patch which fixes it, but it needs a test (and tests for 
 race conditions are hard to write!)
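
The five steps above can be sketched as a toy model. This is only an illustration of the scenario as described, not ZooKeeper's LeaderElection code; `run_round`, `simulate`, and the single shared `votes` dictionary are hypothetical simplifications (in the real protocol each node keeps its own state).

```python
# Toy model of the scenario above; names are illustrative, not ZooKeeper's API.

def run_round(votes, alive):
    """One election round: poll peers, discard votes for unheard-from nodes.

    votes: dict mapping node -> node it currently votes for
    alive: set of nodes that respond to the poll
    Returns the winner if a candidate holds a majority of the full
    ensemble, else None.
    """
    heard_from = {n for n in votes if n in alive}
    # Only count votes cast by responders, for candidates who also responded.
    valid = [votes[n] for n in heard_from if votes[n] in heard_from]
    quorum = len(votes) // 2 + 1
    for candidate in set(valid):
        if valid.count(candidate) >= quorum:
            return candidate
    return None


def simulate(rounds=5):
    votes = {"A": "A", "B": "B", "C": "C"}  # step 1: everyone votes for itself
    alive = {"A", "B", "C"}
    first = run_round(votes, alive)         # no first-round winner (None)
    votes = {n: max(votes) for n in votes}  # step 2: C > B > A, all pick C
    alive.discard("C")                      # step 3: C fails
    history = []
    for _ in range(rounds):
        # steps 4-5: votes for C are discarded, but A and B never change them
        history.append((run_round(votes, alive), dict(votes)))
    return first, history
```

In this model every round after C's failure returns no winner and leaves the votes untouched, which is the non-terminating behaviour the report describes.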

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-19 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835970#action_12835970
 ] 

Patrick Hunt commented on ZOOKEEPER-569:


Henry, there are two patches; please highlight which one the reviewer should 
review. Thanks.




[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-19 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836009#action_12836009
 ] 

Henry Robinson commented on ZOOKEEPER-569:
--

The most recent patch I submitted is the right patch - it includes Flavio's 
suggestions. 




[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836082#action_12836082
 ] 

Hadoop QA commented on ZOOKEEPER-569:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435629/zookeeper-569.patch
  against trunk revision 912052.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/68/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/68/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/68/console

This message is automatically generated.




[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-08 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831203#action_12831203
 ] 

Mahadev konar commented on ZOOKEEPER-569:
-

Henry, are you working on a new patch addressing Flavio's comments?




[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-08 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831213#action_12831213
 ] 

Henry Robinson commented on ZOOKEEPER-569:
--

Yes, hoping to get it out this week.




[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-06 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830521#action_12830521
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-569:
--

Thanks, Henry, it looks good. I agree with your comment on the confusion 
between LE being instantiated every time it is used, and FLE behaving 
differently. We should really just have one model.

One comment on the patch is that I don't think you need to instantiate 
QuorumCnxManager in mockServer() in the new test. The conditional block that 
checks the listener can also be removed.




[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-03 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829217#action_12829217
 ] 

Benjamin Reed commented on ZOOKEEPER-569:
-

I'm also wondering about the heardFrom == 0 check. In your case A and B will 
still be up, so heardFrom will not be zero. Don't you really want to check 
whether or not you heard from the guy that you think is the leader?
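
The difference between the two checks can be shown with a minimal sketch. The function names below are hypothetical and the set-based `heard_from` is an assumption made for illustration; it is not how the patch or ZooKeeper represents this state.

```python
# Hypothetical predicates; a node's heard-from set is modelled as a Python set.

def heard_from_is_empty(heard_from):
    """The check from the first patch: reset only if nobody answered."""
    return len(heard_from) == 0


def leader_unreachable(my_vote, heard_from):
    """The check suggested here: reset if the node we vote for did not answer."""
    return my_vote not in heard_from


# Scenario from the report: A polls its peers after C has failed.
heard_from = {"A", "B"}
print(heard_from_is_empty(heard_from))      # False: A and B are still up
print(leader_unreachable("C", heard_from))  # True: C never answered
```

Under these assumptions the emptiness check never fires in the reported scenario, while the per-leader check does.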




[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-03 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829220#action_12829220
 ] 

Henry Robinson commented on ZOOKEEPER-569:
--

Yes, you're both right! I misread my own notes on the bug :/

I'm writing tests for a *real* fix now. Thanks both for pointing this out. 




[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829326#action_12829326
 ] 

Hadoop QA commented on ZOOKEEPER-569:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434729/zookeeper-569.patch
  against trunk revision 903483.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/65/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/65/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/65/console

This message is automatically generated.




[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828708#action_12828708
 ] 

Hadoop QA commented on ZOOKEEPER-569:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434553/zookeeper-569.patch
  against trunk revision 903483.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/61/console

This message is automatically generated.




[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-02 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828764#action_12828764
 ] 

Patrick Hunt commented on ZOOKEEPER-569:


Mockito looks good. If someone wants to include it as a testing option, please 
enter a JIRA/patch. :-)




[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-02 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828823#action_12828823
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-569:
--

Henry, I was taking a look at the patch, and I'm slightly confused about how it 
works, so I was wondering if you could give me a hand in understanding it. 

It seems to me that in the situation you describe, heardFrom won't be empty, so 
the check for heardFrom == 0 wouldn't work. Instead, I think you have to 
call countVotes and check if there is any vote left after it returns, no?
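
A rough sketch of that idea follows, under stated assumptions: `count_valid_votes` is a hypothetical stand-in for the filtering that countVotes is described as doing, `elect` is a global toy loop rather than the per-node protocol, and the fallback rules are illustrative only. The committed patch may differ.

```python
# Toy model: filter votes, and reset to self-votes when none survive.

def count_valid_votes(votes, heard_from):
    """Keep only votes cast by responders for candidates that also responded."""
    return {voter: cand for voter, cand in votes.items()
            if voter in heard_from and cand in heard_from}


def elect(votes, alive, max_rounds=10):
    """Run rounds until a candidate holds a majority of the full ensemble."""
    quorum = len(votes) // 2 + 1
    for round_no in range(1, max_rounds + 1):
        heard_from = set(votes) & alive
        valid = count_valid_votes(votes, heard_from)
        tally = {}
        for cand in valid.values():
            tally[cand] = tally.get(cand, 0) + 1
        winner = max(tally, key=tally.get, default=None)
        if winner is not None and tally[winner] >= quorum:
            return winner, round_no
        if not valid:
            # No vote survived filtering: live nodes fall back to themselves.
            votes = {n: (n if n in alive else votes[n]) for n in votes}
        else:
            # No majority yet: live nodes move to the highest surviving candidate.
            best = max(valid.values())
            votes = {n: (best if n in alive else votes[n]) for n in votes}
    return None, max_rounds


# A and B are stuck voting for the failed node C.
print(elect({"A": "C", "B": "C", "C": "C"}, alive={"A", "B"}))  # ('B', 3)
```

In this model the first round finds no surviving votes, the second round restarts from self-votes, and the third round elects B with two of three votes.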




[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828860#action_12828860
 ] 

Hadoop QA commented on ZOOKEEPER-569:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434555/zookeeper-569.patch
  against trunk revision 903483.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/62/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/62/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/62/console

This message is automatically generated.




[jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-01-13 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799669#action_12799669
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-569:
--

One way to implement the test is to implement a mock server to force the 
particular message interleaving that triggers the bug. No claim it is the best 
way, but it seemed to be a good idea for FLELostMessageTest.
