[jira] Updated: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-19 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-569:
---

Status: Patch Available  (was: Open)

 Failure of elected leader can lead to never-ending leader election
 --

 Key: ZOOKEEPER-569
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-569
 Project: Zookeeper
  Issue Type: Bug
Reporter: Henry Robinson
Assignee: Henry Robinson
 Fix For: 3.3.0

 Attachments: zookeeper-569.patch, ZOOKEEPER-569.patch, 
 zookeeper-569.patch, zookeeper-569.patch, zookeeper-569.patch, 
 zookeeper-569.patch


 It is possible for basic LeaderElection to enter a situation where it never 
 terminates. 
 As an example, consider a three node cluster A, B and C.
 1. In the first round, A votes for A, B votes for B and C votes for C
 2. Since C > B > A, all nodes resolve to vote for C in the second round as 
 there is no first round winner
 3. A, B vote for C, but C fails.
 4. C is not elected because neither A nor B hear from it, and so votes for it 
 are discarded
 5. A and B never reset their votes, despite not hearing from C, so continue 
 to vote for it ad infinitum. 
 Step 5 is the bug. If A and B reset their votes to themselves in the case 
 where the heard-from vote set is empty, leader election will continue.
 I do not know if this affects running ZK clusters, as it is possible that the 
 out-of-band failure detection protocols may cause leader election to be 
 restarted anyhow, but I've certainly seen this in tests. 
 I have a trivial patch which fixes it, but it needs a test (and tests for 
 race conditions are hard to write!)
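
 The scenario above can be sketched in a few lines of Java. This is only an 
 illustrative model, not the ZooKeeper source or the attached patch: the class 
 VoteResetSketch, its methods and the resetWhenEmpty flag are hypothetical 
 names, rounds are assumed to be synchronous, every live node is assumed to see 
 the same responses, and ties are broken by the highest node id alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the vote-counting step; not the real LeaderElection class.
final class VoteResetSketch {

    /** Tally only votes cast for candidates that actually answered this round. */
    static Map<String, Integer> countValidVotes(Map<String, String> responses) {
        Map<String, Integer> tally = new TreeMap<>();
        for (String candidate : responses.values()) {
            if (responses.containsKey(candidate)) { // candidate is in the heard-from set
                tally.merge(candidate, 1, Integer::sum);
            }
        }
        return tally;
    }

    /** The vote that node `self` holds after one round. */
    static String nextVote(String self, Map<String, String> responses, boolean resetWhenEmpty) {
        Map<String, Integer> tally = countValidVotes(responses);
        if (tally.isEmpty()) {
            // Step 5: without the reset the stale vote for the dead node is kept.
            return resetWhenEmpty ? self : responses.get(self);
        }
        String best = null;
        for (Map.Entry<String, Integer> e : tally.entrySet()) {
            // TreeMap iterates ids in ascending order, so ">=" lets the higher id win ties.
            if (best == null || e.getValue() >= tally.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    /** Runs rounds among the live nodes; returns the leader, or null if none emerges. */
    static String elect(Map<String, String> liveVotes, int ensembleSize,
                        boolean resetWhenEmpty, int maxRounds) {
        Map<String, String> votes = new HashMap<>(liveVotes);
        for (int round = 0; round < maxRounds; round++) {
            for (Map.Entry<String, Integer> e : countValidVotes(votes).entrySet()) {
                if (e.getValue() > ensembleSize / 2) {
                    return e.getKey(); // a majority of the whole ensemble agrees
                }
            }
            Map<String, String> next = new HashMap<>();
            for (String node : votes.keySet()) {
                next.put(node, nextVote(node, votes, resetWhenEmpty));
            }
            votes = next;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> afterCFails = new HashMap<>();
        afterCFails.put("A", "C"); // A and B both still vote for the failed node C
        afterCFails.put("B", "C");
        System.out.println(elect(afterCFails, 3, false, 100)); // no leader is ever found
        System.out.println(elect(afterCFails, 3, true, 100));  // A and B converge on B
    }
}
```

 With the reset disabled, every round discards both votes for C and the loop 
 only ends because of the round limit. With the reset enabled, A and B fall 
 back to themselves, then agree on B, which holds a majority of the three-node 
 ensemble.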

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-11 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-569:
-

Status: Patch Available  (was: Open)

 Failure of elected leader can lead to never-ending leader election
 --

 Key: ZOOKEEPER-569
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-569
 Project: Zookeeper
  Issue Type: Bug
Reporter: Henry Robinson
Assignee: Henry Robinson
 Fix For: 3.3.0

 Attachments: zookeeper-569.patch, zookeeper-569.patch, 
 zookeeper-569.patch, zookeeper-569.patch





[jira] Updated: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-03 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-569:
-

Attachment: zookeeper-569.patch

Here's a patch with tests that appears to fix the issue (test fails without 
fix, test succeeds with). All tests pass for me with this patch on my laptop. 

I have replaced one kludge with another here. QuorumPeer.electionAlg is set to 
null when electionType==0 until the election is actually run. This causes 
problems if you want to retrieve the electionAlg object via getElectionAlg() 
beforehand for tests. 

I've set it up so that makeLEStrategy always creates a new LeaderElection if 
electionType == 0, but also that createElectionAlgorithm sets electionAlg=new 
LeaderElection(this) instead of null, so that as long as startLeaderElection 
has been called, getElectionAlg() won't return null.

I've checked to see if this will cause any obvious problems for the call sites 
of getElectionAlg and couldn't find anything that expected null. It seems more 
consistent to me this way. The question I have is over why LeaderElection needs 
re-instantiating each time when FLE does not.

If this sounds confusing, it's because the code really is! The interaction of 
createElectionAlgorithm, startLeaderElection and makeLEStrategy is hard to 
discern. 
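
To make the interaction described above a little more concrete, here is a 
stripped-down sketch. It is not the QuorumPeer code or the attached patch: 
ElectionAlgSketch, Peer, OtherElection and the eager flag are hypothetical 
stand-ins, and the method bodies only assume the behaviour described in this 
comment (a null placeholder before the change, a fresh LeaderElection after).

```java
// Hypothetical stand-ins for the classes discussed in this comment.
final class ElectionAlgSketch {

    interface Election {}

    static final class LeaderElection implements Election {}

    static final class OtherElection implements Election {} // any algorithm with electionType != 0

    static final class Peer {
        private final int electionType;
        private final boolean eager; // false: old null placeholder, true: behaviour described above
        private Election electionAlg;

        Peer(int electionType, boolean eager) {
            this.electionType = electionType;
            this.eager = eager;
        }

        Election createElectionAlgorithm() {
            if (electionType == 0) {
                // Previously null was returned here until the election actually ran.
                return eager ? new LeaderElection() : null;
            }
            return new OtherElection();
        }

        void startLeaderElection() {
            electionAlg = createElectionAlgorithm();
        }

        Election makeLEStrategy() {
            if (electionType == 0) {
                electionAlg = new LeaderElection(); // re-instantiated for every election
            }
            return electionAlg;
        }

        Election getElectionAlg() {
            return electionAlg;
        }
    }
}
```

In this model a test that calls startLeaderElection() and then 
getElectionAlg() gets null with eager set to false and a usable object with 
eager set to true, which is the difference the change is meant to make.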

 Failure of elected leader can lead to never-ending leader election
 --

 Key: ZOOKEEPER-569
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-569
 Project: Zookeeper
  Issue Type: Bug
Reporter: Henry Robinson
Assignee: Henry Robinson
 Fix For: 3.3.0

 Attachments: zookeeper-569.patch, zookeeper-569.patch, 
 zookeeper-569.patch





[jira] Updated: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-03 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-569:
-

Status: Patch Available  (was: Open)

 Failure of elected leader can lead to never-ending leader election
 --

 Key: ZOOKEEPER-569
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-569
 Project: Zookeeper
  Issue Type: Bug
Reporter: Henry Robinson
Assignee: Henry Robinson
 Fix For: 3.3.0

 Attachments: zookeeper-569.patch, zookeeper-569.patch, 
 zookeeper-569.patch





[jira] Updated: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-03 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-569:
-

Status: Open  (was: Patch Available)

 Failure of elected leader can lead to never-ending leader election
 --

 Key: ZOOKEEPER-569
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-569
 Project: Zookeeper
  Issue Type: Bug
Reporter: Henry Robinson
Assignee: Henry Robinson
 Fix For: 3.3.0

 Attachments: zookeeper-569.patch, zookeeper-569.patch, 
 zookeeper-569.patch





[jira] Updated: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-02 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-569:
-

Attachment: zookeeper-569.patch

Here's a functional patch for this issue. Looking into making a mock test now - 
would like to use Mockito or similar, but will probably have to use hand-rolled 
mocks for now. 



 Failure of elected leader can lead to never-ending leader election
 --

 Key: ZOOKEEPER-569
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-569
 Project: Zookeeper
  Issue Type: Bug
Reporter: Henry Robinson
Assignee: Henry Robinson
 Attachments: zookeeper-569.patch





[jira] Updated: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-02 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-569:
-

Status: Patch Available  (was: Open)

Running this by Hudson to verify the build, despite not including new tests. 

(It will probably +1 if tests pass because I have slightly edited LETest)

 Failure of elected leader can lead to never-ending leader election
 --

 Key: ZOOKEEPER-569
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-569
 Project: Zookeeper
  Issue Type: Bug
Reporter: Henry Robinson
Assignee: Henry Robinson
 Attachments: zookeeper-569.patch





[jira] Updated: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-02 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-569:
-

Attachment: zookeeper-569.patch

Forgot --no-prefix from git. Hopefully this will apply. 

 Failure of elected leader can lead to never-ending leader election
 --

 Key: ZOOKEEPER-569
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-569
 Project: Zookeeper
  Issue Type: Bug
Reporter: Henry Robinson
Assignee: Henry Robinson
 Attachments: zookeeper-569.patch, zookeeper-569.patch





[jira] Updated: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election

2010-02-02 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-569:
---

Fix Version/s: 3.3.0
   Status: Patch Available  (was: Open)

 Failure of elected leader can lead to never-ending leader election
 --

 Key: ZOOKEEPER-569
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-569
 Project: Zookeeper
  Issue Type: Bug
Reporter: Henry Robinson
Assignee: Henry Robinson
 Fix For: 3.3.0

 Attachments: zookeeper-569.patch, zookeeper-569.patch

