The patch is simple, but a test is harder because it's a race condition. I'm working on it.
Henry On Wed, Nov 11, 2009 at 12:06 AM, Flavio Junqueira <f...@yahoo-inc.com>wrote: > Henry already opened one: ZOOKEEPER-569. > > -Flavio > > > On Nov 11, 2009, at 7:03 AM, Patrick Hunt wrote: > > Closing the loop - what's the status on this? Can one of you open a >> JIRA and provide a patch for this? >> >> Thanks, >> >> Patrick >> >> Flavio Junqueira wrote: >> >>> Hi Henry, Apologies for the the delay. Your observation sounds right to >>> me. Here is how I'm reading it; let me know if it makes sense. >>> >>> If everyone votes for 3 in the second round and 3 has crashed, then in >>> countVotes we will remove all votes to 3 and there will be no vote left. >>> In such a case, there will be no winner as a result of the call to >>> countVotes and lookForLeader won't change the current vote >>> (LeaderElection.java:201). This is a situation in which we are stuck. >>> >>> Does it sound reasonable to add an "else" to the "if" statement of >>> LeaderElection.java:201 to reset the vote? This modification would >>> implementing resetting the vote when countVotes returns no winner, which >>> should happen only when the replica itself votes for a dead leader. >>> >>> -Flavio >>> >>> On Oct 28, 2009, at 7:44 AM, Henry Robinson wrote: >>> >>> [ Sending this direct since the Apache mailserver is rejecting my >>>> e-mails at the moment ] >>>> >>>> As I understand it, 1 and 2 receive a vote for 3 in the first round, >>>> which causes them to vote for 3 in the second round. So in the second >>>> round, all votes cast are for 3. But 3 has died, so all votes for it >>>> are discounted. 1 and 2 continue to vote for 3 ad infinitum, never >>>> resetting their vote. >>>> >>>> Does this sound plausible, or am I missing something? >>>> >>>> cheers, >>>> Henry >>>> >>>> On Tue, Oct 27, 2009 at 3:48 PM, Flavio Junqueira <f...@yahoo-inc.com> >>>> wrote: >>>> Hi Henry, I don't understand how 1 and 2 do not end up electing 2 in >>>> your situation. If they exclude 3 in countVotes, then countVotes will >>>> end up returning 2 and not 3, assuming there is a vote for 2. What am >>>> I missing? >>>> >>>> The problem with QuorumPeer you're pointing at was also an issue with >>>> the FLE tests, and I couldn't see an easy way around it other than >>>> timing out and restarting leader election. >>>> >>>> Cheers, >>>> -Flavio >>>> >>>> >>>> On Oct 27, 2009, at 6:35 AM, Henry Robinson wrote: >>>> >>>> I've been working on adding a TCPResponderThread to the leader election >>>> process so that if a deployment needs to be TCP only, it can be and >>>> still >>>> use all election types. Testing this has exposed what might be a race >>>> condition in the leader election code that prevents a leader from being >>>> elected. >>>> >>>> Here's the behaviour I see in LETest occasionally. With three nodes >>>> (reduced >>>> from 30 for ease of debugging), node 3 gets elected before either node >>>> 1 or >>>> node 2 finish their election (there is one round where each node that >>>> 3 has >>>> the highest id, and then 3 completes its second round by receiving >>>> votes for >>>> itself from 1 and 2, but 1 and 2 do not receive votes from 3). >>>> >>>> Now 3 is killed by the test harness. 1 and 2 are still voting for it, >>>> but >>>> every time they try, the vote tally excludes 3 since it hasn't been >>>> heard >>>> from. They then spin round the voting process, unable to reset their >>>> vote. I >>>> expect that the heartbeat mechanism in a running QuorumPeer takes care >>>> of >>>> this when the leader is lost, but the associated QuorumPeers aren't >>>> running. >>>> >>>> If this is the case, then there is a simple fix to reset the nodes >>>> vote to >>>> themselves if they are voting for a node that hasn't been heard from. I >>>> don't know why using TCP instead of UDP for the responder thread is >>>> exacerbating this (and we can't rule out my introducing a bug :)); but >>>> as >>>> it's a race condition the different timings associated with waiting on >>>> a TCP >>>> socket might just be enough to expose the issue. >>>> >>>> Can someone verify this might be possible / figure out what I missed? >>>> >>>> cheers, >>>> Henry >>>> >>>> >>>> >>> >>> >