The patch is simple, but a test is harder because it's a race condition. I'm
working on it.
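
For anyone following along, here is a rough, self-contained sketch of the
stuck state and the proposed reset discussed in the quoted messages below.
It is an illustration only, not the ZOOKEEPER-569 patch and not the real
LeaderElection code: the class VoteResetSketch, the nextVote method, the
resetOnNoWinner flag and the -1 "no winner" value are all made up for this
example, and countVotes here is a simplified stand-in that only borrows the
name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the tally; not the actual ZooKeeper classes.
public class VoteResetSketch {
    static final long NO_WINNER = -1; // assumed sentinel for "no valid votes"

    // Tally votes, discarding any vote for a node we have not heard from.
    // votes maps voter id -> candidate id. Ties go to the higher id.
    static long countVotes(Map<Long, Long> votes, Set<Long> heardFrom) {
        Map<Long, Integer> tally = new HashMap<>();
        for (long candidate : votes.values()) {
            if (heardFrom.contains(candidate)) {
                tally.merge(candidate, 1, Integer::sum);
            }
        }
        long winner = NO_WINNER;
        int best = 0;
        for (Map.Entry<Long, Integer> e : tally.entrySet()) {
            int count = e.getValue();
            if (count > best || (count == best && e.getKey() > winner)) {
                best = count;
                winner = e.getKey();
            }
        }
        return winner;
    }

    // One round for a single node. Without the reset, a node whose tally
    // has no winner keeps its old vote; with it, the node votes for itself.
    static long nextVote(long myId, long currentVote, Map<Long, Long> votes,
                         Set<Long> heardFrom, boolean resetOnNoWinner) {
        long winner = countVotes(votes, heardFrom);
        if (winner != NO_WINNER) {
            return winner;
        } else if (resetOnNoWinner) {
            return myId;        // the proposed "else" branch
        }
        return currentVote;     // behaviour that leaves 1 and 2 stuck on 3
    }
}
```

With votes {1->3, 2->3} and only 1 and 2 heard from, countVotes finds no
winner. Without the reset both nodes keep voting for 3; with it they fall
back to voting for themselves, and the next tally can pick a live node.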

Henry

On Wed, Nov 11, 2009 at 12:06 AM, Flavio Junqueira <f...@yahoo-inc.com> wrote:

> Henry already opened one: ZOOKEEPER-569.
>
> -Flavio
>
>
> On Nov 11, 2009, at 7:03 AM, Patrick Hunt wrote:
>
>> Closing the loop - what's the status on this? Can one of you open a
>> JIRA and provide a patch for this?
>>
>> Thanks,
>>
>> Patrick
>>
>> Flavio Junqueira wrote:
>>
>>> Hi Henry, Apologies for the delay. Your observation sounds right to
>>> me. Here is how I'm reading it; let me know if it makes sense.
>>>
>>> If everyone votes for 3 in the second round and 3 has crashed, then in
>>> countVotes we will remove all votes to 3 and there will be no vote left.
>>> In such a case, there will be no winner as a result of the call to
>>> countVotes and lookForLeader won't change the current vote
>>> (LeaderElection.java:201). This is a situation in which we are stuck.
>>>
>>> Does it sound reasonable to add an "else" to the "if" statement of
>>> LeaderElection.java:201 to reset the vote? This modification would
>>> reset the vote when countVotes returns no winner, which
>>> should happen only when the replica itself votes for a dead leader.
>>>
>>> -Flavio
>>>
>>> On Oct 28, 2009, at 7:44 AM, Henry Robinson wrote:
>>>
>>>> [ Sending this direct since the Apache mailserver is rejecting my
>>>> e-mails at the moment ]
>>>>
>>>> As I understand it, 1 and 2 receive a vote for 3 in the first round,
>>>> which causes them to vote for 3 in the second round. So in the second
>>>> round, all votes cast are for 3. But 3 has died, so all votes for it
>>>> are discounted. 1 and 2 continue to vote for 3 ad infinitum, never
>>>> resetting their vote.
>>>>
>>>> Does this sound plausible, or am I missing something?
>>>>
>>>> cheers,
>>>> Henry
>>>>
>>>> On Tue, Oct 27, 2009 at 3:48 PM, Flavio Junqueira <f...@yahoo-inc.com>
>>>> wrote:
>>>> Hi Henry, I don't understand how 1 and 2 do not end up electing 2 in
>>>> your situation. If they exclude 3 in countVotes, then countVotes will
>>>> end up returning 2 and not 3, assuming there is a vote for 2. What am
>>>> I missing?
>>>>
>>>> The problem with QuorumPeer you're pointing at was also an issue with
>>>> the FLE tests, and I couldn't see an easy way around it other than
>>>> timing out and restarting leader election.
>>>>
>>>> Cheers,
>>>> -Flavio
>>>>
>>>>
>>>> On Oct 27, 2009, at 6:35 AM, Henry Robinson wrote:
>>>>
>>>> I've been working on adding a TCPResponderThread to the leader election
>>>> process so that if a deployment needs to be TCP only, it can be and still
>>>> use all election types. Testing this has exposed what might be a race
>>>> condition in the leader election code that prevents a leader from being
>>>> elected.
>>>>
>>>> Here's the behaviour I see in LETest occasionally. With three nodes
>>>> (reduced from 30 for ease of debugging), node 3 gets elected before
>>>> either node 1 or node 2 finishes its election (there is one round where
>>>> each node learns that 3 has the highest id, and then 3 completes its
>>>> second round by receiving votes for itself from 1 and 2, but 1 and 2 do
>>>> not receive votes from 3).
>>>>
>>>> Now 3 is killed by the test harness. 1 and 2 are still voting for it, but
>>>> every time they try, the vote tally excludes 3 since it hasn't been heard
>>>> from. They then spin round the voting process, unable to reset their
>>>> vote. I expect that the heartbeat mechanism in a running QuorumPeer takes
>>>> care of this when the leader is lost, but the associated QuorumPeers
>>>> aren't running.
>>>>
>>>> If this is the case, then there is a simple fix: reset a node's vote to
>>>> itself if it is voting for a node that hasn't been heard from. I don't
>>>> know why using TCP instead of UDP for the responder thread is
>>>> exacerbating this (and we can't rule out my introducing a bug :)); but as
>>>> it's a race condition, the different timings associated with waiting on a
>>>> TCP socket might just be enough to expose the issue.
>>>>
>>>> Can someone verify this might be possible / figure out what I missed?
>>>>
>>>> cheers,
>>>> Henry
>>>>
>>>>
>>>>
>>>
>>>
>
