Re: Possible race in LETest.java

Flavio Junqueira Tue, 27 Oct 2009 15:51:15 -0700

Hi Henry, I don't understand how 1 and 2 do not end up electing 2 in your situation. If they exclude 3 in countVotes, then countVotes will end up returning 2 and not 3, assuming there is a vote for 2. What am I missing?

The problem with QuorumPeer you're pointing at was also an issue with the FLE tests, and I couldn't see an easy way around it other than timing out and restarting leader election.


Cheers,
-Flavio

On Oct 27, 2009, at 6:35 AM, Henry Robinson wrote:

I've been working on adding a TCPResponderThread to the leader election process so that if a deployment needs to be TCP only, it can be and still use all election types. Testing this has exposed what might be a race condition in the leader election code that prevents a leader from being elected.
Here's the behaviour I see in LETest occasionally. With three nodes (reduced from 30 for ease of debugging), node 3 gets elected before either node 1 or node 2 finishes its election (there is one round where each node sees that 3 has the highest id, and then 3 completes its second round by receiving votes for itself from 1 and 2, but 1 and 2 do not receive votes from 3).
Now 3 is killed by the test harness. 1 and 2 are still voting for it, but every time they try, the vote tally excludes 3 since it hasn't been heard from. They then spin round the voting process, unable to reset their vote. I expect that the heartbeat mechanism in a running QuorumPeer takes care of this when the leader is lost, but the associated QuorumPeers aren't running.
If this is the case, then there is a simple fix: reset a node's vote to itself if it is voting for a node that hasn't been heard from. I don't know why using TCP instead of UDP for the responder thread is exacerbating this (and we can't rule out my introducing a bug :)); but as it's a race condition, the different timings associated with waiting on a TCP socket might just be enough to expose the issue.
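
The proposed fix can be illustrated with a toy model. This is only a sketch under stated assumptions: the class VoteResetSketch, both method signatures, and the tie-break rule (higher id wins) are hypothetical and are not taken from the ZooKeeper source. The real election also needs a quorum, which the sketch ignores. It only shows that a tally which skips votes for unheard nodes finds nothing to count while 1 and 2 keep voting for 3, and that resetting each stale vote to the voter's own id lets the tally produce a candidate again.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names are ZooKeeper's actual API.
class VoteResetSketch {

    // Tally the votes, skipping any vote cast for a node we have not heard from.
    // Returns the id with the most counted votes (ties go to the higher id),
    // or -1 if no vote could be counted at all.
    static long countVotes(Map<Long, Long> votes, Set<Long> heardFrom) {
        Map<Long, Integer> tally = new HashMap<>();
        for (long candidate : votes.values()) {
            if (heardFrom.contains(candidate)) {
                tally.merge(candidate, 1, Integer::sum);
            }
        }
        long winner = -1;
        int best = 0;
        for (Map.Entry<Long, Integer> e : tally.entrySet()) {
            long id = e.getKey();
            int count = e.getValue();
            if (count > best || (count == best && id > winner)) {
                best = count;
                winner = id;
            }
        }
        return winner;
    }

    // Hypothetical fix: a node voting for an unheard node falls back to itself.
    static long resetIfUnheard(long myId, long currentVote, Set<Long> heardFrom) {
        return heardFrom.contains(currentVote) ? currentVote : myId;
    }

    public static void main(String[] args) {
        // Node 3 has been killed, so only 1 and 2 respond.
        Set<Long> heardFrom = new HashSet<>(Set.of(1L, 2L));

        // Map of voter id to the id it votes for; both still vote for 3.
        Map<Long, Long> votes = new HashMap<>();
        votes.put(1L, 3L);
        votes.put(2L, 3L);

        // Every vote is for an unheard node, so nothing is counted.
        System.out.println("before reset: " + countVotes(votes, heardFrom));

        // Apply the reset rule to each voter, then tally again.
        votes.replaceAll((voter, vote) -> resetIfUnheard(voter, vote, heardFrom));
        System.out.println("after reset:  " + countVotes(votes, heardFrom));
    }
}
```

In this model the first tally returns -1 (no countable vote) and the second returns 2, the outcome Flavio's reply expects once a vote for 2 exists.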

Can someone verify this might be possible / figure out what I missed?

cheers,
Henry
