[ Sending this direct since the Apache mailserver is rejecting my
e-mails at the moment ]
As I understand it, 1 and 2 receive a vote for 3 in the first round,
which causes them to vote for 3 in the second round. So in the
second
round, all votes cast are for 3. But 3 has died, so all votes for it
are discounted. 1 and 2 continue to vote for 3 ad infinitum, never
resetting their vote.
Does this sound plausible, or am I missing something?
cheers,
Henry
On Tue, Oct 27, 2009 at 3:48 PM, Flavio Junqueira <f...@yahoo-
inc.com>
wrote:
Hi Henry, I don't understand how 1 and 2 do not end up electing 2 in
your situation. If they exclude 3 in countVotes, then countVotes
will
end up returning 2 and not 3, assuming there is a vote for 2. What
am
I missing?
The problem with QuorumPeer you're pointing at was also an issue
with
the FLE tests, and I couldn't see an easy way around it other than
timing out and restarting leader election.
Cheers,
-Flavio
On Oct 27, 2009, at 6:35 AM, Henry Robinson wrote:
I've been working on adding a TCPResponderThread to the leader
election
process so that if a deployment needs to be TCP only, it can be
and still
use all election types. Testing this has exposed what might be a
race
condition in the leader election code that prevents a leader from
being
elected.
Here's the behaviour I see in LETest occasionally. With three nodes
(reduced
from 30 for ease of debugging), node 3 gets elected before either
node
1 or
node 2 finish their election (there is one round where each node
that
3 has
the highest id, and then 3 completes its second round by receiving
votes for
itself from 1 and 2, but 1 and 2 do not receive votes from 3).
Now 3 is killed by the test harness. 1 and 2 are still voting for
it, but
every time they try, the vote tally excludes 3 since it hasn't
been heard
from. They then spin round the voting process, unable to reset their
vote. I
expect that the heartbeat mechanism in a running QuorumPeer takes
care of
this when the leader is lost, but the associated QuorumPeers aren't
running.
If this is the case, then there is a simple fix to reset the nodes
vote to
themselves if they are voting for a node that hasn't been heard
from. I
don't know why using TCP instead of UDP for the responder thread is
exacerbating this (and we can't rule out my introducing a bug :));
but as
it's a race condition the different timings associated with
waiting on
a TCP
socket might just be enough to expose the issue.
Can someone verify this might be possible / figure out what I
missed?
cheers,
Henry