Re: FastLeaderElection

Flavio Junqueira Tue, 14 Apr 2009 09:22:19 -0700

Hi Raghu, I'm glad that you are going deep into the code, so thanks for all your questions.

In your description below, you say that a faulty peer gets back. If it gets back, then it is not faulty, right? To be more concrete, suppose that this peer was partitioned away, and it now ends up being elected because it has a zxid higher than everyone else. I would say that this behavior is correct.

The behavior we are trying to avoid is different. Suppose that the current leader CL crashes and a new leader NL arises. Suppose also that NL crashes before followers are able to connect to NL. In this case, followers will move on to another round of leader election. If there is a slow process that didn't finish the election of NL, but has NL as its current candidate, then it will propose NL again, and without the notion of rounds, servers will accept the notification from the slow process.
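
To make the scenario above concrete, here is a small Python sketch. It is not the FastLeaderElection code; the server ids, zxids, and the `deliver` helper are all hypothetical, and votes are modeled simply as (zxid, leader id) pairs where the larger pair wins.

```python
# Illustrative model only; names and numbers are assumptions, not ZooKeeper code.
# A vote is a (zxid, leader_id) pair; Python compares tuples element by element,
# so a higher zxid wins and the leader id breaks ties.

def deliver(votes, rounds, sender_vote, sender_round, use_rounds):
    """Deliver one notification to every follower and return their new votes."""
    updated = {}
    for sid, current in votes.items():
        # With rounds enabled, a notification from an earlier round is dropped.
        stale = use_rounds and sender_round < rounds[sid]
        if not stale and sender_vote > current:
            updated[sid] = sender_vote
        else:
            updated[sid] = current
    return updated

# NL (server 9, zxid 7) crashed, so followers 1..3 moved on to round 2
# and currently propose server 3 with zxid 5.
votes = {1: (5, 3), 2: (5, 3), 3: (5, 3)}
rounds = {1: 2, 2: 2, 3: 2}

# The slow process is still in round 1 and still proposes NL.
slow_vote, slow_round = (7, 9), 1

with_rounds = deliver(votes, rounds, slow_vote, slow_round, use_rounds=True)
without_rounds = deliver(votes, rounds, slow_vote, slow_round, use_rounds=False)
```

In this toy run, the followers keep their round-2 proposal when rounds are checked, while without rounds they all switch back to the crashed NL because its zxid is higher.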


Thanks,
-Flavio

On Apr 14, 2009, at 5:10 PM, rag...@yahoo.com wrote:

Flavio,

Thanks for explaining this.
When the faulty peer gets back and attempts to propose itself as the leader, it's clear that all the other peers don't consider its proposal and notify the faulty peer that they are in a higher epoch. However, the faulty peer will sync up its logical clock upon receiving the first notification from a higher epoch and resend a proposal notification to all with itself as the proposed leader (because its zxid is higher). If the other peers haven't completed the election loop by the time the updated notification is received from the faulty peer, they will succumb again, update their proposal record and send notifications to all others with the faulty peer as the proposed leader.
So the logical clock only seems to be buying some time here, rather than completely eliminating the faulty peer. The code seems to be hoping that the rest of the peers will complete their election loop and start following a new leader by the time the faulty peer syncs up its logical clock and notifies other peers. Is my understanding correct?
-Raghu



----- Original Message ----
From: Flavio Junqueira <f...@yahoo-inc.com>
To: zookeeper-dev@hadoop.apache.org
Cc: rag...@yahoo.com
Sent: Monday, 13 April, 2009 15:08:10
Subject: Re: FastLeaderElection
Hi Raghu, Upon multiple consecutive crashes (or perhaps a network partition), it is possible that we keep electing a faulty server if we only use zxid. We avoid such a problem using a logical clock as servers only consider changing their proposals if they received a notification from the same or a later epoch. With this mechanism, if an elected server crashes before exercising its role as a leader, it won't be considered in later epochs. Without a logical clock, a server lagging behind in the election could re-introduce the faulty server into the election, and it would be elected again if the faulty server is the one with highest zxid.
Note that we are not using "logical clocks" in the sense of Lamport clocks. We are not incrementing upon every event, but instead only counting rounds of leader election.
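
As a rough illustration of a round counter of this kind, the Python sketch below advances the counter only when a peer starts a new election, and lets the peer consider a notification only if it comes from the same or a later round. The `Peer` class and its method names are hypothetical, and the handling of a later round is simplified compared with what the real code does.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Peer:
    """Hypothetical peer; a vote is a (zxid, leader_id) pair, larger pair wins."""
    sid: int
    zxid: int
    round: int = 0
    vote: Optional[Tuple[int, int]] = None

    def start_election(self) -> None:
        # The counter moves once per election round, not once per event.
        self.round += 1
        self.vote = (self.zxid, self.sid)

    def on_notification(self, n_round: int, n_vote: Tuple[int, int]) -> bool:
        """Return True if the notification changed this peer's proposal."""
        if n_round < self.round:
            return False              # earlier round: not considered
        if n_round > self.round:
            self.round = n_round      # catch up to the later round (simplified)
        if n_vote > self.vote:
            self.vote = n_vote
            return True
        return False


p = Peer(sid=1, zxid=5)
p.start_election()
p.start_election()                            # p is now in round 2
changed_stale = p.on_notification(1, (9, 3))  # from round 1, ignored
changed_fresh = p.on_notification(2, (9, 3))  # same round, higher zxid, adopted
```

Receiving a notification never increments the counter here; the peer either ignores the message or adopts the sender's round, which is the difference from a Lamport clock that the note above points at.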
-Flavio

On Apr 13, 2009, at 8:55 PM, rag...@yahoo.com wrote:
Could someone please throw some light on this? Thanks.

-Raghu



----- Original Message ----
From: "rag...@yahoo.com" <rag...@yahoo.com>
To: zookeeper-u...@hadoop.apache.org
Sent: Friday, 10 April, 2009 8:11:34
Subject: FastLeaderElection


Hi,
Could someone please explain quickly why a logical clock is used in FastLeaderElection? It looks to me like the peers can converge on a leader (with highest zxid or server id if zxids are the same) even without the logical clock. Maybe I am missing something here; I could not figure out why the logical clock is needed.
Thanks
Raghu
