Falvio, Thanks for explaining this.
When the faulty peer gets back and attempts to propose itself as the leader, it's clear that all the other peers don't consider its proposal and notify the faulty peer that they are in a higher epoch. However, the faulty peer will sync up its logical clock upon receiving the first notification from a higher epoch and resend a proposal notification to all with itself as the proposed leader (because it's zxid is higher). If the other peers haven't completed the election loop by the time the updated notificaiton is received from the faulty peer, they will succumb again, update their proposal record and send notifications to all others with faulty peer as the proposed leader. So the logical clock only seems to be buying some time here, rather than completely eliminating the faulty peer. The code seems to be hoping that the rest of the peers will complete their election loop and start following a new leader by the time the faulty peer syncs up its logical clock and notifies other peers. Is my understanding correct? -Raghu ----- Original Message ---- From: Flavio Junqueira <f...@yahoo-inc.com> To: firstname.lastname@example.org Cc: rag...@yahoo.com Sent: Monday, 13 April, 2009 15:08:10 Subject: Re: FastLeaderElection Hi Raghu, Upon multiple consecutive crashes (or perhaps a network partition), it is possible that we keep electing a faulty server if we only use zxid. We avoid such a problem using a logical clock as servers only consider changing their proposals if they received a notification from the same or a later epoch. With this mechanism, if an elected server crashes before exercising its role as a leader, it won't be considered in later epochs. Without a logical clock, a server lagging behind in the election could re-introduce the faulty server into the election, and it would be elected again if the faulty server is the one with highest zxid. Note that we are not using "logical clocks" in the sense of Lamport clocks. We are not incrementing upon every event, but instead only counting rounds of leader election. -Flavio On Apr 13, 2009, at 8:55 PM, rag...@yahoo.com wrote: > > Could someone please throw some light on this? Thanks. > > -Raghu > > > > ----- Original Message ---- > From: "rag...@yahoo.com" <rag...@yahoo.com> > To: zookeeper-u...@hadoop.apache.org > Sent: Friday, 10 April, 2009 8:11:34 > Subject: FastLeaderElection > > > Hi, > > Could someone please explain quickly why logical clock is used in > FastLeaderElection? It looks to me like the peers can converge on a leader > (with highest zxid or server id if zxids are the same) even without the > logical clock. May be I am missing something here, I could not figure out why > logical clock is needed. > > Thanks > Raghu > > >