Re: FastLeaderElection

2009-04-14 Thread rag...@yahoo.com

Flavio,

Thanks for the detailed answer. Makes sense.

-Raghu



- Original Message 
From: Flavio Junqueira 
To: "rag...@yahoo.com" 
Cc: zookeeper-dev@hadoop.apache.org
Sent: Tuesday, 14 April, 2009 9:21:35
Subject: Re: FastLeaderElection

Hi Raghu, I'm glad that you going deep into the code, so thanks for all your 
questions.

In your description below, you say that a faulty peer gets back. If it gets 
back, then it is not faulty, right? To be more concrete, suppose that this peer 
was partitioned away, and it now ends up being elected because it has a zxid 
higher than everyone else. I would say that this behavior is correct.

The behavior we are trying to avoid is different. Suppose that the current 
leader CL crashes and a new leader NL arises. Suppose also that NL crashes 
before followers are able to connect to NL. In this case, followers will move 
on to another round of leader election. If there is a slow process that didn't 
finish the election of NL, but has NL as its current candidate, then it will 
propose it again and without the notion of rounds, server will accept the 
notification from the slow process.

Thanks,
-Flavio

On Apr 14, 2009, at 5:10 PM, rag...@yahoo.com wrote:

> 
> Falvio,
> 
> Thanks for explaining this.
> 
> When the faulty peer gets back and attempts to propose itself as the leader, 
> it's clear that all the other peers don't consider its proposal and notify 
> the faulty peer that they are in a higher epoch. However, the faulty peer 
> will sync up its logical clock upon receiving the first notification from a 
> higher epoch and resend a proposal notification to all with itself as the 
> proposed leader (because it's zxid is higher). If the other peers haven't 
> completed the election loop by the time the updated notificaiton is received 
> from the faulty peer, they will succumb again, update their proposal record 
> and send notifications to all others with faulty peer as the proposed leader.
> 
> So the logical clock only seems to be buying some time here, rather than 
> completely eliminating the faulty peer. The code seems to be hoping that the 
> rest of the peers will complete their election loop and start following a new 
> leader by the time the faulty peer syncs up its logical clock and notifies 
> other peers. Is my understanding correct?
> 
> -Raghu
> 
> 
> 
> - Original Message 
> From: Flavio Junqueira 
> To: zookeeper-dev@hadoop.apache.org
> Cc: rag...@yahoo.com
> Sent: Monday, 13 April, 2009 15:08:10
> Subject: Re: FastLeaderElection
> 
> Hi Raghu, Upon multiple consecutive crashes (or perhaps a network partition), 
> it is possible that we keep electing a faulty server if we only use zxid. We 
> avoid such a problem using a logical clock as servers only consider changing 
> their proposals if they received a notification from the same or a later 
> epoch. With this mechanism, if an elected server crashes before exercising 
> its role as a leader, it won't be considered in later epochs. Without a 
> logical clock, a server lagging behind in the election could re-introduce the 
> faulty server into the election, and it would be elected again if the faulty 
> server is the one with highest zxid.
> 
> Note that we are not using "logical clocks" in the sense of Lamport clocks. 
> We are not incrementing upon every event, but instead only counting rounds of 
> leader election.
> 
> -Flavio
> 
> On Apr 13, 2009, at 8:55 PM, rag...@yahoo.com wrote:
> 
>> 
>> Could someone please throw some light on this? Thanks.
>> 
>> -Raghu
>> 
>> 
>> 
>> - Original Message 
>> From: "rag...@yahoo.com" 
>> To: zookeeper-u...@hadoop.apache.org
>> Sent: Friday, 10 April, 2009 8:11:34
>> Subject: FastLeaderElection
>> 
>> 
>> Hi,
>> 
>> Could someone please explain quickly why logical clock is used in 
>> FastLeaderElection? It looks to me like the peers can converge on a leader 
>> (with highest zxid or server id if zxids are the same) even without the 
>> logical clock. May be I am missing something here, I could not figure out 
>> why logical clock is needed.
>> 
>> Thanks
>> Raghu
>> 
>> 
>> 
> 
> 
> 





Re: FastLeaderElection

2009-04-14 Thread Flavio Junqueira
Hi Raghu, I'm glad that you going deep into the code, so thanks for  
all your questions.


In your description below, you say that a faulty peer gets back. If it  
gets back, then it is not faulty, right? To be more concrete, suppose  
that this peer was partitioned away, and it now ends up being elected  
because it has a zxid higher than everyone else. I would say that this  
behavior is correct.


The behavior we are trying to avoid is different. Suppose that the  
current leader CL crashes and a new leader NL arises. Suppose also  
that NL crashes before followers are able to connect to NL. In this  
case, followers will move on to another round of leader election. If  
there is a slow process that didn't finish the election of NL, but has  
NL as its current candidate, then it will propose it again and without  
the notion of rounds, server will accept the notification from the  
slow process.


Thanks,
-Flavio

On Apr 14, 2009, at 5:10 PM, rag...@yahoo.com wrote:



Falvio,

Thanks for explaining this.

When the faulty peer gets back and attempts to propose itself as the  
leader, it's clear that all the other peers don't consider its  
proposal and notify the faulty peer that they are in a higher epoch.  
However, the faulty peer will sync up its logical clock upon  
receiving the first notification from a higher epoch and resend a  
proposal notification to all with itself as the proposed leader  
(because it's zxid is higher). If the other peers haven't completed  
the election loop by the time the updated notificaiton is received  
from the faulty peer, they will succumb again, update their proposal  
record and send notifications to all others with faulty peer as the  
proposed leader.


So the logical clock only seems to be buying some time here, rather  
than completely eliminating the faulty peer. The code seems to be  
hoping that the rest of the peers will complete their election loop  
and start following a new leader by the time the faulty peer syncs  
up its logical clock and notifies other peers. Is my understanding  
correct?


-Raghu



- Original Message 
From: Flavio Junqueira 
To: zookeeper-dev@hadoop.apache.org
Cc: rag...@yahoo.com
Sent: Monday, 13 April, 2009 15:08:10
Subject: Re: FastLeaderElection

Hi Raghu, Upon multiple consecutive crashes (or perhaps a network  
partition), it is possible that we keep electing a faulty server if  
we only use zxid. We avoid such a problem using a logical clock as  
servers only consider changing their proposals if they received a  
notification from the same or a later epoch. With this mechanism, if  
an elected server crashes before exercising its role as a leader, it  
won't be considered in later epochs. Without a logical clock, a  
server lagging behind in the election could re-introduce the faulty  
server into the election, and it would be elected again if the  
faulty server is the one with highest zxid.


Note that we are not using "logical clocks" in the sense of Lamport  
clocks. We are not incrementing upon every event, but instead only  
counting rounds of leader election.


-Flavio

On Apr 13, 2009, at 8:55 PM, rag...@yahoo.com wrote:



Could someone please throw some light on this? Thanks.

-Raghu



- Original Message 
From: "rag...@yahoo.com" 
To: zookeeper-u...@hadoop.apache.org
Sent: Friday, 10 April, 2009 8:11:34
Subject: FastLeaderElection


Hi,

Could someone please explain quickly why logical clock is used in  
FastLeaderElection? It looks to me like the peers can converge on a  
leader (with highest zxid or server id if zxids are the same) even  
without the logical clock. May be I am missing something here, I  
could not figure out why logical clock is needed.


Thanks
Raghu











Re: FastLeaderElection

2009-04-14 Thread rag...@yahoo.com

Falvio,

Thanks for explaining this.

When the faulty peer gets back and attempts to propose itself as the leader, 
it's clear that all the other peers don't consider its proposal and notify the 
faulty peer that they are in a higher epoch. However, the faulty peer will sync 
up its logical clock upon receiving the first notification from a higher epoch 
and resend a proposal notification to all with itself as the proposed leader 
(because it's zxid is higher). If the other peers haven't completed the 
election loop by the time the updated notificaiton is received from the faulty 
peer, they will succumb again, update their proposal record and send 
notifications to all others with faulty peer as the proposed leader.

So the logical clock only seems to be buying some time here, rather than 
completely eliminating the faulty peer. The code seems to be hoping that the 
rest of the peers will complete their election loop and start following a new 
leader by the time the faulty peer syncs up its logical clock and notifies 
other peers. Is my understanding correct? 

-Raghu



- Original Message 
From: Flavio Junqueira 
To: zookeeper-dev@hadoop.apache.org
Cc: rag...@yahoo.com
Sent: Monday, 13 April, 2009 15:08:10
Subject: Re: FastLeaderElection

Hi Raghu, Upon multiple consecutive crashes (or perhaps a network partition), 
it is possible that we keep electing a faulty server if we only use zxid. We 
avoid such a problem using a logical clock as servers only consider changing 
their proposals if they received a notification from the same or a later epoch. 
With this mechanism, if an elected server crashes before exercising its role as 
a leader, it won't be considered in later epochs. Without a logical clock, a 
server lagging behind in the election could re-introduce the faulty server into 
the election, and it would be elected again if the faulty server is the one 
with highest zxid.

Note that we are not using "logical clocks" in the sense of Lamport clocks. We 
are not incrementing upon every event, but instead only counting rounds of 
leader election.

-Flavio

On Apr 13, 2009, at 8:55 PM, rag...@yahoo.com wrote:

> 
> Could someone please throw some light on this? Thanks.
> 
> -Raghu
> 
> 
> 
> - Original Message 
> From: "rag...@yahoo.com" 
> To: zookeeper-u...@hadoop.apache.org
> Sent: Friday, 10 April, 2009 8:11:34
> Subject: FastLeaderElection
> 
> 
> Hi,
> 
> Could someone please explain quickly why logical clock is used in 
> FastLeaderElection? It looks to me like the peers can converge on a leader 
> (with highest zxid or server id if zxids are the same) even without the 
> logical clock. May be I am missing something here, I could not figure out why 
> logical clock is needed.
> 
> Thanks
> Raghu
> 
> 
> 





Re: FastLeaderElection

2009-04-13 Thread Flavio Junqueira
Hi Raghu, Upon multiple consecutive crashes (or perhaps a network  
partition), it is possible that we keep electing a faulty server if we  
only use zxid. We avoid such a problem using a logical clock as  
servers only consider changing their proposals if they received a  
notification from the same or a later epoch. With this mechanism, if  
an elected server crashes before exercising its role as a leader, it  
won't be considered in later epochs. Without a logical clock, a server  
lagging behind in the election could re-introduce the faulty server  
into the election, and it would be elected again if the faulty server  
is the one with highest zxid.


Note that we are not using "logical clocks" in the sense of Lamport  
clocks. We are not incrementing upon every event, but instead only  
counting rounds of leader election.


-Flavio

On Apr 13, 2009, at 8:55 PM, rag...@yahoo.com wrote:



Could someone please throw some light on this? Thanks.

-Raghu



- Original Message 
From: "rag...@yahoo.com" 
To: zookeeper-u...@hadoop.apache.org
Sent: Friday, 10 April, 2009 8:11:34
Subject: FastLeaderElection


Hi,

Could someone please explain quickly why logical clock is used in  
FastLeaderElection? It looks to me like the peers can converge on a  
leader (with highest zxid or server id if zxids are the same) even  
without the logical clock. May be I am missing something here, I  
could not figure out why logical clock is needed.


Thanks
Raghu