Ben described the general outline of the protocol we implemented, which
is an improvement on the recipe that avoids a herd effect every time the
leader changes. This improvement was actually suggested by Runping
Qi of Yahoo!. The recipe protocol requires all clients to recompute
whether they are now the leader whenever the current leader either
relinquishes leadership or disconnects. The improved protocol guarantees
that only one client needs to recompute.

Here's the algorithm:

A persistent ZNode serves as the parent of one or more ephemeral
ZNodes. These ephemeral ZNodes represent the bids of different clients
to become the leader.
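As a rough illustration of this layout, the Python sketch below models the
persistent parent and its ephemeral sequence children in memory. It is not
the ZooKeeper client API: ElectionParent and its methods are hypothetical.
The only behaviour modelled is that each bid gets an increasing sequence
number and that a client's bids vanish when its session ends, while the
parent stays.

```python
import itertools
from typing import Dict, List


class ElectionParent:
    """Hypothetical in-memory stand-in for the persistent parent ZNode.

    Its children are ephemeral sequence nodes, keyed by sequence number and
    remembering which client session created them.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self._next_seq = itertools.count()  # increasing numbers, like a sequence flag
        self.children: Dict[int, str] = {}  # sequence number -> owning session

    def create_ephemeral_sequential(self, session: str) -> int:
        """Record a bid for `session` and return the sequence number assigned."""
        seq = next(self._next_seq)
        self.children[seq] = session
        return seq

    def close_session(self, session: str) -> List[int]:
        """Simulate a disconnect: every child owned by `session` disappears."""
        gone = [seq for seq, owner in self.children.items() if owner == session]
        for seq in gone:
            del self.children[seq]
        return gone


parent = ElectionParent("/election")  # "/election" is an arbitrary example path
a = parent.create_ephemeral_sequential("client-A")  # 0
b = parent.create_ephemeral_sequential("client-B")  # 1
parent.close_session("client-A")  # A's bid vanishes; the parent and B's bid remain
print(parent.children)  # {1: 'client-B'}
```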

When a client wants to bid to become the leader, it creates an ephemeral
sequence node under the parent and records the sequence number. Then, to
compute whether it is the leader, the client scans backwards from its
assigned sequence number down to 0 to find a preceding bid. If one is
found, the client places a watch on that ZNode so that it is informed
when that ZNode is deleted. The deletion represents the owning client
relinquishing leadership or disconnecting.
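The backward scan can be sketched as a small Python function. The exists
callback is a hypothetical stand-in for checking whether the bid with a
given sequence number still exists; a real client would ask ZooKeeper
instead. The function returns the nearest surviving predecessor (the
ZNode to watch), or None when the caller holds the lowest bid.

```python
from typing import Callable, Optional


def find_preceding_bid(my_seq: int, exists: Callable[[int], bool]) -> Optional[int]:
    """Scan from my_seq - 1 down to 0 and return the nearest live bid, or None.

    `exists` is a hypothetical predicate standing in for an existence check
    on the ZNode with that sequence number.
    """
    for seq in range(my_seq - 1, -1, -1):
        if exists(seq):
            return seq  # nearest predecessor: this is the bid to watch
    return None  # no earlier bid survives, so the caller is the leader


live = {0, 2, 5}  # sequence numbers of bids that still exist
print(find_preceding_bid(5, live.__contains__))  # 2
print(find_preceding_bid(2, live.__contains__))  # 0
print(find_preceding_bid(0, live.__contains__))  # None
```

The linear scan mirrors the description above. A client could instead
list the parent's children once and pick the largest sequence number below
its own, which yields the same predecessor without one check per gap.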

When a client receives the watch event, it again scans backwards from
its assigned sequence number down to 0 to find a preceding bid. If none
is found, the client is now the leader. If one is found, the client
places a new watch on that ZNode and waits again.
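The watch-handling loop can be sketched as follows, again against a
hypothetical in-memory set of live sequence numbers rather than a real
ZooKeeper session. Each Bidder watches at most one predecessor; when that
predecessor is deleted it rescans and either watches the next surviving
bid or becomes the leader. The delete helper, which simulates a client
leaving and delivers the deletion event, is also an assumption made for
the example.

```python
from typing import List, Optional, Set


class Bidder:
    """Hypothetical client holding one bid; `live` is the shared set of surviving bids."""

    def __init__(self, name: str, seq: int, live: Set[int]) -> None:
        self.name = name
        self.seq = seq
        self.live = live
        self.watching: Optional[int] = None
        self.is_leader = False
        self.recompute()

    def recompute(self) -> None:
        # Scan backwards from our own sequence number to 0 for a surviving bid.
        for seq in range(self.seq - 1, -1, -1):
            if seq in self.live:
                self.watching = seq  # place a (new) watch on that bid and wait
                return
        self.watching = None
        self.is_leader = True  # no preceding bid: we are the leader

    def on_deleted(self, seq: int) -> None:
        """Deletion event for bid `seq`; only acted on if we were watching it."""
        if seq == self.watching:
            self.recompute()


live = {0, 1, 2}
a = Bidder("A", 0, live)  # no predecessor, so A leads immediately
b = Bidder("B", 1, live)  # watches bid 0
c = Bidder("C", 2, live)  # watches bid 1
clients: List[Bidder] = [a, b, c]


def delete(seq: int) -> None:
    """Hypothetical helper: the owner of `seq` leaves and the event is delivered."""
    live.discard(seq)
    clients[:] = [cl for cl in clients if cl.seq != seq]
    for cl in clients:
        cl.on_deleted(seq)  # only a client watching `seq` reacts


delete(1)  # B, a non-leader, disconnects: only C was watching bid 1, so it re-watches 0
print(c.watching, c.is_leader)  # 0 False
delete(0)  # A, the leader, leaves: C finds no predecessor and takes over
print(c.watching, c.is_leader)  # None True
```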

Note that this protocol handles both the situation where the current
leader disconnects or abdicates and the situation where a non-leader
client holding a preceding bid disconnects. In both cases only one
client, the one watching the deleted ZNode, gets a watch notification,
so no herd effect is observed.
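To make the no-herd claim concrete, the sketch below maps each client to
the bid it watches and lists who is notified when a bid disappears. The
five clients A to E and their bids 0 to 4 are made up for the example.
The second mapping, in which every client watches the leader's bid
directly, is only an illustrative contrast showing how a herd arises, not
a description of the original recipe.

```python
from typing import Dict, List


def notified_clients(watches: Dict[str, int], deleted_bid: int) -> List[str]:
    """Return the clients whose watch fires when bid `deleted_bid` is deleted."""
    return [client for client, watched in watches.items() if watched == deleted_bid]


# Hypothetical example: clients A..E hold bids 0..4, and A (bid 0) is the leader.
# Improved protocol: each non-leader watches only its immediate predecessor.
chained = {"B": 0, "C": 1, "D": 2, "E": 3}
# Illustrative contrast: every non-leader watches the leader's bid.
everyone_on_leader = {"B": 0, "C": 0, "D": 0, "E": 0}

print(notified_clients(chained, 0))  # ['B']  (leader abdicates or disconnects)
print(notified_clients(chained, 2))  # ['D']  (non-leader C disconnects)
print(notified_clients(everyone_on_leader, 0))  # ['B', 'C', 'D', 'E']
```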

Please ask if you need more details. This protocol will be part of the
client library I'm implementing -- however do not get your hopes up too
high, because at this time I do not know whether the library will be
released outside of Yahoo!.


-----Original Message-----
From: [mailto:[EMAIL PROTECTED] On Behalf Of Benjamin Reed
Sent: Tuesday, June 17, 2008 10:27 AM
Subject: Re: [Zookeeper-user] Leader Election

Good point. The recipe we show guarantees that a single leader will be
elected, but only the leader knows it. Jacob Levy has been implementing a
client library to do leader election, so he should really chime in here,
but just in case he doesn't: I believe Jacob's solution was for the leader
to create an ephemeral znode called LEADER, with its id as the data, when
it becomes the leader, and then to delete that node before relinquishing
leadership. The other nodes then watch for the existence of the LEADER
znode to see leadership changes.
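A minimal sketch of this LEADER-znode idea, assuming an in-memory stand-in
for the ephemeral node and its one-shot watches (LeaderZNode,
get_and_watch and Observer are hypothetical names, not a real client API).
As with ZooKeeper watches, the event itself carries no data here: each
observer re-reads the node and re-arms its watch in the same call.

```python
from typing import Callable, List, Optional


class LeaderZNode:
    """Hypothetical in-memory stand-in for an ephemeral LEADER znode.

    Watches are one-shot: each registration fires at most once, after which
    the observer must register again to hear about later changes.
    """

    def __init__(self) -> None:
        self.data: Optional[str] = None  # leader id, or None while the node is absent
        self._watchers: List[Callable[[], None]] = []

    def get_and_watch(self, callback: Callable[[], None]) -> Optional[str]:
        """Register a one-shot watch and return the current leader id, if any."""
        self._watchers.append(callback)
        return self.data

    def _fire(self) -> None:
        # Detach the current watchers first so that re-registrations made
        # inside the callbacks are kept for the next change.
        watchers, self._watchers = self._watchers, []
        for callback in watchers:
            callback()

    def create(self, leader_id: str) -> None:
        """Called by a client once it has become the leader."""
        self.data = leader_id
        self._fire()

    def delete(self) -> None:
        """Called by the leader just before it relinquishes leadership."""
        self.data = None
        self._fire()


class Observer:
    """A non-leader client that tracks who the current leader is."""

    def __init__(self, node: LeaderZNode) -> None:
        self.node = node
        self.known_leader = node.get_and_watch(self.on_change)

    def on_change(self) -> None:
        # The event carries no data: re-read the node and re-arm the watch.
        self.known_leader = self.node.get_and_watch(self.on_change)


node = LeaderZNode()
observers = [Observer(node) for _ in range(3)]  # e.g. C, D and E
node.create("A")
print([o.known_leader for o in observers])  # ['A', 'A', 'A']
node.delete()  # A steps down
node.create("B")  # B wins the next election
print([o.known_leader for o in observers])  # ['B', 'B', 'B']
```

Because the node is ephemeral, a leader that disconnects without deleting
it would still trigger these watches once its session expires and the
node is removed, so observers learn about that case too.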


On Tuesday 17 June 2008 09:28:39 Avinash Lakshman wrote:
> Hi All
> I am trying to write a simple leader election module and I have 5 nodes
> A, B, C, D and E amongst which I need to elect a leader. Now I am
> following the example using SEQUENCE flags and trying to use the
> technique where the herd effect can be done away with. So I have A
> create a znode L-1, B create znode L-2 ... and E create znode L-5.
> After this I have L-2 watch L-1, L-3 watch L-2, etc. Let us assume A was
> elected leader. When A dies, B automatically becomes the leader, and
> this seems to be working. What I want to know is how do C, D and E know
> about this? Do I need another mechanism to disseminate this information?
> I ask because not all znodes are watched, i.e. C, D and E are not
> watching L-1, which is the znode created by A. So how will they learn
> who the new leader is, since no event will be triggered at their end?
> Thanks in advance
> Avinash

Check out the new Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
Zookeeper-user mailing list
