Ben described the general outline of the protocol we implemented, which improves on the recipe by avoiding a herd effect every time the leader changes. This improvement was actually suggested by Runping Qi of Yahoo!. The recipe protocol requires all clients to recompute whether they are now the leader when the current leader either relinquishes leadership or disconnects. The improved protocol guarantees that only one client will need to recompute.
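
To make the difference concrete, here is a rough in-memory sketch in Python. It is not the client library code and makes no ZooKeeper calls. The function names are hypothetical, bids are modeled as plain sequence numbers, and it assumes the backward scan stops at the first (nearest) preceding bid it finds. The recipe's herd behavior is modeled simply as "every remaining client is notified".

```python
# In-memory illustration only: no real ZooKeeper calls are made.
# All names here are hypothetical and not taken from any library.

def find_preceding_bid(my_seq, live_bids):
    """Scan backwards from my_seq - 1 down to 0 and return the first live bid.

    Returns None when no preceding bid exists, i.e. this client is the leader.
    Assumption: the scan stops at the nearest preceding bid.
    """
    for seq in range(my_seq - 1, -1, -1):
        if seq in live_bids:
            return seq
    return None


def notified_on_delete(deleted_seq, live_bids, herd=False):
    """Return the sorted bids whose owners get a watch event for deleted_seq."""
    others = [s for s in live_bids if s != deleted_seq]
    if herd:
        # Modeled recipe behavior: every remaining client must recompute.
        return sorted(others)
    # Improved protocol: only the client watching the deleted bid is notified.
    return sorted(s for s in others
                  if find_preceding_bid(s, live_bids) == deleted_seq)


if __name__ == "__main__":
    bids = {1, 2, 3, 4, 5}  # five clients, one bid each
    print(notified_on_delete(1, bids, herd=True))  # leader leaves, herd model
    print(notified_on_delete(1, bids))             # leader leaves, improved
    print(notified_on_delete(3, bids))             # non-leader leaves, improved
```

With five bids, the herd model wakes all four remaining clients when bid 1 disappears, while the improved scheme wakes only the owner of bid 2. If bid 3 disappears instead, only the owner of bid 4 is woken, and its rescan finds bid 2 to watch next.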

Here's the algorithm: A persistent ZNode is used as the parent of one or more ephemeral ZNodes. These ephemeral ZNodes represent the bids of different clients to become the leader. When a client wants to bid to become the leader, it creates an ephemeral sequence node and records the sequence number. Then, to compute whether it is the leader, the client scans backwards from the sequence number it was assigned down to 0, looking for a preceding bid. If none is found, the client is the leader. If a preceding bid is found, the client places a watch on that ZNode, so it is informed when that ZNode is deleted. The deletion represents the owner client relinquishing leadership or disconnecting.

When the watch event is received by a client, it again scans backwards from its assigned sequence number to 0 to find a preceding bid. If none is found, then this client is now the leader. If a preceding bid is found, the client places a new watch on that ZNode and waits again.

Note that this protocol handles the situation when the current leader disconnects or abdicates, as well as the situation where a non-leader client holding a preceding bid disconnects. In both cases, only one client gets a watch notification, so no herd effect is observed.

Please ask if you need more details. This protocol will be part of the client library I'm implementing -- however, do not get your hopes up too high, because at this time I do not know whether the library will be released outside of Yahoo!.

--Jacob

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Benjamin Reed
Sent: Tuesday, June 17, 2008 10:27 AM
To: zookeeper-user@lists.sourceforge.net
Subject: Re: [Zookeeper-user] Leader Election

Good point. The recipe we show guarantees there will be a single leader elected, but only the leader knows it.

Jacob Levy has been implementing a client library to do leader election, so he should really chime in here, but just in case he doesn't: I believe Jacob's solution was for the leader to create an ephemeral znode called LEADER with its id as the data when it becomes the leader, and then delete the node before relinquishing leadership. The other nodes then watch for the existence of the LEADER znode to see leadership changes.

ben

On Tuesday 17 June 2008 09:28:39 Avinash Lakshman wrote:
> Hi All
>
> I am trying to write a simple leader election module and I have 5 nodes A,
> B, C, D and E amongst which I need to elect a leader. Now I am following
> the example using SEQUENCE flags and trying to use the technique where the
> herd effect can be done away with. So I have A create a znode L-1, B create
> znode L-2 .... and E create znode L-5. After this I have L-2 watch L-1, L-3
> watch L-2 etc. Let us assume A was elected leader. When A dies B should
> automatically become the leader and this seems to be working. What I need
> to know is how do C, D and E know about this? Do I need another mechanism
> to disseminate this information? I ask because not all znodes are being
> watched i.e. C, D and E are not watching for L-1 which is the znode created
> by A. So how will they learn as to who the new leader is since no watch
> event will be triggered at their end.
>
> Thanks in advance
> Avinash

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
Zookeeper-user mailing list
Zookeeper-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/zookeeper-user