> What guarantees that zNode 241 will be deleted prior to the (successful) 
> attempt of client #2 to reacquire the mutex using zNode 241?
Because that’s how the lock works. As long as 241 exists, no other client will 
consider itself as having the mutex. 
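
For illustration, a rough sketch of that check (not Curator's actual
LockInternals code; "client" is a started CuratorFramework and ourNodeName is a
hypothetical variable holding this client's own child name):

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import org.apache.curator.framework.CuratorFramework;

    class MutexCheckSketch {
        // A client considers itself the mutex holder only while its own child has
        // the lowest "-lock-" sequence number under /leaderPath. While
        // ...-lock-0000000241 exists, any child that sorts after it keeps waiting.
        static boolean ownsMutex(CuratorFramework client, String ourNodeName) throws Exception {
            List<String> children = new ArrayList<>(client.getChildren().forPath("/leaderPath"));
            // sort by the numeric suffix; the "_c_<uuid>-" protection prefix differs per node
            children.sort(Comparator.comparing((String n) -> n.substring(n.lastIndexOf("-lock-") + 6)));
            return !children.isEmpty() && children.get(0).equals(ourNodeName);
        }
    }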

> reacquire the mutex using zNode 241?
This is not what happens. The client will try to acquire using a _different_ 
znode. Are you thinking that 241 is re-used? It’s not. 
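
For what it's worth, re-acquisition boils down to roughly this sketch ("client"
is a started CuratorFramework); the server assigns a fresh, ever-increasing
sequence number each time, so a name like ...-lock-0000000241 is never handed
out again:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.zookeeper.CreateMode;

    class ReacquireSketch {
        static String createLockNode(CuratorFramework client) throws Exception {
            // Each attempt creates a brand-new protected ephemeral-sequential child;
            // the sequence counter only moves forward (242, 243, ...), never back to 241.
            return client.create()
                    .withProtection()                              // adds the "_c_<uuid>-" prefix seen in the listing
                    .withMode(CreateMode.EPHEMERAL_SEQUENTIAL)
                    .forPath("/leaderPath/lock-");
        }
    }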

-JZ


From: stibi [email protected]
Reply: stibi [email protected]
Date: May 22, 2014 at 7:26:57 AM
To: Jordan Zimmerman [email protected], [email protected] 
[email protected]
Subject:  Re: Sometimes leader election ends up in two leaders  

Hi!

Thanks for the quick response.
About this step:

— Time N + D2 — 
The ZooKeeper quorum is repaired and the nodes start a doWork() loop again. At 
this point, there can be 2, 3 or 4 nodes, depending on timing. 
lock-0000000240 (waiting to be deleted)
lock-0000000241 (waiting to be deleted)
lock-0000000242
lock-0000000243
Neither of the instances will achieve leadership until the nodes 240/241 are 
deleted.

What guarantees that zNode 241 will be deleted prior to the (successful) 
attempt of client #2 to reacquire the mutex using zNode 241?
AFAIK, node deletion is a background operation, and a retry policy controls how 
often a deletion attempt occurs (even for guaranteed deletes). Unlucky timing 
can lead to a situation where the deletion of zNode 241 happens only after the 
mutex acquisition. In that case the mutex is not released by the leader, but 
once the zNodes are deleted, the other client will also be elected leader.
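
For reference, a minimal sketch of the delete I mean (lockNodePath is a
placeholder for the client's own lock node; the client would have been built
with a retry policy such as ExponentialBackoffRetry, which also paces the
background retries):

    import org.apache.curator.framework.CuratorFramework;

    class GuaranteedDeleteSketch {
        static void deleteLockNode(CuratorFramework client, String lockNodePath) throws Exception {
            // guaranteed() keeps retrying the delete in the background until it succeeds,
            // but it makes no promise about *when* the node disappears relative to
            // another client's getChildren() call.
            client.delete().guaranteed().forPath(lockNodePath);
        }
    }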

Thanks,
Tibor



On Thu, May 15, 2014 at 3:37 AM, Jordan Zimmerman <[email protected]> 
wrote:
I don’t think the situation you describe can happen. Let’s walk through this:

— Time N — 
We have a single, correct leader and 2 nodes:
lock-0000000240
lock-0000000241

— Time N + D1 — 
ZooKeeper leader instance is restarted. Shortly thereafter, both Curator 
clients will exit their doWork() loops and mark their nodes for deletion. Due 
to the failed connection, though, there are still the 2 nodes:
lock-0000000240 (waiting to be deleted)
lock-0000000241 (waiting to be deleted)
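
(The exit happens because the standard listener cancels leadership as soon as
the connection is SUSPENDED or LOST; roughly what LeaderSelectorListenerAdapter
does, sketched here:)

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.recipes.leader.CancelLeadershipException;
    import org.apache.curator.framework.recipes.leader.LeaderSelectorListener;
    import org.apache.curator.framework.state.ConnectionState;

    // Sketch: on SUSPENDED or LOST the takeLeadership()/doWork() thread is
    // interrupted via CancelLeadershipException, which is why both clients drop
    // out of their loops at this point in the timeline.
    abstract class CancelOnDisconnectListener implements LeaderSelectorListener {
        @Override
        public void stateChanged(CuratorFramework client, ConnectionState newState) {
            if (newState == ConnectionState.SUSPENDED || newState == ConnectionState.LOST) {
                throw new CancelLeadershipException();
            }
        }
    }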

— Time N + D2 — 
The ZooKeeper quorum is repaired and the nodes start a doWork() loop again. At 
this point, there can be 2, 3 or 4 nodes, depending on timing. 
lock-0000000240 (waiting to be deleted)
lock-0000000241 (waiting to be deleted)
lock-0000000242
lock-0000000243
Neither of the instances will achieve leadership until the nodes 240/241 are 
deleted.

Of course, there may be something else that’s causing you to see 2 leaders. A 
while back I discovered that rolling config changes can do it 
(http://zookeeper-user.578899.n2.nabble.com/Rolling-config-change-considered-harmful-td7578761.html).
 Or, there’s something else going on in Curator. 

-Jordan


From: stibi [email protected]
Reply: [email protected] [email protected]
Date: May 14, 2014 at 11:39:48 AM
To: [email protected] [email protected]
Subject:  Sometimes leader election ends up in two leaders

Hi!

I'm using Curator's Leader Election recipe (2.4.2) and found a very 
hard-to-reproduce issue which could lead to a situation where both clients 
become leader.

Let's say 2 clients are competing for leadership, client #1 is currently the 
leader and zookeeper maintains the following structure under the leaderPath:

/leaderPath
  |- _c_a8524f0b-3bd7-4df3-ae19-cef11159a7a6-lock-0000000240 (client #1)
  |- _c_b5bdc75f-d2c9-4432-9d58-1f7fe699e125-lock-0000000241 (client #2)

The autoRequeue flag is set to true for both clients.
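
For context, the setup looks roughly like this sketch (connect string is a
placeholder and the real doWork() body is omitted):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.leader.LeaderSelector;
    import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    class LeaderSetupSketch {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181",              // placeholder connect string
                    new ExponentialBackoffRetry(1000, 3));
            client.start();

            LeaderSelector selector = new LeaderSelector(client, "/leaderPath",
                    new LeaderSelectorListenerAdapter() {
                        @Override
                        public void takeLeadership(CuratorFramework curator) throws Exception {
                            // doWork() loop: leadership is held only while this method runs
                            Thread.sleep(Long.MAX_VALUE);
                        }
                    });
            selector.autoRequeue();   // requeue for the election whenever leadership is relinquished
            selector.start();
        }
    }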

Let's trigger a leader election by restarting the ZooKeeper leader.

When this happens, both clients will lose the connection to the ZooKeeper 
ensemble and will try to re-acquire the LeaderSelector's mutex. Eventually 
(after the negotiated session timeout) the ephemeral zNodes under /leaderPath 
will be deleted.

The problem occurs when ephemeral zNode deletions interleave with mutex 
acquisition.
  
Client #1 can observe that both zNodes (240 and 241) are already deleted; 
/leaderPath has no children, so it acquires the mutex successfully.

On the other hand, client #2 can observe that both zNodes still exist, so it 
starts to watch zNode #240 (LockInternals.internalLockLoop():315). In a short 
period of time the watcher will be notified about the zNode's deletion, so 
client #2 reenters LockInternals.internalLockLoop().
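
(Roughly the loop in question, as a sketch rather than the actual LockInternals
code; ourNodeName and the sorting helper are stand-ins, and the sketch assumes
our own node is present among the children:)

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;
    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.api.CuratorWatcher;
    import org.apache.zookeeper.data.Stat;

    class LockLoopSketch {
        // If our node is not the lowest, watch the node just ahead of ours and
        // re-run the loop when that watch fires (e.g. on its deletion).
        static void lockLoop(CuratorFramework client, String ourNodeName) throws Exception {
            boolean hasLock = false;
            while (!hasLock) {
                List<String> sorted = sortedChildren(client);
                int ourIndex = sorted.indexOf(ourNodeName);
                if (ourIndex == 0) {
                    hasLock = true;                                  // lowest sequence number -> mutex acquired
                } else {
                    String previous = "/leaderPath/" + sorted.get(ourIndex - 1);
                    CountDownLatch fired = new CountDownLatch(1);
                    CuratorWatcher watcher = event -> fired.countDown();
                    Stat stat = client.checkExists().usingWatcher(watcher).forPath(previous);
                    if (stat != null) {
                        fired.await();                               // e.g. woken when zNode #240 is deleted
                    }
                }
            }
        }

        static List<String> sortedChildren(CuratorFramework client) throws Exception {
            List<String> children = new ArrayList<>(client.getChildren().forPath("/leaderPath"));
            // order by the "-lock-" sequence suffix, ignoring the per-node protection prefix
            children.sort(Comparator.comparing((String n) -> n.substring(n.lastIndexOf("-lock-") + 6)));
            return children;
        }
    }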

What is really strange is that the getSortedChildren() call in LockInternals:284 
can still return zNode #241, so client #2 will succeed in acquiring the mutex 
(LockInternals:287).

The result is two clients, both leaders, but /leaderPath contains only one 
zNode, the one for client #1.

Have you encountered similar problems before? Do you have any ideas on how to 
prevent such race conditions? One solution I can think of: the leader should 
watch its own zNode under /leaderPath and interrupt leadership when that zNode 
gets deleted.
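
For what it's worth, a sketch of that idea (ourNodePath would be the leader's
own lock node; LeaderSelector#interruptLeadership() interrupts the
takeLeadership() thread):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.api.CuratorWatcher;
    import org.apache.curator.framework.recipes.leader.LeaderSelector;
    import org.apache.zookeeper.Watcher;

    class SelfWatchSketch {
        static void guardOwnNode(CuratorFramework client, LeaderSelector selector,
                                 String ourNodePath) throws Exception {
            CuratorWatcher watcher = event -> {
                if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                    selector.interruptLeadership();   // give up leadership if our node vanished under us
                }
            };
            // exists() sets the watch whether or not the node currently exists; note that
            // ZooKeeper watches are one-shot, so a real version would re-register it.
            client.checkExists().usingWatcher(watcher).forPath(ourNodePath);
        }
    }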

Thank you,
Tibor

