Hi! I'm using Curator's Leader Election recipe (2.4.2) and have found a very hard-to-reproduce issue that can lead to a situation where both clients become leader at the same time.
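
For context, both clients set up their selector roughly like this (a minimal sketch; the connection string, retry policy, and listener body are placeholders, not my real code):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.leader.LeaderSelector;
    import org.apache.curator.framework.recipes.leader.LeaderSelectorListener;
    import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class ElectionClient {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181",          // placeholder ensemble
                    new ExponentialBackoffRetry(1000, 3)); // placeholder retry policy
            client.start();

            LeaderSelectorListener listener = new LeaderSelectorListenerAdapter() {
                @Override
                public void takeLeadership(CuratorFramework c) throws Exception {
                    System.out.println("became leader");
                    Thread.currentThread().join(); // hold leadership until interrupted
                }
            };

            LeaderSelector selector = new LeaderSelector(client, "/leaderPath", listener);
            selector.autoRequeue(); // set on both clients, as described below
            selector.start();

            Thread.currentThread().join(); // keep the process alive
        }
    }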
Let's say 2 clients are competing for leadership. Client #1 is currently the leader, and ZooKeeper maintains the following structure under the leaderPath:

    /leaderPath
     |- _c_a8524f0b-3bd7-4df3-ae19-cef11159a7a6-lock-0000000240   (client #1)
     |- _c_b5bdc75f-d2c9-4432-9d58-1f7fe699e125-lock-0000000241   (client #2)

The autoRequeue flag is set to true for both clients.

Now let's trigger a leader election by restarting the ZooKeeper leader. When this happens, both clients lose their connection to the ZooKeeper ensemble and try to re-acquire the LeaderSelector's mutex. Eventually (after the negotiated session timeout) the ephemeral zNodes under /leaderPath are deleted.

The problem occurs when the ephemeral zNode deletions interleave with mutex acquisition. Client #1 can observe that both zNodes (240 and 241) have already been deleted; /leaderPath has no children, so it acquires the mutex successfully. Client #2, on the other hand, can observe that both zNodes still exist, so it starts to watch zNode #240 (LockInternals.internalLockLoop():315). Shortly afterwards the watcher is notified of that zNode's deletion, so client #2 re-enters LockInternals.internalLockLoop(). What is really strange is that the getSortedChildren() call at LockInternals:284 can still return zNode #241, so client #2 also succeeds in acquiring the mutex (LockInternals:287).

The result is two clients, both leader, while /leaderPath contains only the one zNode for client #1.

Have you encountered similar problems before? Do you have any ideas on how to prevent such race conditions? I can think of one solution: the leader should watch its own zNode under /leaderPath and interrupt leadership when that zNode gets deleted (see the sketch in the P.S. below).

Thank you,
Tibor
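
P.S. Here is a rough, untested sketch of the workaround I have in mind. Two assumptions are mine, not confirmed anywhere: that while we hold the mutex our own lock node is the lowest-sequenced child of the leader path, and that LeaderSelector.interruptLeadership() is the right way to give up leadership from a watcher:

    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.api.CuratorWatcher;
    import org.apache.curator.framework.recipes.leader.LeaderSelector;
    import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher.Event.EventType;

    public class SelfWatchingListener extends LeaderSelectorListenerAdapter {
        private final String leaderPath;
        private volatile LeaderSelector selector; // set after the selector is built

        public SelfWatchingListener(String leaderPath) {
            this.leaderPath = leaderPath;
        }

        public void setSelector(LeaderSelector selector) {
            this.selector = selector;
        }

        @Override
        public void takeLeadership(CuratorFramework client) throws Exception {
            // Assumption: while we hold the mutex, our lock node is the child
            // with the smallest sequence number, so sort by the numeric suffix
            // (the UUID prefix makes a plain lexical sort wrong).
            List<String> children = client.getChildren().forPath(leaderPath);
            Collections.sort(children, new Comparator<String>() {
                @Override
                public int compare(String a, String b) {
                    return suffix(a).compareTo(suffix(b));
                }
            });
            String ourNode = leaderPath + "/" + children.get(0);

            CuratorWatcher watcher = new CuratorWatcher() {
                @Override
                public void process(WatchedEvent event) {
                    if (event.getType() == EventType.NodeDeleted) {
                        selector.interruptLeadership(); // our node vanished: step down
                    }
                }
            };

            // Set the watch; if the node is already gone, refuse leadership outright.
            if (client.checkExists().usingWatcher(watcher).forPath(ourNode) == null) {
                return;
            }

            Thread.currentThread().join(); // hold leadership until interrupted
        }

        // The zero-padded sequence suffix compares correctly as a string.
        private static String suffix(String name) {
            return name.substring(name.lastIndexOf('-') + 1);
        }
    }

Usage would be: construct the listener, pass it to the LeaderSelector constructor, then call setSelector() on it before start().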
