Hi! I'm using Curator's Leader Election recipe (2.4.2) and have found a very hard-to-reproduce issue that can lead to a situation where both clients become leader at the same time.
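
For context, both clients set up their selector roughly like this (a minimal sketch; the connection string, retry policy, and listener body are placeholders, not my real code):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.leader.LeaderSelector;
    import org.apache.curator.framework.recipes.leader.LeaderSelectorListener;
    import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class ElectionClient {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181",          // placeholder ensemble
                    new ExponentialBackoffRetry(1000, 3)); // placeholder retry policy
            client.start();

            LeaderSelectorListener listener = new LeaderSelectorListenerAdapter() {
                @Override
                public void takeLeadership(CuratorFramework c) throws Exception {
                    System.out.println("became leader");
                    Thread.currentThread().join(); // hold leadership until interrupted
                }
            };

            LeaderSelector selector = new LeaderSelector(client, "/leaderPath", listener);
            selector.autoRequeue(); // set on both clients, as described below
            selector.start();

            Thread.currentThread().join(); // keep the process alive
        }
    }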
Let's say 2 clients are competing for leadership. Client #1 is currently the leader, and ZooKeeper maintains the following structure under the leaderPath:

    /leaderPath
     |- _c_a8524f0b-3bd7-4df3-ae19-cef11159a7a6-lock-0000000240   (client #1)
     |- _c_b5bdc75f-d2c9-4432-9d58-1f7fe699e125-lock-0000000241   (client #2)

The autoRequeue flag is set to true for both clients.

Now let's trigger a leader election by restarting the ZooKeeper leader. When this happens, both clients lose their connection to the ZooKeeper ensemble and try to re-acquire the LeaderSelector's mutex. Eventually (after the negotiated session timeout) the ephemeral zNodes under /leaderPath are deleted.

The problem occurs when the ephemeral zNode deletions interleave with mutex acquisition. Client #1 can observe that both zNodes (240 and 241) have already been deleted; /leaderPath has no children, so it acquires the mutex successfully. Client #2, on the other hand, can observe that both zNodes still exist, so it starts to watch zNode #240 (LockInternals.internalLockLoop():315). Shortly afterwards the watcher is notified of that zNode's deletion, so client #2 re-enters LockInternals.internalLockLoop(). What is really strange is that the getSortedChildren() call at LockInternals:284 can still return zNode #241, so client #2 also succeeds in acquiring the mutex (LockInternals:287).

The result is two clients, both leader, while /leaderPath contains only the one zNode for client #1.

Have you encountered similar problems before? Do you have any ideas on how to prevent such race conditions? I can think of one solution: the leader should watch its own zNode under /leaderPath and interrupt leadership when that zNode gets deleted (see the sketch in the P.S. below).

Thank you,
Tibor
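
P.S. Here is a rough, untested sketch of the workaround I have in mind. Two assumptions are mine, not confirmed anywhere: that while we hold the mutex our own lock node is the lowest-sequenced child of the leader path, and that LeaderSelector.interruptLeadership() is the right way to give up leadership from a watcher:

    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.api.CuratorWatcher;
    import org.apache.curator.framework.recipes.leader.LeaderSelector;
    import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher.Event.EventType;

    public class SelfWatchingListener extends LeaderSelectorListenerAdapter {
        private final String leaderPath;
        private volatile LeaderSelector selector; // set after the selector is built

        public SelfWatchingListener(String leaderPath) {
            this.leaderPath = leaderPath;
        }

        public void setSelector(LeaderSelector selector) {
            this.selector = selector;
        }

        @Override
        public void takeLeadership(CuratorFramework client) throws Exception {
            // Assumption: while we hold the mutex, our lock node is the child
            // with the smallest sequence number, so sort by the numeric suffix
            // (the UUID prefix makes a plain lexical sort wrong).
            List<String> children = client.getChildren().forPath(leaderPath);
            Collections.sort(children, new Comparator<String>() {
                @Override
                public int compare(String a, String b) {
                    return suffix(a).compareTo(suffix(b));
                }
            });
            String ourNode = leaderPath + "/" + children.get(0);

            CuratorWatcher watcher = new CuratorWatcher() {
                @Override
                public void process(WatchedEvent event) {
                    if (event.getType() == EventType.NodeDeleted) {
                        selector.interruptLeadership(); // our node vanished: step down
                    }
                }
            };

            // Set the watch; if the node is already gone, refuse leadership outright.
            if (client.checkExists().usingWatcher(watcher).forPath(ourNode) == null) {
                return;
            }

            Thread.currentThread().join(); // hold leadership until interrupted
        }

        // The zero-padded sequence suffix compares correctly as a string.
        private static String suffix(String name) {
            return name.substring(name.lastIndexOf('-') + 1);
        }
    }

Usage would be: construct the listener, pass it to the LeaderSelector constructor, then call setSelector() on it before start().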
