At the bottom of the LeaderLatch.checkLeadership() method (in Curator 2.12.0),
the following is called to set a watch on the waiter one spot ahead of us, as
expected:
String watchPath = sortedChildren.get(ourIndex - 1);
...
client.getData().usingWatcher(watcher).inBackground(callback).forPath(ZKPaths.makePath(latchPath,
watchPath));
When the watch fires and has event type NodeDeleted, then
LeaderLatch.getChildren() is called to determine if we are now the leader,
again as expected.
But I don't understand the following. The code for the callback (called in
response to the getData() completing) has this:
if ( event.getResultCode() == KeeperException.Code.NONODE.intValue() )
{
// previous node is gone - reset
reset();
}
It seems that this should be a call to getChildren(), rather than to reset(),
since this means that the waiter ahead of us has disappeared (the same
situation we were watch-ing for, but it happened before we could even set a
watch), whereas reset() would be the right reaction if our path had
disappeared. It looks to me like this code "thinks" it's a watch about our
node, rather than the one-ahead-of-use node. I think this would cause us to
delete and re-create our node, effectively moving us to the back of the wait
queue, rather than making us the leader now.
Is this a bug, or am I misinterpreting something?
Looking at the history of this code, it looks like it did in fact used to check
for leadership at this point, and got changed when the code was updated to use
background calls (in commit sha ff4ec29f).
Thanks in advance,
Jeff Clites