Suppose you set an exists() watch on a node, e.g. in Groovy:
def latch = new CountDownLatch(1)
def stat = zooKeeper.exists("$lockParentNode/$toWatch",
        [process: { event -> latch.countDown(); log.debug("fired latch on event $event") },
         toString: { "" }] as Watcher)
if (stat != null) {
    // Okay, we've set watch: wait for an event and try again
    log.debug("Set watch on less than me '$toWatch': blocking until an event occurs which may let us acquire")
    latch.await()
} else {
    // Dang! Person immediately less than us is gone, try again.
    // This is moderately weird unless they were the only ones
    // less than us and so might have owned the lock and just
    // released it.
    log.debug("Node '$toWatch' gone when setting watch: trying again to acquire")
}
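To make the scenario concrete, here is a rough, self-contained sketch of the sequence I think leaks a watch (the connect string and the node path are made up purely for illustration): the node is deleted between the moment we pick it and the moment we call exists(), so exists() returns null, yet as far as I can tell the watcher (and the closure and latch it captures) stays registered until the session ends.

import java.util.concurrent.CountDownLatch
import org.apache.zookeeper.Watcher
import org.apache.zookeeper.ZooKeeper

// Hypothetical connect string, purely for illustration
def zk = new ZooKeeper("localhost:2181", 30000, { event -> } as Watcher)

// Pretend this sequential node existed when we listed the lock parent's
// children, but was deleted before we got here
def path = "/plexus/slaves/grid279/lock/x-deadbeef-0000000042"

def latch = new CountDownLatch(1)
def stat = zk.exists(path, { event -> latch.countDown() } as Watcher)
assert stat == null
// The sequential node can never be created again, so (as far as I can
// tell) the watch, this closure, and the latch now hang around until the
// session that registered them goes away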
Suppose that exists() does return null. It appears that the watch is still registered anyway (both from the evidence below and from a cursory examination of the ZooKeeper.java client code). In my case "$lockParentNode/$toWatch" is ultimately a sequential ephemeral node that will never be created again (part of yet another implementation of a ZooKeeper lock), so I believe the watch will remain until the session that created it goes away, which for us could be months. Basically, we leak a Closure and its associated CountDownLatch every time the node we want to watch is deleted in the interval between when we first see it and when we call exists(). I only noticed this while playing with "wchc" as part of trying to understand a lost watch:
0x233a3c1db310006
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000004876
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000004234
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000004684
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000004588
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000003118
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000003772
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000005206
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000001876
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000004924
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000002020
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000005170
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000006526
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000002260
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000002920
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000004414
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000005848
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000005278
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000005752
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000005380
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000004360
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000004624
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000002728
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000001846
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000004264
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000006142
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000004660
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000005956
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000004810
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000002428
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000003274
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000003370
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000002398
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000003712
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000003652
/plexus/slaves/grid279/lock/x-233a3c1db310003-0000005314
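Incidentally, since the four-letter commands are just plain text sent over the client port, it's easy to snapshot that table programmatically and count the stuck paths; a rough sketch (host and port are placeholders for your ensemble):

// Send a four-letter command ("wchc" here) to a ZooKeeper server and
// return the reply; host and port are assumptions, adjust as needed
def fourLetterWord(String host, int port, String cmd) {
    def socket = new Socket(host, port)
    try {
        socket.outputStream.write(cmd.bytes)
        socket.outputStream.flush()
        return socket.inputStream.text
    } finally {
        socket.close()
    }
}

def dump = fourLetterWord("localhost", 2181, "wchc")
println dump
// A crude count of how many paths currently have watches registered
def watchedPaths = dump.readLines().findAll { it.trim().startsWith('/') }.size()
println "watched paths: $watchedPaths"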
Does this seem like a correct reading to those with a deeper understanding
of ZooKeeper internals, and is it a problem worth rectifying?
--
Robert Crocombe