Suppose you set an exists() watch on a node, e.g. in Groovy:

def latch = new CountDownLatch(1)
def stat = zooKeeper.exists("$lockParentNode/$toWatch",
        [process: { event -> latch.countDown(); log.debug("fired latch on event $event") },
         toString: { "" }] as Watcher)
if (stat != null) {
    // Okay, we've set watch: wait for an event and try again
    log.debug("Set watch on less than me '$toWatch': blocking until an " +
            "event occurs which may let us acquire")
    latch.await()
} else {
    // Dang!  Person immediately less than us is gone, try again
    // This is moderately weird unless they were the only ones
    // less than us and so might have owned the lock and just
    // released it
    log.debug("Node '$toWatch' gone when setting watch: trying again to acquire")
}

Suppose that exists() does return null.  It appears that the watch is
still registered (based both on the evidence below and on a cursory
examination of the ZooKeeper.java client code).  In my
case "$lockParentNode/$toWatch" is ultimately a sequential ephemeral node
whose path will never occur again (part of yet another implementation of a
ZooKeeper lock).  Thus, I believe this watch will remain until the session
that created it goes away, which for us could be months.  Basically we're
leaking a Closure and an associated CountDownLatch every time the node to
be watched is deleted in the interval between when we initially look for it
and when exists() returns null.  I only noticed it when playing with "wchc"
as part of trying to understand a lost watch; a stripped-down reproduction
follows the dump below.

0x233a3c1db310006
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004876
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004234
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004684
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004588
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003118
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003772
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005206
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000001876
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004924
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002020
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005170
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000006526
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002260
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002920
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004414
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005848
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005278
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005752
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005380
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004360
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004624
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002728
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000001846
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004264
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000006142
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004660
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005956
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004810
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002428
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003274
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003370
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002398
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003712
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003652
        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005314
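
For what it's worth, the behaviour seems easy to reproduce outside the
lock code.  The following is a stripped-down sketch (the connect string
and paths are made up, and it assumes a local server): every exists()
call returns null because the node is absent, yet "wchc" reports all of
the watches against the session until it closes.

import java.util.concurrent.CountDownLatch
import org.apache.zookeeper.ZooKeeper
import org.apache.zookeeper.Watcher

def connected = new CountDownLatch(1)
def zk = new ZooKeeper("localhost:2181", 30000,
        { event -> connected.countDown() } as Watcher)
connected.await()

(1..10).each { i ->
    // The node is absent, so exists() returns null, but the watch is
    // still registered and will only fire if the node is ever created.
    def stat = zk.exists("/never-created/x-$i",
            { event -> println "fired: $event" } as Watcher)
    assert stat == null
}
// The ten watches now linger for the life of this session.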

Does this seem correct to those with a deeper knowledge of ZooKeeper
internals, and does it seem like a problem worth rectifying?
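
In case it's relevant, the workaround I'm currently leaning toward is to
use getData() with a watcher instead of exists(), on the theory that
getData() only leaves a watch behind when the call succeeds, and throws
NoNodeException when the node is already gone, so nothing lingers.
Roughly (same variables as in the snippet above):

try {
    // getData() only registers the watch if the node exists, so a node
    // that is already gone leaves nothing behind on the session.
    zooKeeper.getData("$lockParentNode/$toWatch",
            { event -> latch.countDown() } as Watcher, null)
    // Watch is set: wait for an event and try again
    latch.await()
} catch (KeeperException.NoNodeException e) {
    // Node already gone: no watch was registered, try to acquire again
    log.debug("Node '$toWatch' gone: trying again to acquire")
}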

-- 
Robert Crocombe
