Yeah, this problem is also mentioned in that issue 645. -Flavio
> On 13 Feb 2018, at 23:03, Kathryn Hogg <kathryn.h...@oati.net> wrote: > > Thanks Flavio, > > I'm debugging this and found another issue in the code in > findPrefixInChildren() > > private void findPrefixInChildren(String prefix, ZooKeeper zookeeper, > String dir) > throws KeeperException, InterruptedException { > List<String> names = zookeeper.getChildren(dir, false); > for (String name : names) { > if (name.startsWith(prefix)) { > id = name; /* THIS DOES NOT HAVE THE FULL PATH */ > if (LOG.isDebugEnabled()) { > LOG.debug("Found id created last time: " + id); > } > break; > } > } > if (id == null) { > id = zookeeper.create(dir + "/" + prefix, data, > getAcl(), EPHEMERAL_SEQUENTIAL); /* THIS HAS THE > FULL PATH */ > > if (LOG.isDebugEnabled()) { > LOG.debug("Created id: " + id); > } > } > > } > > If we find the node in the children, we set id to x-$session-$sequence. If > we create the znode, id is set to $dir/x-$session-$sequence > > I believe the first case should be > Id = dir + "/" + name; > > -----Original Message----- > From: Flavio Junqueira [mailto:f...@apache.org] > Sent: Tuesday, February 13, 2018 3:58 PM > To: user@zookeeper.apache.org > Subject: Re: Bug in WriteLock recipe > > {External email message: This email is from an external source. Please > exercise caution prior to opening attachments, clicking on links, or > providing any sensitive information.} > > You are right that there is a race between getting the children and checking > whether the predecessor is there. If we fail to set a watcher, then we won't > try locking again. I think setting id to null will work because it forces > another iteration of the do/while loop, which checks whether there is still > some predecessor, setting a watcher accordingly. The number of iterations > should be finite because we eventually hit the owned znode. > > It might better to add a different condition for clarity, like `while (Id == > null || watcherNotSet)`. In any case, I'd appreciate if you could chime in > and contribute your changes to > https://issues.apache.org/jira/browse/ZOOKEEPER-645 > <https://issues.apache.org/jira/browse/ZOOKEEPER-645>. > > -Flavio > >> On 13 Feb 2018, at 22:01, Kathryn Hogg <kathryn.h...@oati.net> wrote: >> >> Hey Flavio, >> >> FYI, I'm on 3.4.11 if that makes a difference. >> >> The problem is >> 1. In order to get to the exists() call, we've already determined that >> id is not null. >> 2. After stat returns null, we log a message but do not reset stat to >> null. >> 3. The while loop will terminate because id is not null >> >> I think this can fixed by setting id to null if stat returns null. This is >> similar to what is done when lessThanMe is not Empty. >> >> Here's an outline of the code I'm talking about (my suggested change is >> after the LOG.warn()): >> >> do { >> if (id == null) { >> .... >> } >> If (id != null) { >> if (names.IsEmpty()) { >> ... >> Id = null; >> } else { >> .... >> If (!lessThanMe.isEmpty()) >> Stat stat = zookeeper.exists(lastChildId, new >> LockWatcher()); >> if (stat != null) { >> return Boolean.FALSE; >> } else { >> LOG.warn("Could not find the" + >> " stats for less than me: " + >> lastChildName.getName()); >> /* id = null; */ // id is not null here so >> the while loop will terminate >> } >> } else { >> if (isOwner()) { >> if (callback != null) { >> callback.lockAcquired(); >> } >> return Boolean.TRUE; >> } >> } >> } >> } >> } >> while (id == null); >> >> >> -----Original Message----- >> From: Flavio Junqueira [mailto:f...@apache.org] >> Sent: Tuesday, February 13, 2018 2:45 PM >> To: user@zookeeper.apache.org >> Subject: Re: Bug in WriteLock recipe >> >> {External email message: This email is from an external source. Please >> exercise caution prior to opening attachments, clicking on links, or >> providing any sensitive information.} >> >> Hi Kathryn, >> >> Every time that execute method is invoked, it will get children. From your >> description, in the case the predecessor node is deleted and stat is null, >> the next call will not contain that predecessor znode. Consequently, it >> won't happen indefinitely. Makes sense? >> >> There is actually an old issue about WriteLock that you may want to have a >> look: >> >> https://issues.apache.org/jira/browse/ZOOKEEPER-645 >> <https://issues.apache.org/jira/browse/ZOOKEEPER-645> >> >> Thanks, >> -Flavio >> >> >>> On 13 Feb 2018, at 18:49, Kathryn Hogg <kathryn.h...@oati.net> wrote: >>> >>> I'm actually using the WriteLock from the ZookeeperNetEx C# code but I've >>> verified that the same issue exists in the Java recipe. On a busy system, >>> I'm fairly frequently seeing WriteLock that is never granted to client and >>> gets stuck. >>> >>> What I believe is happening is the lock sets a watch on the request before >>> him via this code: >>> >>> Stat stat = zookeeper.exists(lastChildId, new >>> LockWatcher()); >>> if (stat != null) { >>> return Boolean.FALSE; >>> } else { >>> LOG.warn("Could not find the" + >>> " stats for >>> less than me: " + lastChildName.getName()); >>> } >>> >>> The problem (as I see it and I'm still fairly new to Zookeeper) is that if >>> the node represented by lastChildId has been deleted before the call to >>> exists is made, stat will return null and the watch will only ever be >>> invoked when the znode is created. And of course that will never happen. >>> >>> The message is appearing in my log and my watcher for the lock is never >>> invoked. >>> >>> [2018-02-13 16:49:17.905 GMT WARNING WriteLock Could not >>> find the stats for less than me: >>> /token/SegmentProfileQueueToken/x-72057953399865370-0000000724] >>> >>> I'm not entirely sure of the proper way of fixing this but I think setting >>> Id = null; >>> When stat is null should work. >>> >>> Can someone verify if my analysis is correct? >>> >>> -- >>> Kathryn Hogg >>> Senior Manager Product Development >>> Phone: 763.201.2000 >>> Fax: 763.201.5333 >>> Open Access Technology International, Inc. >>> 3660 Technology Drive NE, Minneapolis, MN 55418 >>> >>> CONFIDENTIAL INFORMATION: This email and any attachment(s) contain >>> confidential and/or proprietary information of Open Access Technology >>> International, Inc. Do not copy or distribute without the prior written >>> consent of OATI. If you are not named a recipient to the message, please >>> notify the sender immediately and do not retain the message in any form, >>> printed or electronic. >>> >> >