Thanks Flavio,

I'm debugging this and found another issue in the code in findPrefixInChildren()

        private void findPrefixInChildren(String prefix, ZooKeeper zookeeper, 
String dir) 
            throws KeeperException, InterruptedException {
            List<String> names = zookeeper.getChildren(dir, false);
            for (String name : names) {
                if (name.startsWith(prefix)) {
                    id = name;  /*   THIS DOES NOT HAVE THE FULL PATH */
                    if (LOG.isDebugEnabled()) {
                        LOG.debug("Found id created last time: " + id);
                    }
                    break;
                }
            }
            if (id == null) {
                id = zookeeper.create(dir + "/" + prefix, data, 
                        getAcl(), EPHEMERAL_SEQUENTIAL);   /* THIS HAS THE FULL 
PATH */

                if (LOG.isDebugEnabled()) {
                    LOG.debug("Created id: " + id);
                }
            }

        }

If we find the node in the children, we set id to x-$session-$sequence.  If we 
create the znode, id is set to $dir/x-$session-$sequence

I believe the first case should be 
  Id = dir + "/" + name;

-----Original Message-----
From: Flavio Junqueira [mailto:f...@apache.org] 
Sent: Tuesday, February 13, 2018 3:58 PM
To: user@zookeeper.apache.org
Subject: Re: Bug in WriteLock recipe

{External email message: This email is from an external source. Please exercise 
caution prior to opening attachments, clicking on links, or providing any 
sensitive information.}

You are right that there is a race between getting the children and checking 
whether the predecessor is there. If we fail to set a watcher, then we won't 
try locking again. I think setting id to null will work because it forces 
another iteration of the do/while loop, which checks whether there is still 
some predecessor, setting a watcher accordingly. The number of iterations 
should be finite because we eventually hit the owned znode.

It might better to add a different condition for clarity, like `while (Id == 
null || watcherNotSet)`. In any case, I'd appreciate if you could chime in and 
contribute your changes to https://issues.apache.org/jira/browse/ZOOKEEPER-645 
<https://issues.apache.org/jira/browse/ZOOKEEPER-645>.

-Flavio

> On 13 Feb 2018, at 22:01, Kathryn Hogg <kathryn.h...@oati.net> wrote:
> 
> Hey Flavio,
> 
> FYI, I'm on 3.4.11 if that makes a difference.
> 
> The problem is
>      1. In order to get to the exists() call, we've already determined that 
> id is not null.
>      2. After stat returns null, we log a message but do not reset stat to 
> null.
>      3.  The while loop will terminate because id is not null
> 
> I think this can fixed by setting id to null if stat returns null. This is 
> similar to what is done when lessThanMe is not Empty.
> 
> Here's an outline of the code I'm talking about (my suggested change is after 
> the LOG.warn()):
> 
> do {
>   if (id == null) {
>       ....
>  }
>  If (id != null) {
>     if (names.IsEmpty()) {
>         ...
>         Id = null;
>     } else {
>         ....
>        If (!lessThanMe.isEmpty())
>                            Stat stat = zookeeper.exists(lastChildId, new 
> LockWatcher());
>                            if (stat != null) {
>                                return Boolean.FALSE;
>                            } else {
>                                LOG.warn("Could not find the" +
>                                               " stats for less than me: " + 
> lastChildName.getName());
>                                 /* id = null; */  // id is not null here so 
> the while loop will terminate 
>                            }
>       } else {
>             if (isOwner()) {
>                      if (callback != null) {
>                           callback.lockAcquired();
>                     }
>                      return Boolean.TRUE;
>            }
>        }
>    }
> }
> }
> while (id == null);
> 
> 
> -----Original Message-----
> From: Flavio Junqueira [mailto:f...@apache.org] 
> Sent: Tuesday, February 13, 2018 2:45 PM
> To: user@zookeeper.apache.org
> Subject: Re: Bug in WriteLock recipe
> 
> {External email message: This email is from an external source. Please 
> exercise caution prior to opening attachments, clicking on links, or 
> providing any sensitive information.}
> 
> Hi Kathryn,
> 
> Every time that execute method is invoked, it will get children. From your 
> description, in the case the predecessor node is deleted and stat is null, 
> the next call will not contain that predecessor znode. Consequently, it won't 
> happen indefinitely. Makes sense?
> 
> There is actually an old issue about WriteLock that you may want to have a 
> look:
> 
> https://issues.apache.org/jira/browse/ZOOKEEPER-645 
> <https://issues.apache.org/jira/browse/ZOOKEEPER-645>
> 
> Thanks,
> -Flavio
> 
> 
>> On 13 Feb 2018, at 18:49, Kathryn Hogg <kathryn.h...@oati.net> wrote:
>> 
>> I'm actually using the WriteLock from the ZookeeperNetEx C# code but I've 
>> verified that the same issue exists in the Java recipe.  On a busy system, 
>> I'm fairly frequently seeing WriteLock that is never granted to client and 
>> gets stuck.
>> 
>> What I believe is happening is the lock sets a watch on the request before 
>> him via this code:
>> 
>>                           Stat stat = zookeeper.exists(lastChildId, new 
>> LockWatcher());
>>                           if (stat != null) {
>>                               return Boolean.FALSE;
>>                           } else {
>>                               LOG.warn("Could not find the" +
>>                                                               " stats for 
>> less than me: " + lastChildName.getName());
>>                           }
>> 
>> The problem (as I see it and I'm still fairly new to Zookeeper) is that if 
>> the node represented by lastChildId has been deleted before the call to 
>> exists is made, stat will return null and the watch will only ever be 
>> invoked when the znode is created.  And of course that will never happen.
>> 
>> The message is appearing in my log and my watcher for the lock is never 
>> invoked.
>> 
>> [2018-02-13 16:49:17.905 GMT    WARNING         WriteLock       Could not 
>> find the stats for less than me: 
>> /token/SegmentProfileQueueToken/x-72057953399865370-0000000724]
>> 
>> I'm not entirely sure of the proper way of fixing this but I think setting
>>   Id = null;
>> When stat is null should work.
>> 
>> Can someone verify if my analysis is correct?
>> 
>> --
>> Kathryn Hogg
>> Senior Manager Product Development
>> Phone: 763.201.2000
>> Fax: 763.201.5333
>> Open Access Technology International, Inc.
>> 3660 Technology Drive NE, Minneapolis, MN 55418
>> 
>> CONFIDENTIAL INFORMATION: This email and any attachment(s) contain 
>> confidential and/or proprietary information of Open Access Technology 
>> International, Inc.  Do not copy or distribute without the prior written 
>> consent of OATI.  If you are not named a recipient to the message, please 
>> notify the sender immediately and do not retain the message in any form, 
>> printed or electronic.
>> 
> 

Reply via email to