[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859552#comment-15859552
 ] 

Edward Ribeiro edited comment on ZOOKEEPER-2688 at 2/9/17 2:03 PM:
-------------------------------------------------------------------

Hi [~ror6ax], 

The {{rmr}} calls a utility method to recursively delete a root znode and its 
children. This method operates in two steps: a) first it traverses the tree 
from the root path and stores the absolute paths for root and children in a 
list. b) then it traverses the list, calling ZK delete method for each absolute 
path.

Based on the znode path you posted only, I *guess* that is happening a *race 
condition* as below:

0. Suppose that {{/vault}} has a lot of children znodes, many levels deep (this 
is not really necessary if the actions interleaving is just precise).

1. Also, *suppose*  
{{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}} is an 
ephemeral node (it has the word "lock" in its path, so I think it's been used 
for distributed locking). The session that owns this ephemeral node has been 
closed, then znode will be removed;

2. The tree traversal method I cited above gets 
{{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}} and 
adds the path to the list *just before* it has been removed (miliseconds 
before).

3. ZK deletes 
{{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}};

4. the list of paths to be delete is traversed (remember: a long list) and 
*eventually* reaches 
{{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}} but 
when it try to delete it there is an exception because it has already been 
deleted by ZK itself;

Does it make sense?

Please, it's just my theory for what is happening.

PS: Even though {{rmr}} is being deprecated in favor of {{deleteAll}}, it also 
uses the same utility class so this behavior could potentially happen with with 
{{deleteAll}}.

*OTOH*, it can indeed be some sort of inconsistency. Did you try to issue a ls 
or get command on 
{{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}} 
*after the error* to see if it really exists? Or try to remove it via delete?


was (Author: eribeiro):
Hi [~ror6ax], 

The rmr calls a utility method to recursively delete a root znode and its 
children. This method operates in two steps: a) first it traverses the tree 
from the root path and stores the absolute paths for root and children in a 
list. b) it traverses the list and calling ZK delete method for each path.

Based on the znode path you posted only, I *guess* that is happening a race 
condition as below:

0. Suppose that {{/vault}} is has a lot of children znodes, many levels deep.

1. Also, *suppose*  
{{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}} is an 
ephemeral node (it has the word "lock" in its path, so it's been used for 
distributed locking). The session that owns this ephemeral node has been 
closed, then znode will be removed;

2. The tree traversal method I cited above gets 
{{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}} and 
adds the path to the list *just before* it has been removed (miliseconds 
before).

3. ZK deletes 
{{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}};

4. the list of paths to be delete is traversed (remember: a long list) and 
*eventually* reaches 
{{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}} but 
when it try to delete it there is an exception because it has already been 
deleted by ZK itself;

Does it make sense?

Please, it's just my theory for what is happening.

PS: Even though {{rmr}} is being deprecated in favor of {{deleteAll}}, it also 
uses the same utility class so this behavior could potentially happen with with 
{{deleteAll}}.

*OTOH*, it can indeed be some sort of inconsistency. Did you try to issue a ls 
or get command on 
{{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}} 
*after the error* to see if it really exists? Or try to remove it via delete?

> rmr leads to "Node does not exist"
> ----------------------------------
>
>                 Key: ZOOKEEPER-2688
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2688
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.9
>            Reporter: Gregory Reshetniak
>
> Issuing rmr /vault leads to Node does not exist: 
> /vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028
> I know that rmr is getting deprecated in next version, but I think this might 
> be cluster consistency bug.
> Please advice.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to