In the many years of Curator's existence, no one that I know of has had an issue 
with this. ZooKeeper is very robust and nodes do not get deleted abnormally 
like this. You are posing a hypothetical situation. It’s not reasonable to 
handle every single edge case. This would be the equivalent of someone going 
into the production database and arbitrarily deleting records. The locking code 
is already incredibly complicated and I wouldn’t want to burden it with this 
new behavior and overhead. However, if you can make it work reasonably, please 
provide a PR and the committers will look at it.

-Jordan



On January 20, 2015 at 12:38:36 PM, Michael Peterson ([email protected]) wrote:

> But manually deleting the lock node is not normal behavior.
> It should never happen in production.

I agree that it would be abnormal.  But abnormal doesn't mean impossible.

> Can you explain the scenario in more detail?

There may be a bug in ZK (now or in the future) that in some rare cases deletes 
a node when it should not.

Or a team might be in the practice of managing their ZK ensemble via the ZK CLI 
and someone might accidentally type:
"delete /XXX/masterlock/_c_c6101d8e-5af2-4290-8bc6-4005048c9a77-lock-0000000000"

rather than

"get /XXX/masterlock/_c_c6101d8e-5af2-4290-8bc6-4005048c9a77-lock-0000000000". 

Or even worse, type
"rmr /XXX/masterlock". 

(I've seen a somewhat similar manual mistake made on HDFS in a production 
Hadoop system, where months of data were deleted because someone hit up-arrow 
too fast and issued a -rmr instead of an -ls command.)

For a system where I need to be absolutely sure that I and only I have the 
lock, this abnormal "backdoor" deletion possibility worries me.  To build a 
truly robust system, you have to handle all the possibilities you can.
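
To make the concern concrete, here is a toy in-memory model of a 
sequential-node lock. It is not Curator or ZooKeeper code, and every class and 
method name in it is hypothetical. It assumes only the general shape of the 
recipe: each contender creates a sequential node, and whoever owns the lowest 
surviving node is treated as the lock holder. It shows how a backdoor delete 
can leave two clients both believing they hold the lock unless the original 
holder re-checks that its node still exists.

```java
import java.util.TreeMap;

/** Toy in-memory model of a sequential-node lock; NOT Curator or ZooKeeper code. */
public class ToyLock {
    // Sequence number -> owner name, standing in for the children of the lock path.
    private final TreeMap<Integer, String> nodes = new TreeMap<>();
    private int nextSeq = 0;

    /** A contender creates its node and gets back its sequence number. */
    public int enqueue(String owner) {
        int seq = nextSeq++;
        nodes.put(seq, owner);
        return seq;
    }

    /** Simulates someone deleting a node behind the owner's back. */
    public void backdoorDelete(int seq) {
        nodes.remove(seq);
    }

    /** In this model, the lowest surviving node is treated as the holder. */
    public boolean recipeSaysHolder(int seq) {
        return !nodes.isEmpty() && nodes.firstKey() == seq;
    }

    /** The extra check being asked for: does my node still exist? */
    public boolean nodeStillExists(int seq) {
        return nodes.containsKey(seq);
    }

    public static void main(String[] args) {
        ToyLock lock = new ToyLock();
        int a = lock.enqueue("clientA"); // A is first, so A acquires the lock
        int b = lock.enqueue("clientB"); // B waits behind A
        lock.backdoorDelete(a);          // e.g. an accidental "delete" in the CLI
        // A has not released anything, yet B now looks like the holder.
        System.out.println("B is holder per recipe: " + lock.recipeSaysHolder(b));
        System.out.println("A's node still exists: " + lock.nodeStillExists(a));
    }
}
```

In this model, client A never released the lock, so without something like 
nodeStillExists() it has no reason to stop working while B starts.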

The https://issues.apache.org/jira/browse/CURATOR-171 issue referenced earlier 
seems to be arguing the same thing.


On Tue, Jan 20, 2015 at 11:42 AM, Jordan Zimmerman <[email protected]> 
wrote:
But manually deleting the lock node is not normal behavior. It should never 
happen in production. Can you explain the scenario in more detail? 

-JZ
