> But manually deleting the lock node is not normal behavior. It should
> never happen in production. Can you explain the scenario in more detail?

I agree that it would be abnormal. But abnormal doesn't mean impossible.

There may be a bug in ZK (now or in the future) that in some rare cases
deletes a node when it should not. Or a team might be in the practice of
managing their ZK ensemble via the ZK CLI, and someone might accidentally
type:

  delete /XXX/masterlock/_c_c6101d8e-5af2-4290-8bc6-4005048c9a77-lock-0000000000

rather than:

  get /XXX/masterlock/_c_c6101d8e-5af2-4290-8bc6-4005048c9a77-lock-0000000000

Or, even worse, type:

  rmr /XXX/masterlock

(I've seen a somewhat similar manual mistake made on the HDFS of a production
Hadoop system, where months of data were deleted by pressing up-arrow too
fast and issuing a -rmr instead of a -ls command.)

For a system where I need to be absolutely sure that I, and only I, hold the
lock, this abnormal "backdoor" deletion possibility worries me. To build a
truly robust system, you have to handle all the possibilities you can. The
https://issues.apache.org/jira/browse/CURATOR-171 issue referenced earlier
seems to be arguing the same thing.

On Tue, Jan 20, 2015 at 11:42 AM, Jordan Zimmerman <jordan@jordanzimmerman.com>
wrote:

> But manually deleting the lock node is not normal behavior. It should
> never happen in production. Can you explain the scenario in more detail?
>
> -JZ
>
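[Editor's sketch] The "backdoor" deletion concern above boils down to: the lock holder's znode can vanish out from under it, so a robust holder should re-verify its lock node before each critical action (and, as CURATOR-171 argues, ideally be notified via a watcher). The following is an illustrative simulation only; a dict stands in for the ZooKeeper ensemble, and all class and path names are hypothetical, not Curator API.

```python
class FakeEnsemble:
    """Stands in for a ZooKeeper ensemble: maps znode paths to data.

    Illustrative only -- a real implementation would talk to ZK/Curator.
    """
    def __init__(self):
        self.nodes = {}

    def create(self, path, data=b""):
        self.nodes[path] = data

    def delete(self, path):
        # Models an accidental CLI "delete" or "rmr" of the lock node.
        self.nodes.pop(path, None)

    def exists(self, path):
        return path in self.nodes


class GuardedLockHolder:
    """A lock holder that re-checks its lock node before critical work."""
    def __init__(self, ensemble, lock_path):
        self.ensemble = ensemble
        self.lock_path = lock_path
        ensemble.create(lock_path)  # acquire: create our ephemeral lock node

    def still_holds_lock(self):
        return self.ensemble.exists(self.lock_path)

    def do_critical_work(self):
        # Guard against the backdoor deletion: refuse to act if the
        # lock node has vanished since we acquired it.
        if not self.still_holds_lock():
            raise RuntimeError("lock node vanished; aborting critical section")
        return "work done"


zk = FakeEnsemble()
holder = GuardedLockHolder(zk, "/XXX/masterlock/lock-0000000000")
print(holder.do_critical_work())  # lock intact: prints "work done"

# Simulate the accidental CLI deletion described above:
zk.delete("/XXX/masterlock/lock-0000000000")
try:
    holder.do_critical_work()
except RuntimeError as e:
    print("refused:", e)
```

Note this check-then-act guard still has a race (the node can be deleted between the check and the work), which is why a watcher-based notification, as discussed in CURATOR-171, is the more complete fix.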
