> But manually deleting the lock node is not normal behavior. It should
> never happen in production. Can you explain the scenario in more detail?

I agree that it would be abnormal. But abnormal doesn't mean impossible.

There may be a bug in ZK (now or in the future) that in some rare cases
deletes a node when it should not. Or a team might be in the practice of
managing their ZK ensemble via the ZK CLI, and someone might accidentally
type:

  delete /XXX/masterlock/_c_c6101d8e-5af2-4290-8bc6-4005048c9a77-lock-0000000000

rather than:

  get /XXX/masterlock/_c_c6101d8e-5af2-4290-8bc6-4005048c9a77-lock-0000000000

Or, even worse, type:

  rmr /XXX/masterlock

(I've seen a somewhat similar manual mistake made on the HDFS of a production
Hadoop system, where months of data were deleted by pressing up-arrow too
fast and issuing a -rmr instead of a -ls command.)

For a system where I need to be absolutely sure that I, and only I, hold the
lock, this abnormal "backdoor" deletion possibility worries me. To build a
truly robust system, you have to handle all the possibilities you can. The
https://issues.apache.org/jira/browse/CURATOR-171 issue referenced earlier
seems to be arguing the same thing.

On Tue, Jan 20, 2015 at 11:42 AM, Jordan Zimmerman <jordan@jordanzimmerman.com>
wrote:

> But manually deleting the lock node is not normal behavior. It should
> never happen in production. Can you explain the scenario in more detail?
>
> -JZ
>
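[Editor's sketch] The "backdoor" deletion concern above boils down to: the lock holder's znode can vanish out from under it, so a robust holder should re-verify its lock node before each critical action (and, as CURATOR-171 argues, ideally be notified via a watcher). The following is an illustrative simulation only; a dict stands in for the ZooKeeper ensemble, and all class and path names are hypothetical, not Curator API.

```python
class FakeEnsemble:
    """Stands in for a ZooKeeper ensemble: maps znode paths to data.

    Illustrative only -- a real implementation would talk to ZK/Curator.
    """
    def __init__(self):
        self.nodes = {}

    def create(self, path, data=b""):
        self.nodes[path] = data

    def delete(self, path):
        # Models an accidental CLI "delete" or "rmr" of the lock node.
        self.nodes.pop(path, None)

    def exists(self, path):
        return path in self.nodes


class GuardedLockHolder:
    """A lock holder that re-checks its lock node before critical work."""
    def __init__(self, ensemble, lock_path):
        self.ensemble = ensemble
        self.lock_path = lock_path
        ensemble.create(lock_path)  # acquire: create our ephemeral lock node

    def still_holds_lock(self):
        return self.ensemble.exists(self.lock_path)

    def do_critical_work(self):
        # Guard against the backdoor deletion: refuse to act if the
        # lock node has vanished since we acquired it.
        if not self.still_holds_lock():
            raise RuntimeError("lock node vanished; aborting critical section")
        return "work done"


zk = FakeEnsemble()
holder = GuardedLockHolder(zk, "/XXX/masterlock/lock-0000000000")
print(holder.do_critical_work())  # lock intact: prints "work done"

# Simulate the accidental CLI deletion described above:
zk.delete("/XXX/masterlock/lock-0000000000")
try:
    holder.do_critical_work()
except RuntimeError as e:
    print("refused:", e)
```

Note this check-then-act guard still has a race (the node can be deleted between the check and the work), which is why a watcher-based notification, as discussed in CURATOR-171, is the more complete fix.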
