I am wondering if I am heading down a bad path. We have implemented distributed locking in ZooKeeper by hand. A lock is acquired by creating a znode; it is released by deleting the znode. Simple, right? That part works great.
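For concreteness, here is a minimal sketch of the pattern in Python, using an in-memory stand-in for ZooKeeper rather than a real client (`FakeZooKeeper`, `LockClient`, and the per-path generation counter are all illustrative assumptions, not real ZooKeeper API and not our actual code). The generation check shows one possible way to ignore a watch callback left over from an earlier lock/unlock cycle on the same path, which is the delayed-callback hazard described below:

```python
class FakeZooKeeper:
    """In-memory stand-in: stores znodes and queues one-shot
    exists-watch callbacks whose delivery may be delayed."""
    def __init__(self):
        self.znodes = set()
        self.watches = {}   # path -> list of callbacks
        self.pending = []   # fired but not-yet-delivered callbacks

    def create(self, path):
        self.znodes.add(path)

    def delete(self, path):
        self.znodes.discard(path)
        # Deleting fires every exists watch on the path, but delivery
        # to the client can be arbitrarily delayed.
        for cb in self.watches.pop(path, []):
            self.pending.append(lambda cb=cb: cb(path))

    def exists_watch(self, path, cb):
        self.watches.setdefault(path, []).append(cb)

    def deliver_pending(self):
        while self.pending:
            self.pending.pop(0)()


class LockClient:
    """Tags each acquisition with a generation number so that a
    delayed watch callback from a previous acquisition of the same
    path can be recognised and ignored."""
    def __init__(self, zk):
        self.zk = zk
        self.generation = {}  # path -> current acquisition number
        self.held = set()

    def lock(self, path):
        gen = self.generation.get(path, 0) + 1
        self.generation[path] = gen
        self.zk.create(path)
        self.held.add(path)
        # Install the exists watch immediately after creating the znode.
        self.zk.exists_watch(path, lambda p, g=gen: self._on_deleted(p, g))

    def unlock(self, path):
        self.held.discard(path)
        self.zk.delete(path)

    def _on_deleted(self, path, gen):
        # Stale callback from an earlier lock/unlock cycle: ignore it,
        # otherwise the re-acquired lock would wrongly be treated as gone.
        if gen != self.generation.get(path):
            return
        self.held.discard(path)  # lock genuinely broken by a third party
```

Without the generation check, a callback delayed across an unlock/re-lock of the same path would make the client drop its record of the second acquisition while the znode still exists, which is exactly the failure mode described below.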
Here's where the complexity comes in: we also needed to be able to detect whether a lock was "broken" -- i.e. the znode was deleted by some third party. For this we use an "exists" watcher that is installed via an asynchronous call immediately after the znode is created. Of course, the "exists" handler is invoked whenever the znode is deleted, whether by a third party or by the client itself, so we keep state in the client to remember which znodes were previously locked.

An issue was discovered in which a client locks, unlocks, and then re-locks the same data. It is possible for the "exists" callback to be delayed and not get delivered until the data is locked the second time. This leads to what we call a "leaked lock": the znode is created in ZooKeeper, but the client will not unlock it, since it thinks the znode was already deleted. I'm working on a fix for this issue too. It just seems to be getting more complex and risky, and I am wondering if I am going astray.

Are the watches reliable enough to _guarantee_ that I will receive exactly one callback for each delete event? Even if a session fails over to another node?

John

--
View this message in context: http://zookeeper-user.578899.n2.nabble.com/Guarantees-of-an-exists-watcher-tp7581088.html
Sent from the zookeeper-user mailing list archive at Nabble.com.
