On 06/12/2010 10:07 PM, Alexis Midon wrote:
I implemented queues and locks on top of ZooKeeper, and I'm pretty happy so
far. Thanks for the nice work. Tests look good. So good that we can focus on
exception/error handling and I got a couple of questions.
#1. Regarding the use of the default watcher. A ZooKeeper instance has a
default watcher, most operations can also specify a watcher. When both are
set, does the operation watcher override the default watcher?
If you use get(path, bool), the default watcher is notified; if you use
get(path, watcherX), only "watcherX" is notified.
or will both watchers be invoked? if so in which order? Does each watcher
receive all the types of event?
No, the two watchers are not both invoked; only the one registered for
that operation is.
I had a look at the code, and my understanding is that the default watcher
will always receive the type-NONE events, even if an "operation" watcher is
set. No guarantee on the order of invocation though. Could you confirm
and/or complete please?
The watcher gets both state change notifications and watch events. You
can register multiple watchers for the same path (including the default);
there is no guarantee on ordering at all.
#2 After a connection loss, the client will eventually reconnect to the ZK
cluster so I guess I can keep using the same client instance. But are there
cases where it is necessary to re-instantiate a ZooKeeper client? As a first
recovery-strategy, is that ok to always recreate a client so that any
ephemeral node previously owned disappear?
If the session has expired, you need to recreate the session object
(the same applies if you explicitly close it).
Yes, this is a fine strategy if your application domain "fits". If you
have a very expensive "recovery" or "bootstrap" process then recreating
the session on every disconnect would be a bad idea.
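
To make the two strategies concrete, here is a minimal sketch of the
decision, not a definitive implementation. SessionRecovery, SessionState,
Action and the recreateOnDisconnect flag are hypothetical names of mine;
the states are meant to mirror the Disconnected, SyncConnected and Expired
values of Watcher.Event.KeeperState, which your default watcher would
translate before calling this.

```java
/** Sketch only: maps a session state change to a recovery action. */
final class SessionRecovery {

    /** Hypothetical mirror of the KeeperState values discussed above. */
    enum SessionState { DISCONNECTED, SYNC_CONNECTED, EXPIRED }

    /** Hypothetical set of actions the application can take. */
    enum Action { WAIT_FOR_RECONNECT, RESUME, RECREATE_CLIENT }

    /**
     * @param state                the state reported to the default watcher
     * @param recreateOnDisconnect true for the "easy route": drop the session
     *                             on any disconnect so ephemeral nodes go away
     */
    static Action onStateChange(SessionState state, boolean recreateOnDisconnect) {
        switch (state) {
            case EXPIRED:
                // An expired session cannot be reused; a new client is required.
                return Action.RECREATE_CLIENT;
            case DISCONNECTED:
                // The client library reconnects on its own unless we opt out.
                return recreateOnDisconnect
                        ? Action.RECREATE_CLIENT
                        : Action.WAIT_FOR_RECONNECT;
            default:
                // Connected again with the same session.
                return Action.RESUME;
        }
    }
}
```

If your bootstrap is expensive you would pass false for
recreateOnDisconnect and only rebuild the client on expiry.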
The case I struggle with is the following:
Let's say I've acquired a lock (i.e. an ephemeral locknode is created).
Some application logic failed due to a connection loss. At this stage I'd
like to give up/roll back. Here I would typically throw an exception, the
lock being released in a finally. But I can't release the lock since the
connection is down. Later the client eventually reconnects, the session
didn't expire so the locknode still exists. Now no one else can acquire this
lock until my session expires.
Yes, you are reading the situation correctly. In this case you either
have to take the easy route - close the session and create a new one
(again, if your app domain supports this) or your client needs to check
if the lock is still being held (it's still the owner) when it's
eventually reconnected. You can verify this for an ephemeral node by
looking at the "ephemeralOwner" field of the Stat object. If this
matches your session id then you are the owner and still hold the lock.
This is a bit tricky to get right though, so in some cases clients just
close the session and recreate.
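
The ownership check itself could look roughly like the sketch below. It
is deliberately free of ZooKeeper imports so it stands alone, and the
names LockOwnershipCheck, Ownership and check are hypothetical. I am
assuming you obtain the first argument from exists(lockPath, false)
followed by Stat.getEphemeralOwner() (empty when exists returns null),
and the second from getSessionId() on your client.

```java
import java.util.OptionalLong;

/** Sketch only: decide whether this session still owns a lock node. */
final class LockOwnershipCheck {

    /** Hypothetical result type for the check. */
    enum Ownership { STILL_HELD, HELD_BY_OTHER, NOT_HELD }

    /**
     * @param ephemeralOwner the "ephemeralOwner" field of the lock node's
     *                       Stat, or empty if the node no longer exists
     * @param mySessionId    the id of the session doing the check
     */
    static Ownership check(OptionalLong ephemeralOwner, long mySessionId) {
        if (!ephemeralOwner.isPresent()) {
            // Node is gone: the lock was released or the old session expired.
            return Ownership.NOT_HELD;
        }
        // Same session id means our ephemeral node survived the disconnect.
        return ephemeralOwner.getAsLong() == mySessionId
                ? Ownership.STILL_HELD
                : Ownership.HELD_BY_OTHER;
    }
}
```

On STILL_HELD you can delete the node to release the lock (or carry on
holding it). Note the read and the delete are separate operations, so a
disconnect in between puts you back in the same situation, which is part
of why this is tricky.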
#3. could you describe the recommended actions for each exception code?
This is highly dependent on your application requirements. See above for
my general guidance. Feel free to ask more questions.