Hi Patrick, thanks for your answers. I did some tests yesterday and observed the following behaviors:
1. Session events, i.e. events of type None, are sent to all outstanding watchers. So if you call get(path, watcherX), both the default watcher and watcherX will receive the session events.
2. Watchers are one-time triggers; however, session events do NOT remove a watcher. In other words, if we're listening for a NodeCreated event and a disconnection occurs, we will eventually be notified of a Disconnected event, then a SyncConnected event, and finally the NodeCreated event, without having to set a new watcher.
3. If the invocation of a (synchronous or asynchronous) method fails, the watcher is not set. For instance, if getChildren("/foo", mywatcher) fails because the client is disconnected, mywatcher won't be notified of future events.

I apologize in advance if I'm stating the obvious, but the differences between "path" events and "session" events were not clear to me.
<http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html#ch_zkWatches>

Alexis

On Fri, Jun 25, 2010 at 12:36 PM, Patrick Hunt <ph...@apache.org> wrote:

> On 06/12/2010 10:07 PM, Alexis Midon wrote:
>
>> I implemented queues and locks on top of ZooKeeper, and I'm pretty happy so
>> far. Thanks for the nice work. Tests look good. So good that we can focus on
>> exception/error handling, and I have a couple of questions.
>>
>> #1. Regarding the use of the default watcher. A ZooKeeper instance has a
>> default watcher, and most operations can also specify a watcher. When both
>> are set, does the operation watcher override the default watcher?
>>
>
> If you use get(path, bool) then the default watcher is notified; if you
> use get(path, watcherX) then only "watcherX" is notified.
>
>> Or will both watchers be invoked? If so, in which order? Does each watcher
>> receive all the types of event?
>>
>
> No, both watchers are not invoked.
>
>> I had a look at the code, and my understanding is that the default watcher
>> will always receive the type-NONE events, even if an "operation" watcher is
>> set.
>> There's no guarantee on the order of invocation, though. Could you confirm
>> and/or complete, please?
>>
>
> The watcher gets both state change notifications and watch events. You can
> register multiple watchers for the same path (incl. the default); there is no
> guarantee on ordering at all.
>
>> #2. After a connection loss, the client will eventually reconnect to the ZK
>> cluster, so I guess I can keep using the same client instance. But are there
>
> Right.
>
>> cases where it is necessary to re-instantiate a ZooKeeper client? As a first
>> recovery strategy, is it OK to always recreate a client so that any
>> ephemeral node previously owned disappears?
>>
>
> If the session has expired, that's the case where you need to recreate the
> session object (or if you explicitly closed it).
>
> Yes, this is a fine strategy if your application domain "fits". If you have
> a very expensive "recovery" or "bootstrap" process, then recreating the
> session on every disconnect would be a bad idea.
>
>> The case I struggle with is the following:
>> Let's say I've acquired a lock (i.e. an ephemeral lock node is created).
>> Some application logic fails due to a connection loss. At this stage I'd
>> like to give up/roll back. Here I would typically throw an exception, the
>> lock being released in a finally block. But I can't release the lock since
>> the connection is down. Later the client eventually reconnects; the session
>> didn't expire, so the lock node still exists. Now no one else can acquire
>> this lock until my session expires.
>>
>
> Yes, you are reading the situation correctly. In this case you either have
> to take the easy route (close the session and create a new one, again if
> your app domain supports this), or your client needs to check whether it
> still holds the lock (is still the owner) once it has reconnected.
> You can verify this for an ephemeral node by looking at the "ephemeralOwner"
> field of the Stat object.
> If this matches your session id, then you are the
> owner and still hold the lock. This is a bit tricky to get right though, so
> in some cases clients just close the session and recreate it.
>
>> #3. Could you describe the recommended actions for each exception code?
>>
>
> This is highly dependent on your application requirements. See above for my
> general guidance. Feel free to ask more questions.
>
> Regards,
>
> Patrick
>
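The three observations at the top of this message can be captured in a small toy model. This is a stdlib-only simulation written for illustration, not ZooKeeper's client code; `WatchModel`, `register`, `deliverSession` and `deliverNodeEvent` are hypothetical names. It encodes only what was observed: session events (type None) reach the default watcher and every outstanding watcher without consuming their registration, a node event fires the watchers on that path once and then drops them, and a watch is recorded only if the call that sets it succeeds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Toy model of the observed watch semantics; NOT ZooKeeper's implementation.
public class WatchModel {
    private final Consumer<String> defaultWatcher;
    private final Map<String, List<Consumer<String>>> pathWatchers = new HashMap<>();
    private boolean connected = true;

    public WatchModel(Consumer<String> defaultWatcher) {
        this.defaultWatcher = defaultWatcher;
    }

    public void setConnected(boolean connected) { this.connected = connected; }

    // Models getChildren(path, watcher): the watch is recorded only if the call succeeds.
    public boolean register(String path, Consumer<String> watcher) {
        if (!connected) {
            return false; // the call fails, so no watch is left behind (observation 3)
        }
        pathWatchers.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
        return true;
    }

    // Models a session event (type None): the default watcher and every outstanding
    // watcher see it once, and no registration is consumed (observations 1 and 2).
    public void deliverSession(String state) {
        Set<Consumer<String>> targets = new LinkedHashSet<>();
        targets.add(defaultWatcher);
        pathWatchers.values().forEach(targets::addAll);
        for (Consumer<String> w : targets) {
            w.accept("None:" + state);
        }
    }

    // Models a node event: watchers on that path fire once and are then dropped.
    public void deliverNodeEvent(String path, String type) {
        List<Consumer<String>> fired = pathWatchers.remove(path);
        if (fired == null) {
            return;
        }
        for (Consumer<String> w : fired) {
            w.accept(type + ":" + path);
        }
    }

    public static void main(String[] args) {
        List<String> defaultSeen = new ArrayList<>();
        List<String> watcherSeen = new ArrayList<>();
        WatchModel zk = new WatchModel(defaultSeen::add);

        zk.register("/foo", watcherSeen::add);
        zk.deliverSession("Disconnected");
        zk.deliverSession("SyncConnected");
        zk.deliverNodeEvent("/foo", "NodeCreated");
        zk.deliverNodeEvent("/foo", "NodeDeleted"); // watch already fired, not delivered

        zk.setConnected(false);
        zk.register("/bar", watcherSeen::add);      // fails: no watch is set
        zk.setConnected(true);
        zk.deliverNodeEvent("/bar", "NodeCreated"); // nobody is watching /bar

        System.out.println(defaultSeen); // [None:Disconnected, None:SyncConnected]
        System.out.println(watcherSeen); // [None:Disconnected, None:SyncConnected, NodeCreated:/foo]
    }
}
```

The main point the model makes concrete is that "one-time" applies to node events only: the two session events pass through the /foo registration without using it up, so the NodeCreated event still arrives.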
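Patrick's two recovery options (recreate the session when it has expired, otherwise check ownership of the lock node after reconnecting) can be combined into a single decision step. This is a minimal sketch, assuming the caller has already observed the session state and, once reconnected, looked up the lock node. With the real Java client that lookup would be along the lines of `zk.exists(lockPath, false)` (which returns `null` when the node is gone), followed by comparing `stat.getEphemeralOwner()` with `zk.getSessionId()`. To keep the sketch self-contained, it takes those values as plain arguments. `LockRecovery`, `SessionState`, `Action` and `decide` are hypothetical names; `SessionState` only mirrors the `SyncConnected`/`Disconnected`/`Expired` states of `Watcher.Event.KeeperState`.

```java
// Hypothetical helper; not part of the ZooKeeper API.
public class LockRecovery {
    // Stand-in mirroring KeeperState.SyncConnected / Disconnected / Expired.
    public enum SessionState { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    public enum Action { WAIT_FOR_RECONNECT, RECREATE_SESSION, STILL_OWNER, LOCK_LOST }

    /**
     * @param ephemeralOwner value of Stat.getEphemeralOwner() for the lock node,
     *                       or null if exists() returned no Stat (node is gone)
     * @param sessionId      value of ZooKeeper.getSessionId() for this client
     */
    public static Action decide(SessionState state, Long ephemeralOwner, long sessionId) {
        switch (state) {
            case EXPIRED:
                // The session is dead: its ephemeral nodes are gone and the
                // client object has to be recreated.
                return Action.RECREATE_SESSION;
            case DISCONNECTED:
                // The client reconnects on its own; the session (and the lock)
                // may still be alive, so nothing can be concluded yet.
                return Action.WAIT_FOR_RECONNECT;
            default:
                // Reconnected: we still hold the lock only if the node exists
                // and its ephemeralOwner is our own session id.
                if (ephemeralOwner != null && ephemeralOwner == sessionId) {
                    return Action.STILL_OWNER;
                }
                return Action.LOCK_LOST;
        }
    }
}
```

On STILL_OWNER the client can release the lock itself (delete the node) instead of waiting for its session to expire, which addresses the stuck-lock case from question #2. Note the check is only a snapshot: the session can still expire right after `exists()` returns, which is part of why Patrick calls this approach tricky and why simply closing and recreating the session is the easier route when the application can afford it.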