I'm using a timeout of 5000 ms. Now let me ask this: suppose all of my clients are waiting on some external event -- not ZooKeeper -- so they are all idle: they are not touching ZK nodes, nor are they calling exists, getChildren, etc. Can that idleness cause session expiry?
I'm running a local quorum of 3 nodes. That is, I have an Ant script that kicks off 3 <java> tasks in parallel to run ConsumerPeerMain, each with its own config file.

Regarding handling of the failure, I suspect I will just have to reinitialize by creating a new instance of my client(s) that themselves will have a new ZK instance. I'm using Spring to wire everything together, which is why it's particularly difficult to simply re-create a new ZK instance and pass it to the classes using it (those classes have no knowledge of each other). But I _can_ just pull a freshly-created (prototype) instance from the Spring application context, which is where a new ZK client will be wired in. The only ramification there is I have to throw the KeeperException as a fatal exception rather than letting that client try to re-elect. Or maybe add in some logic to say "if I can't re-elect, _then_ throw an exception and consider it fatal."

Thanks guys.
-Tom

On Thu, Feb 12, 2009 at 2:39 PM, Patrick Hunt <ph...@apache.org> wrote:
> Regardless of frequency Tom's code still has to handle this situation.
>
> I would suggest that the "two classes" Tom is referring to in his mail, the
> ones that use ZK client object, should either be able to "reinitialize" with
> a new zk session, or they themselves should be discarded and new instances
> created using the new session (not sure what makes more sense for his
> archi...)
>
> Regardless of whether we reuse the session object or create a new one I
> believe the code using the session needs to "reinitialize" in some way --
> there's been a dramatic break from the cluster.
>
> As I mentioned, you can decrease the likelihood of expiration by increasing
> the timeout - but the downside is that you are less sensitive to clients
> dying (because their ephemeral nodes don't get deleted till close/expire and
> if you are doing something like leader election among your clients it will
> take longer for the followers to be notified).
>
> Patrick
>
> Mahadev Konar wrote:
>>
>> Hi Tom,
>> The session expired event means that the server expired the client,
>> and that means the watches and ephemerals will go away for that node.
>>
>> How are you running your zookeeper quorum? Session expiry should be
>> a really rare event. If you have a quorum of servers it should rarely
>> happen.
>>
>> mahadev
>>
>>
>> On 2/12/09 11:17 AM, "Tom Nichols" <tmnich...@gmail.com> wrote:
>>
>>> So if a session expires, my ephemeral nodes and watches have already
>>> disappeared? I suppose creating a new ZK instance with the old
>>> session ID would not do me any good in that case. Correct?
>>>
>>> Thanks.
>>> -Tom
>>>
>>>
>>>
>>> On Thu, Feb 12, 2009 at 2:12 PM, Mahadev Konar <maha...@yahoo-inc.com>
>>> wrote:
>>>>
>>>> Hi Tom,
>>>> We prefer to discard the zookeeper instance if a session expires.
>>>> Maintaining a one to one relationship between a client handle and a
>>>> session makes it much simpler for users to understand the existence
>>>> and disappearance of ephemeral nodes and watches created by a
>>>> zookeeper client.
>>>>
>>>> thanks
>>>> mahadev
>>>>
>>>>
>>>> On 2/12/09 10:58 AM, "Tom Nichols" <tmnich...@gmail.com> wrote:
>>>>
>>>>> I've come across the situation where a ZK instance will have an
>>>>> expired connection and therefore all operations fail. Now AFAIK the
>>>>> only way to recover is to create a new ZK instance with the old
>>>>> session ID, correct?
>>>>>
>>>>> Now, my problem is, the ZK instance may be shared -- not between
>>>>> threads -- but maybe two classes in the same thread synchronize on
>>>>> different nodes by using different watchers. So it makes sense that
>>>>> one ZK client instance can handle this. Except that even if I detect
>>>>> the session expiration by catching the KeeperException, if I want to
>>>>> "resume" the session, I have to create a new ZK instance and pass it
>>>>> to any classes that were previously sharing the same instance. Does
>>>>> this make sense so far?
>>>>>
>>>>> Anyway, bottom line is, it would be nice if a ZK instance could itself
>>>>> recover a session rather than discarding that instance and creating a
>>>>> new one.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> -Tom
>>>>
>>
>
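
For reference, here is a minimal sketch of the "discard and re-create" approach discussed in this thread. It assumes the classes share a small holder object rather than the raw client handle, so none of them needs to know about the others. `SessionHolder`, `get` and `renew` are hypothetical names for illustration and are not part of the ZooKeeper API. In real code the factory would presumably call `new ZooKeeper(connectString, sessionTimeout, watcher)`, and `renew` would be called after catching `KeeperException.SessionExpiredException`.

```java
import java.util.function.Supplier;

/**
 * Hypothetical holder (not a ZooKeeper class): collaborators keep a reference
 * to this object instead of to the client handle itself.
 */
final class SessionHolder<T> {
    private final Supplier<T> factory; // creates a brand-new client, i.e. a new session
    private T current;                 // the handle currently in use

    SessionHolder(Supplier<T> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    /** Returns the handle currently in use. */
    synchronized T get() {
        return current;
    }

    /**
     * Called by whoever noticed that 'expired' no longer works. The handle is
     * replaced only if 'expired' is still the current one, so two classes
     * reporting the same expiry end up with one new session, not two.
     */
    synchronized T renew(T expired) {
        if (current == expired) {
            current = factory.get();
        }
        return current;
    }
}
```

Since the expired session's ephemeral nodes and watches are already gone, each class still has to re-create its nodes and re-register its watches on the handle returned by `renew`; the holder only solves the problem of handing the new instance to everyone.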