Ted,

Once I modified the code to stop treating disconnects as session expirations, my issue was resolved. But it did bring up a new question.

The original reason the code was there was to handle a client that is mainly used for listening to remote events. Once such a client starts, it sets up a few watches and really doesn't interact with the server after that. The thought was that if the client were disconnected and did not handle that case, we'd never know about it, and it would seem like no remote events had occurred.

I have since changed this code so that, upon receipt of a Disconnected event, it loops checking for the existence of some znode. If a session expiration occurs in this loop, I trigger the reconnect logic. Otherwise, once we reconnect, the check will succeed and the loop will exit.

Does this sound like a reasonable way to handle the issue?
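For concreteness, here is a minimal sketch of the probe loop described above. It is not our actual code. To keep it self-contained, `Probe`, `ConnectionLoss`, `SessionExpired`, and `awaitReconnect` are hypothetical stand-ins: in a real client the probe would presumably wrap a cheap read such as `ZooKeeper.exists(path, false)`, and the two exceptions would correspond to `KeeperException.ConnectionLossException` and `KeeperException.SessionExpiredException`.

```java
final class DisconnectProbe {
    /** Hypothetical stand-in for KeeperException.ConnectionLossException. */
    static class ConnectionLoss extends Exception {}

    /** Hypothetical stand-in for KeeperException.SessionExpiredException. */
    static class SessionExpired extends Exception {}

    /** Hypothetical wrapper around a cheap read, e.g. an exists() check on some znode. */
    interface Probe {
        void check() throws ConnectionLoss, SessionExpired;
    }

    enum Outcome { RECONNECTED, EXPIRED }

    /**
     * Repeats the probe until it either succeeds (the session survived the
     * disconnect) or reports expiration (the caller should rebuild the session
     * and its watches). Connection loss just means "still disconnected", so
     * the loop waits and tries again.
     */
    static Outcome awaitReconnect(Probe probe, long retryDelayMs)
            throws InterruptedException {
        while (true) {
            try {
                probe.check();
                return Outcome.RECONNECTED;   // same session, watches still valid
            } catch (SessionExpired e) {
                return Outcome.EXPIRED;       // trigger the reconnect logic
            } catch (ConnectionLoss e) {
                Thread.sleep(retryDelayMs);   // still disconnected; retry
            }
        }
    }
}
```

Only the `EXPIRED` outcome leads to creating a new session; a plain disconnect never does. One assumption worth stating: a blocking loop like this probably belongs on its own thread rather than on the watcher's event thread, since blocking there would hold up delivery of later events.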

Thanks,
Martin

> Ted,
>
> Sorry to trouble you on this one. I do understand the difference, but at
> some point I did not. :)
>
> Your question inspired me to look deeper at our code (to see if we were
> confused) and I found one case that was triggering our reconnect response
> from Disconnected event. Everywhere else we only do this in response to a
> SessionExpiredException.
>
> Thanks for the quick response and your work on ZooKeeper in general! I
> have also run into the "can't create ephemeral yet case" and our code
> generally loops until successful.
>
> -Martin
>
> > -----Original Message-----
> > From: Ted Dunning [mailto:[email protected]]
> >
> > Martin,
> >
> > From your email, it sounds like there might be a bit of confusion
> > between disconnection and session expiration. Are you sure you are
> > clear on the difference between these?
> >
> > Also, I have seen cases in my own code where I confused myself by
> > trying to re-create ephemeral files after a client program crashed. I
> > knew that the client had crashed as soon as it happened, but the
> > Zookeeper servers could only determine this after a bit of time. My
> > new program tried to recreate the ephemerals to indicate that it was
> > back but since the old ephemerals were still there, that failed. Then
> > a short time later when the ZK cluster understood that the old client
> > was gone, the ephemerals disappeared even though the new client was
> > humming along nicely. My solution was to delete the ephemerals when
> > creating them.
> >
> > Is it possible you have a similar confusion?
> >
> > On Tue, Sep 13, 2011 at 11:25 AM, Martin Serrano <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > We have added code to our application to reconnect and re-establish
> > > watches when we receive a Disconnected event. I am running tests on
> > > a heavily loaded system where the zookeeper server and clients are
> > > all impacted. On this test system we regularly experience session
> > > timeouts and appropriately react to reconnect and set up our watches.
> > > There is an uncommon case that I am having trouble puzzling out.
> > > When running one of our tests in a loop about 1% of the time we hit
> > > a case where on the client side we think the session has expired but
> > > on the server side it has been renewed. We will then fail to be able
> > > to create an ephemeral node because it already exists and does not
> > > ever get cleaned up (since the previous session is still valid). I'm
> > > trying to figure out if we are misusing the API or if we have
> > > encountered a bug. I'm happy to provide more details. One thing I am
> > > wondering is if it is inappropriate to create a new session within
> > > the event thread of another session which has received the
> > > disconnected event.
> > >
> > > Thanks,
> > > Martin Serrano
> > > ...
