Hey Michi,

I'll have to double-check the logs to see whether the client got a session expired event, but I would presume so, because the ephemeral nodes left lying around had a different session ID. I guess it's possible that the old connection stayed open and a new one was also created, but I don't believe that to be the case.

cheers

On Thu, May 15, 2014 at 12:41 PM, Michi Mutsuzaki <[email protected]> wrote:

> Hi Cameron,
>
> Did the client get the session expired event? Sessions don't expire
> during quorum loss, and I'm guessing the session got revalidated when
> the cluster reformed a quorum.
>
>
> On Thu, May 8, 2014 at 3:31 AM, Cameron McKenzie <[email protected]>
> wrote:
> > Sorry, bashed send prematurely!
> >
> > Guys,
> > I've noticed a weird problem with ephemeral nodes not being cleaned
> > up if the session they are tied to times out while ZooKeeper does
> > not have a quorum. The situation is basically as follows:
> >
> > 3 node cluster
> > -Client connects to cluster and creates an ephemeral node
> > -Two nodes die, so quorum is lost
> > -Some time passes (longer than the session timeout negotiated for
> > the client that created the ephemeral node)
> > -One (or both) of the dead nodes come back and a quorum is reformed.
> > -The ephemeral node tied to the session which should have timed out
> > still exists and never seems to get cleaned up.
> > -If I telnet in on port 2181 and 'dump', then I can see that ZK
> > seems to think that the session is still active and associated with
> > the ephemeral node in question.
> > -It seems to stay in this state for some extended period of time
> > (20+ minutes). Interestingly, when I happened to fire up zkCli.sh I
> > could see that the node was still there, but after I exited, the
> > node seemed to disappear shortly afterwards. So, I wonder if the
> > session established by zkCli.sh ending somehow triggered the cleanup
> > of this rogue ephemeral node?
> >
> > Has anyone experienced this issue before? I understand that it's a
> > bit of an edge case, but I'm running across it quite frequently when
> > testing changing the size of the ZK cluster.
> >
> > I've thought of a few workarounds for the issue, but I'd like to
> > know if it's a known issue.
> >
> > Any help appreciated!
> > cheers
> >
> >
> > On Thu, May 8, 2014 at 8:15 PM, Cameron McKenzie <[email protected]>
> > wrote:
> >
> >> Guys,
> >> I've noticed a weird problem with ephemeral nodes not being cleaned
> >> up if the session they are tied to times out while ZooKeeper does
> >> not have a quorum. The situation is basically as follows:
> >>
> >> 3 node cluster
> >> -Client connects to cluster and creates an ephemeral node
> >> -Two nodes die, so quorum is lost
> >> -Some time passes (longer than the session timeout negotiated for
> >> the client that created the ephemeral node)
> >> -One (or both) of the dead nodes come back and a quorum is reformed.
> >> -The ephemeral node tied to the session which should have timed out
> >> still exists
> >>
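
For anyone who wants to script the telnet 'dump' check described above, here is a minimal Python sketch. It is not taken from the thread: the function names (four_letter_word, parse_ephemerals, stale_sessions) are hypothetical, and the parser assumes the ephemeral section of the 'dump' reply starts with a "Sessions with Ephemerals" line, followed by one "0x...:" line per session and indented node paths beneath it. That layout may differ between ZooKeeper versions, so treat it as an assumption and check it against real output first.

```python
import socket


def four_letter_word(host, port, command="dump", timeout=5.0):
    """Send a four-letter command to a ZooKeeper server, return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_ephemerals(dump_text):
    """Map session ID -> list of ephemeral node paths.

    Assumed layout (not guaranteed): a 'Sessions with Ephemerals' header,
    then '0x...:' session lines, each followed by indented paths.
    """
    owners = {}
    current = None
    in_section = False
    for line in dump_text.splitlines():
        if line.startswith("Sessions with Ephemerals"):
            in_section = True
            continue
        stripped = line.strip()
        if not in_section or not stripped:
            continue
        if stripped.startswith("0x") and stripped.endswith(":"):
            current = stripped[:-1]
            owners[current] = []
        elif current is not None and stripped.startswith("/"):
            owners[current].append(stripped)
    return owners


def stale_sessions(owners, live_session_id):
    """Return sessions that own ephemerals but are not the client's live session."""
    return {sid: paths for sid, paths in owners.items() if sid != live_session_id}
```

Usage would look something like stale_sessions(parse_ephemerals(four_letter_word("localhost", 2181)), my_session_id), where my_session_id is the hex session ID the client currently reports. Depending on the server version and configuration, 'dump' may only be answered by the leader or may need to be explicitly enabled.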
