Hi Cameron,

The last point of the FAQ might clarify why the ephemerals are not getting deleted when the cluster is coming back up:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/FAQ

-Flavio

> -----Original Message-----
> From: Cameron McKenzie [mailto:[email protected]]
> Sent: 08 May 2014 11:42
> To: [email protected]
> Subject: Re: Ephemeral node bound to a session that times out while ZK has
> no quorum
>
> After a few more trials, unfortunately it seems completely random how
> long the ephemeral nodes stick around. Sometimes it's minutes,
> sometimes they're cleaned up within seconds of startup...
>
>
> On Thu, May 8, 2014 at 8:31 PM, Cameron McKenzie
> <[email protected]> wrote:
>
> > Sorry, bashed send prematurely!
> >
> > Guys,
> > I've noticed a weird problem with ephemeral nodes not being cleaned up
> > if the session they are tied to times out while ZooKeeper does not
> > have a quorum. The situation is basically as follows:
> >
> > 3-node cluster
> > -Client connects to cluster and creates an ephemeral node
> > -Two nodes die, so quorum is lost
> > -Some time passes (longer than the session timeout negotiated for the
> >  client that created the ephemeral node)
> > -One (or both) of the dead nodes comes back and a quorum is re-formed.
> > -The ephemeral node tied to the session which should have timed out
> >  still exists and never seems to get cleaned up.
> > -If I telnet in on port 2181 and 'dump', then I can see that ZK seems
> >  to think that the session is still active and associated with the
> >  ephemeral node in question.
> > -It seems to stay in this state for an extended period of time (20+
> >  minutes). Interestingly, when I happened to fire up zkCli.sh I could
> >  see that the node was still there, but after I exited, the node seemed
> >  to disappear shortly afterwards. So, I wonder if the session
> >  established by zkCli.sh ending somehow triggered the cleanup of this
> >  rogue ephemeral node?
> >
> > Has anyone experienced this issue before? I understand that it's a bit
> > of an edge case, but I'm running across it quite frequently when
> > testing changes to the size of the ZK cluster.
> >
> > I've thought of a few workarounds for the issue, but I'd like to know
> > if it's a known issue.
> >
> > Any help appreciated!
> > cheers
> >
> >
> > On Thu, May 8, 2014 at 8:15 PM, Cameron McKenzie
> > <[email protected]> wrote:
> >
> >> Guys,
> >> I've noticed a weird problem with ephemeral nodes not being cleaned
> >> up if the session they are tied to times out while ZooKeeper does not
> >> have a quorum. The situation is basically as follows:
> >>
> >> 3-node cluster
> >> -Client connects to cluster and creates an ephemeral node
> >> -Two nodes die, so quorum is lost
> >> -Some time passes (longer than the session timeout negotiated for the
> >>  client that created the ephemeral node)
> >> -One (or both) of the dead nodes comes back and a quorum is re-formed.
> >> -The ephemeral node tied to the session which should have timed out
> >>  still exists
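The telnet `dump` check described in the thread can also be scripted, which makes it easier to poll repeatedly while the quorum is re-forming. The following is a minimal Python sketch, not an official tool. It assumes a server that accepts four-letter-word commands on the client port (2181 in the thread; on ZooKeeper 3.5.3 and later, `dump` has to be enabled through `4lw.commands.whitelist`). It also assumes that session IDs appear in the output as `0x` followed by lowercase hex. The helper names `four_letter_word` and `session_listed` are hypothetical and are not part of any ZooKeeper client library.

```python
import re
import socket


def four_letter_word(host: str, port: int, cmd: str = "dump", timeout: float = 5.0) -> str:
    """Send a four-letter-word command (e.g. 'dump') and return the raw reply text."""
    if len(cmd) != 4:
        raise ValueError("four-letter-word commands are exactly four characters")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        # The server writes its reply and then closes the connection,
        # so keep reading until recv() signals EOF with an empty bytes object.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def session_listed(dump_text: str, session_id: int) -> bool:
    """Return True if session_id appears in the dump output.

    Assumption: session IDs are printed as '0x' followed by lowercase hex.
    The word boundaries stop e.g. 0x1a from matching inside 0x1a00.
    """
    pattern = rf"\b0x{session_id:x}\b"
    return re.search(pattern, dump_text) is not None
```

For example, calling `session_listed(four_letter_word("localhost", 2181), session_id)` in a loop after the dead nodes come back would show how long the stale session, and therefore its ephemeral node, keeps being reported. The ZooKeeper admin guide notes that `dump` only works on the leader, so point the sketch at the current leader.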
