Hi Cameron,

The last point of the FAQ might clarify why the ephemerals are not getting 
deleted when the cluster is coming back up: 

https://cwiki.apache.org/confluence/display/ZOOKEEPER/FAQ
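
A rough sketch of the behaviour, assuming the FAQ entry in question is the
one about what happens to sessions while the cluster is down: session
expiry is tracked by the leader, so a newly elected leader starts each
session's countdown afresh. This is a toy model only, not ZooKeeper code.
ToyLeader, the session id "client-1" and the 30 second timeout are all
made up for illustration.

```python
class ToyLeader:
    """Toy model of a leader that tracks session expiry deadlines."""

    def __init__(self, elected_at, sessions):
        # sessions maps session id -> negotiated timeout in seconds.
        # Assumption: on election every known session gets a fresh deadline.
        self.timeouts = dict(sessions)
        self.deadlines = {sid: elected_at + t for sid, t in sessions.items()}

    def touch(self, sid, now):
        # A client heartbeat pushes the deadline out again.
        self.deadlines[sid] = now + self.timeouts[sid]

    def expired(self, now):
        # Sessions whose deadline has passed; their ephemerals would be removed.
        return sorted(sid for sid, d in self.deadlines.items() if d <= now)


sessions = {"client-1": 30}          # hypothetical 30 s session timeout

leader = ToyLeader(elected_at=0, sessions=sessions)
leader.touch("client-1", now=10)     # last heartbeat at t=10, deadline t=40

# Quorum is lost at t=20 and regained at t=600. With no leader, nothing
# expired the session even though t=40 came and went.
new_leader = ToyLeader(elected_at=600, sessions=sessions)

still_alive_at_610 = new_leader.expired(now=610)   # fresh deadline is t=630
gone_at_631 = new_leader.expired(now=631)
```

In this model the session only expires one full timeout after the new
leader is elected, however long the outage lasted.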

-Flavio

> -----Original Message-----
> From: Cameron McKenzie [mailto:[email protected]]
> Sent: 08 May 2014 11:42
> To: [email protected]
> Subject: Re: Ephemeral node bound to a session that times out while ZK has
> no quorum
> 
> After a few more trials, unfortunately it seems completely random as to how
> long the ephemeral nodes are sticking around. Sometimes it's minutes,
> sometimes they're cleaned up in a matter of seconds after startup...
> 
> 
> On Thu, May 8, 2014 at 8:31 PM, Cameron McKenzie
> <[email protected]> wrote:
> 
> > Sorry, bashed send prematurely!
> >
> > Guys,
> > I've noticed a weird problem with ephemeral nodes not being cleaned up
> > if the session they are tied to times out while ZooKeeper does not
> > have a quorum. The situation is basically as follows:
> >
> > 3 node cluster
> > -Client connects to cluster and creates an ephemeral node
> > -Two nodes die, so quorum is lost
> > -Some time passes (longer than the session timeout negotiated for the
> >  client that created the ephemeral node)
> > -One (or both) of the dead nodes come back and a quorum is reformed.
> > -The ephemeral node tied to the session which should have timed out
> >  still exists and never seems to get cleaned up.
> > -If I telnet in on port 2181 and 'dump', then I can see that ZK seems
> >  to think that the session is still active and associated with the
> >  ephemeral node in question.
> > -It seems to stay in this state for some extended period of time (20+
> >  minutes). Interestingly, when I happened to fire up zkCli.sh I could
> >  see that the node was still there, but after I exited, the node seemed
> >  to disappear shortly afterwards. So, I wonder if the session
> >  established by zkCli.sh ending somehow triggered the cleanup of this
> >  rogue ephemeral node?
> >
> > Has anyone experienced this issue before? I understand that it's a bit
> > of an edge case, but I'm running across it quite frequently when
> > testing changing the size of the ZK cluster.
> >
> > I've thought of a few workarounds for the issue, but I'd like to know
> > if it's a known issue.
> >
> > Any help appreciated!
> > cheers
> >
> >
> >
> > On Thu, May 8, 2014 at 8:15 PM, Cameron McKenzie
> > <[email protected]> wrote:
> >
> >> Guys,
> >> I've noticed a weird problem with ephemeral nodes not being cleaned
> >> up if the session they are tied to times out while ZooKeeper does not
> >> have a quorum. The situation is basically as follows:
> >>
> >> 3 node cluster
> >> -Client connects to cluster and creates an ephemeral node
> >> -Two nodes die, so quorum is lost
> >> -Some time passes (longer than the session timeout negotiated for the
> >>  client that created the ephemeral node)
> >> -One (or both) of the dead nodes come back and a quorum is reformed.
> >> -The ephemeral node tied to the session which should have timed out
> >>  still exists
> >>
> >>
> >
