Sorry, bashed send prematurely! Guys, I've noticed a weird problem with ephemeral nodes not being cleaned up if the session they are tied to times out while ZooKeeper does not have a quorum. The situation is basically as follows:
3 node cluster
-Client connects to the cluster and creates an ephemeral node
-Two nodes die, so quorum is lost
-Some time passes (longer than the session timeout negotiated for the client that created the ephemeral node)
-One (or both) of the dead nodes comes back and a quorum is re-formed
-The ephemeral node tied to the session, which should have timed out, still exists and never seems to get cleaned up
-If I telnet in on port 2181 and issue 'dump', I can see that ZK seems to think the session is still active and associated with the ephemeral node in question
-It seems to stay in this state for an extended period of time (20+ minutes)

Interestingly, when I happened to fire up zkCli.sh I could see that the node was still there, but after I exited, the node seemed to disappear shortly afterwards. So I wonder if the ending of the session established by zkCli.sh somehow triggered the cleanup of this rogue ephemeral node?

Has anyone experienced this issue before? I understand that it's a bit of an edge case, but I'm running across it quite frequently when testing changes to the size of a ZK cluster. I've thought of a few workarounds, but I'd like to know if it's a known issue.

Any help appreciated!
cheers

On Thu, May 8, 2014 at 8:15 PM, Cameron McKenzie <[email protected]> wrote:

> Guys,
> I've noticed a weird problem with ephemeral nodes not being cleaned up if
> the session they are tied to times out while ZooKeeper does not have a
> quorum. The situation is basically as follows:
>
> 3 node cluster
> -Client connects to cluster and creates an ephemeral node
> -Two nodes die, so quorum is lost
> -Some time passes (longer than the session timeout negotiated for the
> client that created the ephemeral node)
> -One (or both) of the dead nodes come back and a quorum is reformed.
> -The ephemeral node tied to the session which should have timed out still
> exists
>
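
For anyone who wants to repeat the 'dump' check from a script rather than telnet, here is a rough sketch in Python. It sends the four-letter command to port 2181 over a plain socket and pulls out the sessions that still own ephemeral nodes. The helper names (`four_letter_word`, `parse_ephemerals`) are made up for illustration, and the parser assumes the dump output has a "Sessions with Ephemerals" header followed by one `0x...:` line per session and its paths on the lines below; the layout may differ between ZooKeeper versions. As far as I know, 'dump' only reports this information when sent to the leader.

```python
import socket


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_ephemerals(dump_text):
    """Return {session id: [ephemeral paths]} from 'dump' output.

    Assumed layout (not guaranteed across versions):
        Sessions with Ephemerals (1):
        0x145d8e9c6b80001:
            /some/path
    """
    sessions = {}
    current = None
    in_section = False
    for line in dump_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Sessions with Ephemerals"):
            in_section = True
            continue
        if not in_section or not stripped:
            continue
        if stripped.startswith("0x") and stripped.endswith(":"):
            # A new session id line; paths that follow belong to it.
            current = stripped[:-1]
            sessions[current] = []
        elif stripped.startswith("/") and current is not None:
            sessions[current].append(stripped)
    return sessions
```

Calling `parse_ephemerals(four_letter_word("localhost", 2181, "dump"))` before and after the quorum loss should show whether the timed-out session is still listed as owning the node.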
