Re: ephemeral node after server bounce

Patrick Hunt Thu, 04 Feb 2010 14:48:18 -0800


Yonik Seeley wrote:

> We have solr nodes create ephemeral znodes (name based on host and port).
> The ephemeral znode takes some time to remove of course, so what
> happens is that if I bounce a solr server (containing a zk client) the
> ephemeral node will still exist when the server comes back up.  Since
> it exists, the ephemeral won't be re-created, but it does disappear
> later.

Yes. This is expected behavior, and one of the main reasons for having low session timeouts. Unfortunately, the alternative of having a lower timeout can also have negative effects (sensitivity to network/resource issues, in particular GC, as you know).

> What's the best way to handle this situation?  Delete and re-create?
> Watch it and re-create when it does disappear?
> There's no way to "hand over" responsibility for an ephemeral znode, right?


There is one way, not really "hand over" but rather "take back".

What I mean is, when you establish a session you get a session id AND a password. The session is valid until it is either closed or expires. If, when your client (solr server) comes back up, it is able to provide the session id and password to the ZK service, it will _recover_ the session. This probably means that you would have to write the session id/password to disk (if your client crashes or is killed, for example) and handle some corner cases (like the session having expired). It's not a common case, as you have to store the id persistently, but it is possible and might address your issue (assuming the primary issue is that your client goes down and becomes active again quickly).
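For the "take back" approach, the client has to persist two values and read them back on restart. The Java client exposes them through `getSessionId()` and `getSessionPasswd()`, and has a `ZooKeeper` constructor that additionally takes a session id and password to reattach to an existing session. The sketch below covers only the on-disk part, assuming a small binary file is acceptable; `SavedSession`, `SessionStore` and the file layout are hypothetical choices, not part of ZooKeeper.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical holder for the values returned by getSessionId() / getSessionPasswd(). */
final class SavedSession {
    final long sessionId;
    final byte[] password;

    SavedSession(long sessionId, byte[] password) {
        this.sessionId = sessionId;
        this.password = password.clone();
    }
}

/** Hypothetical helper that stores the session credentials so a restarted process can reattach. */
final class SessionStore {
    private SessionStore() {}

    /** Writes to a temp file, then renames it over the target so a crash never leaves a half-written file. */
    static void save(Path file, SavedSession s) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(tmp))) {
            out.writeLong(s.sessionId);
            out.writeInt(s.password.length);
            out.write(s.password);
        }
        // On POSIX file systems an atomic move is a rename(), which replaces any existing file.
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Returns null when nothing was saved, meaning a fresh session has to be opened. */
    static SavedSession load(Path file) throws IOException {
        if (!Files.exists(file)) {
            return null;
        }
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            long id = in.readLong();
            byte[] pw = new byte[in.readInt()];
            in.readFully(pw);
            return new SavedSession(id, pw);
        }
    }
}
```

On startup you would call `SessionStore.load(path)`: if it returns null, open a normal session and save its id and password; otherwise pass `sessionId` and `password` to the reattaching constructor. If the service reports the session as expired, discard the file, open a new session, and fall back to one of the other options for the stale znode.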

A second alternative is to actively delete the znode in question. However, this has similar issues to the prior paragraph: the client that's coming up would have to know the session id it had previously, so that it could get a Stat on the znode, compare the "ephemeralOwner" and delete the znode if it was the previous owner.
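The ownership test itself is small. In the Java client the inputs would come from `zk.exists(path, false)`, which returns a `Stat` (or null if the node is gone), and `Stat.getEphemeralOwner()`, which is 0 for non-ephemeral znodes. To keep the sketch runnable without a ZooKeeper jar, those values are passed in as plain numbers; `StaleNodeCheck` and its `Action` values are hypothetical names.

```java
/** Hypothetical decision helper for a leftover ephemeral znode found after a restart. */
final class StaleNodeCheck {
    /** Stat.getEphemeralOwner() returns 0 for persistent (non-ephemeral) znodes. */
    static final long NOT_EPHEMERAL = 0L;

    enum Action { CREATE, ALREADY_OURS, DELETE_THEN_CREATE, LEAVE_ALONE }

    private StaleNodeCheck() {}

    /**
     * @param ephemeralOwner    owner from the node's Stat, or null if exists() found no node
     * @param previousSessionId session id persisted before the restart (0 if unknown)
     * @param currentSessionId  id of the session this process holds now
     */
    static Action decide(Long ephemeralOwner, long previousSessionId, long currentSessionId) {
        if (ephemeralOwner == null) {
            return Action.CREATE; // node already gone: just create it
        }
        long owner = ephemeralOwner;
        if (owner == currentSessionId) {
            return Action.ALREADY_OURS; // created by this session, nothing to do
        }
        if (owner != NOT_EPHEMERAL && owner == previousSessionId) {
            return Action.DELETE_THEN_CREATE; // left over from our own dead session
        }
        return Action.LEAVE_ALONE; // another client's node, or a persistent node
    }
}
```

A `NoNode` error from `zk.delete` in the `DELETE_THEN_CREATE` case just means the old session expired in the meantime; either way the next step is to create the node under the new session.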

However, short of knowing the session id, I don't see how you get around this issue other than just waiting for the timeout. The problem is: how would you know to delete the ephemeral node in question? How is any other solr client going to know that the client died, and therefore remove the znode? Really, that's what you are relying on zk for, via the timeout.

Yes, you could set a watch on the znode in question; when it is deleted (via the eventual session timeout) you would recreate it.
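ZooKeeper watches are one-shot, so the callback has to re-check and re-arm each time it fires. The sketch models only the two calls involved behind a hypothetical `EphemeralOps` interface (standing in for `exists` with a watcher and `create` with `CreateMode.EPHEMERAL`), so the re-arming logic can be read and run on its own; `Reclaimer` is likewise a hypothetical name.

```java
/** Hypothetical stand-in for the ZooKeeper calls used below. */
interface EphemeralOps {
    /** Returns true if the node exists, and in that case arms a one-shot callback for its deletion. */
    boolean existsAndWatch(String path, Runnable onDeleted);

    /** Creates the node as ephemeral, owned by the current session. */
    void createEphemeral(String path, byte[] data);
}

/** Recreates an ephemeral node once a stale copy from a previous session goes away. */
final class Reclaimer {
    private final EphemeralOps ops;
    private final String path;
    private final byte[] data;

    Reclaimer(EphemeralOps ops, String path, byte[] data) {
        this.ops = ops;
        this.path = path;
        this.data = data.clone();
    }

    /** Creates the node now if it is gone; otherwise re-runs itself when the watch fires. */
    void ensure() {
        if (!ops.existsAndWatch(path, this::ensure)) {
            ops.createEphemeral(path, data);
        }
    }
}
```

In the real client, `create` can also fail with a node-exists error if another process grabbed the name between the check and the create; calling `ensure()` again in that case is one reasonable way to handle it.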

Another option is for the client that's being restarted to write its IP address somewhere (as part of the znode name, or as part of the data, say, or in another znode that references this znode, etc.). If the client was bounced, it could compare its IP to the IP "assigned" to the znode; if it was the same, you would have confidence you were the owner (this one depends on the use case, but based on what you described it should work unless the client changes IPs dynamically).
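A minimal sketch of the IP check, assuming the node's data holds a UTF-8 "host:port" string written at creation time and read back with `getData` after a restart. `OwnerTag` is a hypothetical helper; as noted above, it is only trustworthy if no other process can come up on the same host and port.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: tags a znode's data with "host:port" and checks the tag after a restart. */
final class OwnerTag {
    private OwnerTag() {}

    /** Bytes to store as the znode's data when creating it. */
    static byte[] encode(String host, int port) {
        return (host + ":" + port).getBytes(StandardCharsets.UTF_8);
    }

    /** True when the leftover znode's data names this same host and port. */
    static boolean isMine(byte[] znodeData, String host, int port) {
        return znodeData != null && Arrays.equals(znodeData, encode(host, port));
    }
}
```

If `isMine` returns true, the restarted client can delete the leftover node and create it again under its new session; it cannot simply keep the old node, since that node still disappears when the old session expires.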


Patrick
