ephemeral node after server bounce

2010-02-04 Thread Yonik Seeley
We have solr nodes create ephemeral znodes (name based on host and port).
The ephemeral znode takes some time to remove of course, so what
happens is that if I bounce a solr server (containing a zk client) the
ephemeral node will still exist when the server comes back up.  Since
it exists, the ephemeral won't be re-created, but it does disappear
later.

What's the best way to handle this situation?  Delete and re-create?
Watch it and re-create when it does disappear?
There's no way to hand over responsibility for an ephemeral znode, right?

-Yonik
http://www.lucidimagination.com


Re: ephemeral node after server bounce

2010-02-04 Thread Ted Dunning
On Thu, Feb 4, 2010 at 2:20 PM, Yonik Seeley yo...@lucidimagination.com wrote:

 There's no way to hand over responsibility for an ephemeral znode, right?


Right.


 We have solr nodes create ephemeral znodes (name based on host and port).
 The ephemeral znode takes some time to remove of course, so what
 happens is that if I bounce a solr server (containing a zk client) the
 ephemeral node will still exist when the server comes back up.


This problem comes up with any system that has hysteresis and needs a single
point of control.


 What's the best way to handle this situation?  Delete and re-create?
 Watch it and re-create when it does disappear?


I think you need to handle the problem of multiple search nodes coming up on
the same machine, possibly because the old one may have hung up.

So... I would recommend

a) if the ephemeral still exists, wait for a few more seconds to see if it
disappears (20?)

b) if it goes away, create a new one and continue as normal

c) if it doesn't go away take additional action to determine if service is
still running (i.e. panic and run in circles).
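
Steps a) through c) could be sketched roughly as follows. This is only an
illustration under stated assumptions: ZNodeStore, InMemoryStore, and claim are
hypothetical names, not ZooKeeper API. With a real client the two interface
methods would map onto ZooKeeper.exists(path, false) and
ZooKeeper.create(path, data, acl, CreateMode.EPHEMERAL), and the create could
still fail with KeeperException.NodeExistsException if another process wins the
race between the check and the create.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class EphemeralClaim {
    /** Hypothetical stand-in for the two ZooKeeper calls this logic needs. */
    interface ZNodeStore {
        boolean exists(String path);
        void createEphemeral(String path);
    }

    /** In-memory fake so the sketch runs without a ZooKeeper server. */
    static final class InMemoryStore implements ZNodeStore {
        private final Set<String> nodes = ConcurrentHashMap.newKeySet();
        public boolean exists(String path) { return nodes.contains(path); }
        public void createEphemeral(String path) { nodes.add(path); }
        /** Simulates the server removing a stale ephemeral node. */
        void expire(String path) { nodes.remove(path); }
    }

    enum Result { CREATED, STILL_PRESENT }

    /**
     * (a) While the old node exists, poll for up to maxWaitMs.
     * (b) Once it is gone, create a new one and report CREATED.
     * (c) If it never goes away, report STILL_PRESENT so the caller can
     *     check whether another instance is really running.
     */
    static Result claim(ZNodeStore store, String path, long maxWaitMs, long pollMs) {
        long deadline = System.nanoTime() + maxWaitMs * 1_000_000L;
        while (store.exists(path)) {
            if (System.nanoTime() - deadline >= 0) {
                return Result.STILL_PRESENT;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Result.STILL_PRESENT;
            }
        }
        store.createEphemeral(path);
        return Result.CREATED;
    }
}
```

A caller might use something like claim(store, "/nodes/host:8983", 20_000, 500)
at startup and treat STILL_PRESENT as the signal for case c). Setting a watch on
the node instead of polling would avoid the sleep loop; polling is used here
only to keep the sketch short.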


Re: ephemeral node after server bounce

2010-02-04 Thread kishore g
The worst-case option would be to use JVM shutdown hooks:
http://stackoverflow.com/questions/40376/handle-signals-in-the-java-virtual-machine

You can delete the znodes on exit, much like the deleteOnExit functionality of a
File.
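
A minimal sketch of such a hook, assuming the session object can be treated as
an AutoCloseable (SessionShutdownHook and register are hypothetical names; with
a real client the lambda would typically call ZooKeeper.close(), which ends the
session and so removes its ephemeral nodes). Shutdown hooks do not run on
kill -9 or a JVM crash, so this only covers orderly exits.

```java
final class SessionShutdownHook {
    /**
     * Registers a JVM shutdown hook that closes the given session-like
     * resource, and returns the hook thread so the caller can remove it
     * again with Runtime.removeShutdownHook if needed.
     */
    static Thread register(AutoCloseable session) {
        Thread hook = new Thread(() -> {
            try {
                session.close();
            } catch (Exception e) {
                // Nothing useful can be done this late; just report it.
                System.err.println("failed to close session on shutdown: " + e);
            }
        }, "session-close-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```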

thanks,
Kishore G



On Thu, Feb 4, 2010 at 2:56 PM, Patrick Hunt ph...@apache.org wrote:

 hah, you guys beat me to the punch. I think having some unique per client
 token might also work (see my resp). Perhaps this is the ip of the host or
 better (esp if multiple clients on a single host) would be some solr
 specific id that uniquely identifies each node.

 Patrick


 Benjamin Reed wrote:

 i second ted's proposals! thanx ted.

 there is one other option. when you create the ZooKeeper object you can
 pass a session id and password. your bounced server can actually reattach to
 the session. (that is why we put that constructor in.) to use it you need to
 save the session id and password to a persistent store (a file) when you
 first attach, and then when you restart read the id and password from the
 file.

 ben
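
The save-and-restore step Ben describes might look roughly like this.
SavedSession and its file layout are hypothetical, chosen only for
illustration. With a real client the values would come from
ZooKeeper.getSessionId() and ZooKeeper.getSessionPasswd() after connecting, and
on restart they would be passed to the
ZooKeeper(connectString, sessionTimeout, watcher, sessionId, sessionPasswd)
constructor. Reattaching only works if the restart finishes before the session
times out, and the file should be readable only by the service account.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class SavedSession {
    final long id;
    final byte[] password;

    SavedSession(long id, byte[] password) {
        this.id = id;
        this.password = password.clone();
    }

    /** Writes the id, then the password length and bytes, replacing any old file. */
    void save(Path file) {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeLong(id);
            out.writeInt(password.length);
            out.write(password);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back a session previously written by save. */
    static SavedSession load(Path file) {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            long id = in.readLong();
            byte[] password = new byte[in.readInt()];
            in.readFully(password);
            return new SavedSession(id, password);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```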



Re: ephemeral node after server bounce

2010-02-04 Thread Ted Dunning
Yes.  Normal exits should be handled.  What Yonik is worried about is
abnormal situations.  Unfortunately, after you remove the semi-orderly
shutdowns, you are left with a lot of residual cases where the timeout is
the most reliable metric that the node has failed.

On Thu, Feb 4, 2010 at 3:04 PM, Patrick Hunt ph...@apache.org wrote:

 Ah, excellent idea [jvm shutdownhooks], won't always work but may help. I
 think in this case (ephemerals) all Yonik would need to do is close the
 session. That will remove all ephemerals.




-- 
Ted Dunning, CTO
DeepDyve