ZooKeeper considers a client dead when it hasn't heard from that client during the session timeout period. Clients make sure to communicate with ZooKeeper at least once every 1/3 of the timeout period, sending a ping if they have nothing else to send. If the client doesn't hear from ZooKeeper within 2/3 of the timeout period, it will issue a ConnectionLoss event and cause outstanding requests to fail with ConnectionLoss.
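To make the arithmetic concrete: with a 30 s session timeout, the client pings after at most 10 s of idleness and declares connection loss after 20 s without hearing from the server, while the server waits the full 30 s before declaring the session dead. The helpers below (`session_timing`, `client_view`) are hypothetical names that only encode those fractions; they are not part of any ZooKeeper client API, and a real client library may round the values differently.

```python
# Hypothetical helpers that encode the 1/3 and 2/3 rules described above.
# Assumption: times are in milliseconds and the fractions use integer division.


def session_timing(timeout_ms: int) -> dict:
    """Return the thresholds implied by a given session timeout."""
    return {
        # The client sends a request or a ping at least this often.
        "max_idle_before_ping_ms": timeout_ms // 3,
        # The client gives up on its current connection after this much silence.
        "disconnect_after_ms": (2 * timeout_ms) // 3,
        # The server declares the session dead after this much silence.
        "server_expires_after_ms": timeout_ms,
    }


def client_view(silence_ms: int, timeout_ms: int) -> str:
    """Classify how the client sees `silence_ms` without hearing from the server."""
    if silence_ms < session_timing(timeout_ms)["disconnect_after_ms"]:
        return "connected"
    return "connection_loss"
```

For example, `session_timing(30000)` gives thresholds of 10000, 20000 and 30000 ms. The gap between 2/3 and the full timeout is what gives a disconnected client a chance to reach another server before its session is expired.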

So, if ZooKeeper decides a process is dead, the process will get a ConnectionLoss event. Once ZooKeeper has decided that a client is dead, the client will get a SessionExpired if it reconnects. Once a session has expired, the expired handle becomes useless: no new requests, no watches, etc.
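The server side of that rule can be modeled as a small tracker: a session silent for longer than its timeout is expired, and a later reconnect with that session id is refused with SessionExpired instead of being revived. `SessionTracker` is a hypothetical toy class for illustration, not how the ZooKeeper server is actually implemented.

```python
class SessionTracker:
    """Toy model of a server deciding whether sessions are alive (hypothetical)."""

    def __init__(self, timeout_ms: int):
        self.timeout_ms = timeout_ms
        self.last_heard_ms = {}  # session id -> last time the server heard from it
        self.expired = set()     # sessions the server has declared dead

    def heartbeat(self, session_id: int, now_ms: int) -> None:
        """Record a request or ping; expired sessions are never refreshed."""
        if session_id not in self.expired:
            self.last_heard_ms[session_id] = now_ms

    def tick(self, now_ms: int) -> list:
        """Expire every session that has been silent longer than the timeout."""
        newly_expired = [
            sid for sid, last in self.last_heard_ms.items()
            if now_ms - last > self.timeout_ms
        ]
        for sid in newly_expired:
            del self.last_heard_ms[sid]
            # In ZooKeeper this is also when the session's ephemeral znodes are deleted.
            self.expired.add(sid)
        return newly_expired

    def reconnect(self, session_id: int, now_ms: int) -> str:
        """Result a client gets when it reconnects with an existing session id."""
        if session_id in self.expired:
            return "SessionExpired"
        self.heartbeat(session_id, now_ms)
        return "Connected"
```

For example, with a 30000 ms timeout, a session last heard at time 0 is expired by `tick(35000)`, and its owner gets `"SessionExpired"` on reconnect even though the process itself never stopped running. That is exactly the false-detection case.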

The bottom line is that if your process gets a SessionExpired, you need to treat that process as expired and recover by creating a new ZooKeeper handle (possibly by restarting the process) and setting up your state again.
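Applied to the /Membership setup in the question below, recovery means the falsely-detected member opens a new session and re-creates its own ephemeral znode when it sees SessionExpired. The sketch uses an in-memory `FakeEnsemble` and a `Member` class, both hypothetical stand-ins, and plain strings for events. A real client would receive an expired session state through its watcher instead, and would also need to re-register any watches.

```python
import itertools


class FakeEnsemble:
    """In-memory stand-in for a ZooKeeper ensemble (hypothetical, illustration only)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.live_sessions = set()
        self.ephemerals = {}  # znode path -> owning session id

    def open_session(self) -> int:
        sid = next(self._ids)
        self.live_sessions.add(sid)
        return sid

    def expire(self, sid: int) -> None:
        """Server-side expiry: the session dies and its ephemeral znodes go with it."""
        self.live_sessions.discard(sid)
        for path in [p for p, owner in self.ephemerals.items() if owner == sid]:
            del self.ephemerals[path]

    def create_ephemeral(self, sid: int, path: str) -> None:
        if sid not in self.live_sessions:
            raise RuntimeError("SessionExpired")
        self.ephemerals[path] = sid


class Member:
    """Hypothetical group member that re-registers itself after session expiry."""

    def __init__(self, ensemble: FakeEnsemble, name: str):
        self.ensemble = ensemble
        self.path = "/Membership/" + name
        self.sid = None
        self._start_session()

    def _start_session(self) -> None:
        # A brand-new handle: nothing from the old session carries over,
        # so the ephemeral znode (and any watches) must be set up again.
        self.sid = self.ensemble.open_session()
        self.ensemble.create_ephemeral(self.sid, self.path)

    def on_event(self, event: str) -> None:
        if event == "SessionExpired":
            # The old handle is useless; start over with a new session.
            self._start_session()
        # On ConnectionLoss the session may still be alive and the client
        # library keeps trying to reconnect, so nothing is rebuilt here.
```

The key design choice is that recovery is driven by SessionExpired on the member itself, not by the watch the other members receive: the expired member never sees that watch, because its handle stops delivering watches once the session is gone.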


On 10/12/2010 09:54 AM, Avinash Lakshman wrote:
This is what I have going:

I have a bunch of 200 nodes that come up and create an ephemeral entry under a
znode named /Membership. When a node is detected as dead, its entry under
/Membership is deleted and a watch is delivered to the rest of the members.
Now, there are circumstances in which a node A is deemed dead while the
process is still up and running on A. It is a false detection which I
probably need to deal with. How do I deal with this situation? Over time,
false detections delete all the entries underneath the /Membership znode
even though all processes are up and running.

So my questions are:
Would the watches be pushed out to the node that is falsely deemed dead? If
so, I can have that process recreate the ephemeral znode underneath /Membership.
If a node sets a watch and then truly crashes, when it comes back up would
it get the watches it missed during the interim period? In any case, how do
watches behave in the event of false/true failure detection?

