Would my watcher get invoked on this ConnectionLoss event? If so I am
thinking I will check for KeeperState.Disconnected and reset my state. Is my
understanding correct? Please advise.
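
A minimal sketch of the handling described above, assuming the watcher really is called on each connection state change. The State enum is a hypothetical stand-in for the client's KeeperState (only the three states discussed in this thread), and MemberState and its fields are made up for illustration; nothing here talks to a real ZooKeeper server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the client's KeeperState enum, reduced to the
// three states discussed in this thread.
enum State { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

// Hypothetical holder for one member's local bookkeeping.
class MemberState {
    boolean viewTrusted = true;  // false while the local membership view cannot be relied on
    int handlesCreated = 1;      // counts simulated ZooKeeper handles, starting with the first
    final List<String> actions = new ArrayList<>();

    // Called with each connection state change.
    void onStateChange(State state) {
        switch (state) {
            case DISCONNECTED:
                // Connection lost: stop trusting local state until the server is heard from again.
                viewTrusted = false;
                actions.add("reset local state");
                break;
            case SYNC_CONNECTED:
                // Reconnected before the session expired: same session, so re-read /Membership.
                viewTrusted = true;
                actions.add("re-read /Membership");
                break;
            case EXPIRED:
                // Session expired: the old handle is useless, so a new handle is needed
                // and the ephemeral entry under /Membership has to be recreated.
                handlesCreated++;
                viewTrusted = false;
                actions.add("new handle, recreate ephemeral entry");
                break;
        }
    }
}

class WatcherSketch {
    public static void main(String[] args) {
        MemberState m = new MemberState();
        m.onStateChange(State.DISCONNECTED);
        m.onStateChange(State.SYNC_CONNECTED);
        m.onStateChange(State.DISCONNECTED);
        m.onStateChange(State.EXPIRED);
        System.out.println(m.actions);
        System.out.println("handles created: " + m.handlesCreated);
    }
}
```

In a real client the same switch would sit inside the watcher's callback and act on the state carried by the event; the split between "disconnected" and "expired" is the part that matters.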


On Tue, Oct 12, 2010 at 10:45 AM, Benjamin Reed <br...@yahoo-inc.com> wrote:

> ZooKeeper considers a client dead when it hasn't heard from that client
> during the timeout period. Clients make sure to communicate with ZooKeeper
> at least once in 1/3 of the timeout period. If the client doesn't hear from
> ZooKeeper in 2/3 of the timeout period, the client will issue a ConnectionLoss
> event and cause outstanding requests to fail with a ConnectionLoss.
> So, if ZooKeeper decides a process is dead, the process will get a
> ConnectionLoss event. Once ZooKeeper decides that a client is dead, if the
> client reconnects, the client will get a SessionExpired. Once a session is
> expired, the expired handle will become useless, so no new requests, no
> watches, etc.
> The bottom line is if your process gets a session expired, you need to
> treat that process as expired and recover by creating a new ZooKeeper handle
> (possibly by restarting the process) and setting up your state again.
> ben
> On 10/12/2010 09:54 AM, Avinash Lakshman wrote:
>> This is what I have going:
>> I have a bunch of 200 nodes come up and create an ephemeral entry under a
>> znode named /Membership. When nodes are detected dead, the node associated
>> with the dead node under /Membership is deleted and a watch is delivered to
>> the rest of the members. Now there are circumstances where a node A is
>> deemed dead while the process is still up and running on A. It is a false
>> detection which I probably need to deal with. How do I deal with this
>> situation? Over time false detections delete all the entries underneath the
>> /Membership znode even though all processes are up and running.
>> So my questions are:
>> Would the watches be pushed out to the node that is falsely deemed dead? If
>> so I can have that process recreate the ephemeral znode underneath
>> /Membership.
>> If a node leaves a watch and then truly crashes, would it get the watches
>> it missed during the interim period when it comes back up? In any case how
>> do watches behave in the event of false/true failure detection?
>> Thanks
>> A
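
The timing rule in Ben's reply quoted above can be worked through with plain arithmetic. This is only a sketch of the fractions he states (1/3 and 2/3 of the timeout); the class and method names are hypothetical, the 30-second timeout is an example value, and the real client may round or schedule differently.

```java
class SessionTiming {
    // Longest gap the client allows between its own messages to the server: 1/3 of the timeout.
    static long pingIntervalMs(long sessionTimeoutMs) {
        return sessionTimeoutMs / 3;
    }

    // Server silence after which the client reports ConnectionLoss: 2/3 of the timeout.
    static long connectionLossAfterMs(long sessionTimeoutMs) {
        return 2 * sessionTimeoutMs / 3;
    }

    // Server-side view: the client is considered dead once nothing has been
    // heard from it for the whole timeout period.
    static boolean serverConsidersDead(long silenceMs, long sessionTimeoutMs) {
        return silenceMs >= sessionTimeoutMs;
    }

    public static void main(String[] args) {
        long timeoutMs = 30_000; // example session timeout
        System.out.println(pingIntervalMs(timeoutMs));              // prints 10000
        System.out.println(connectionLossAfterMs(timeoutMs));       // prints 20000
        System.out.println(serverConsidersDead(25_000, timeoutMs)); // prints false
        System.out.println(serverConsidersDead(30_000, timeoutMs)); // prints true
    }
}
```

With these numbers a client can see ConnectionLoss at 20 seconds of silence while the server still holds the session until 30 seconds, so a ConnectionLoss on its own does not mean the ephemeral entry is gone.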
