This is what I have going:
I have a group of 200 nodes that come up and each create an ephemeral entry
under a znode named /Membership. When a node is detected dead, the entry
associated with it under /Membership is deleted and a watch is delivered to
the rest of the members. Now there are circumstances in which a node A is
deemed dead while the process is still up and running on A. This is a false
detection, which I probably need to deal with. How do I deal with this
situation? Over time, false detections delete all the entries underneath the
/Membership znode even though all processes are up and running.
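
To make the setup concrete, here is a toy in-memory model of it in Python.
It is not the ZooKeeper client API: the names Membership, join, and mark_dead
are made up for illustration, and the model assumes only what I described
above (an entry per live node, deletion on detection, and a notification to
the remaining members). Whether the real system also notifies the node that
was deemed dead is exactly what I am unsure about, so the model leaves it out.

```python
class Membership:
    """Toy stand-in for the /Membership znode and its ephemeral children."""

    def __init__(self):
        self.members = set()   # names of the entries currently under /Membership
        self.received = {}     # member name -> notifications it has received

    def join(self, name):
        """A process comes up and creates its ephemeral entry."""
        self.members.add(name)
        self.received.setdefault(name, [])

    def mark_dead(self, name):
        """The failure detector declares `name` dead, rightly or wrongly."""
        if name not in self.members:
            return
        self.members.discard(name)
        # Only the remaining members are told that an entry was deleted.
        for other in self.members:
            self.received[other].append(("deleted", name))


group = Membership()
for i in range(200):
    group.join(f"node-{i}")

# False detection: the process behind node-7 is still running.
group.mark_dead("node-7")
print(len(group.members))         # 199
print(group.received["node-0"])   # [('deleted', 'node-7')]
print(group.received["node-7"])   # [] (in this model the "dead" node hears nothing)

# The recovery I have in mind: the live process recreates its entry.
group.join("node-7")
print(len(group.members))         # 200
```

If nothing ever calls join again after a false mark_dead, the member set only
shrinks, which is the behavior I am seeing.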
So my questions are:
Would the watches be pushed out to the node that is falsely deemed dead? If
so, I can have that process recreate its ephemeral znode underneath
/Membership.
If a node sets a watch and then truly crashes, would it get the watches it
missed during the interim period when it comes back up? In any case, how do
watches behave in the event of false or true failure detection?