Thanks, Val. I'd like to make sure I understand this correctly. Let's say we have a ring of nodes A <- B <- C <- D <- A.
If B is unhealthy, C won't see a heartbeat within the configured failure detection timeout and will then connect to A. When this happens, how is B's ejection coordinated across the cluster? Or does it even need to be? I know that at some point all nodes will log that B failed.

Now let's say B has been ejected but has since recovered (e.g. the network was restored or a GC pause ended). How does it know it has been ejected? I think that at this point B will believe A has failed, because it received no heartbeats from A while B itself was unavailable. B may not be aware that it was the one at fault, and may try to start an ejection process for node A. How is this situation handled?

-Nick
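
P.S. In case it helps to pin down what I mean, here is a toy Python model of my mental picture. It is only a sketch of my assumptions, not Ignite's discovery code: the names `RING` and `watched_neighbor` are made up, and I am assuming each node expects heartbeats from the node to its left in the ring as written above, skipping any node it considers down.

```python
# Toy model of the scenario in this post. NOT Ignite's implementation;
# RING and watched_neighbor are hypothetical names used for illustration.
RING = ["A", "B", "C", "D"]  # assumed: each node expects heartbeats from its left neighbor


def watched_neighbor(node, believed_healthy):
    """Return the closest node to the left that `node` believes is healthy."""
    i = RING.index(node)
    for step in range(1, len(RING)):
        candidate = RING[(i - step) % len(RING)]
        if candidate in believed_healthy:
            return candidate
    return None  # no other node is believed healthy


# The cluster's view: B is unresponsive, so C skips it and connects to A.
print(watched_neighbor("C", {"A", "C", "D"}))  # A

# B's own view after it recovers: it missed A's heartbeats, so it suspects A
# and would look past it to D.
print(watched_neighbor("B", {"B", "C", "D"}))  # D
```

The two calls show the conflict I am asking about: the rest of the cluster has dropped B, while B's local view says A is the node that should be dropped.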
