Re: Node failure handling semantics

npordash Fri, 04 May 2018 16:55:51 -0700

Thanks, Val.

I'd like to make sure I understand this correctly. Let's say we have a ring
of nodes A <- B <- C <- D <- A.


If B is unhealthy then C won't see a heartbeat within the configured failure
detection time and will then proceed to connect to A. When this happens, how
is B's ejection coordinated across the cluster? Or does it even need to be?
I know at some point all nodes will log that B failed.
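
To make the scenario concrete, here is a toy Python sketch of the ring as described above. It assumes each node expects heartbeats from the node its arrow points to and skips over failed nodes. The names `RING` and `monitored_neighbor` are made up for illustration; this is not Ignite's discovery code and says nothing about how ejection is actually coordinated.

```python
# Toy model of the ring A <- B <- C <- D <- A (hypothetical, not Ignite code).
RING = ["A", "B", "C", "D"]

def monitored_neighbor(node, failed=frozenset()):
    """Return the node that `node` expects heartbeats from, skipping failed ones."""
    i = RING.index(node)
    # Walk backwards around the ring until a node not marked as failed is found.
    for step in range(1, len(RING)):
        candidate = RING[(i - step) % len(RING)]
        if candidate not in failed:
            return candidate
    return None  # no other healthy node remains

print(monitored_neighbor("C"))                # B
print(monitored_neighbor("C", failed={"B"}))  # A
```

In this model C falls back to A once B is marked failed, which is the step the question above is about.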

Now let's say B has been ejected but has since recovered (e.g. network
restored, GC pause passed). How does it know it's been ejected? I think
at this point B will assume A has failed, because it hasn't received a
heartbeat from A while B itself was unavailable. B may not be aware that it
was the unavailable node, so it may try to start an ejection process for
node A. How is this situation handled?

-Nick



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
