Node failure handling semantics

npordash Thu, 26 Apr 2018 17:28:17 -0700

Hi,

Is there any additional documentation on how Ignite handles node failures
internally? The failure detection section of the documentation covers this
only briefly[1].


Specifically, I have the following questions:

1) Does each node in the ring send heartbeats to all other nodes in the ring
or only to its neighbor(s)?
2) If each node sends heartbeats to all other nodes, how is the ejection of
a node from the cluster coordinated? Some nodes in the cluster could detect
the failure while others don't.
3) If a node only sends heartbeats to its neighbor(s) for failure detection,
how is cluster ejection determined, given that you could have healthy nodes
in the ring between unhealthy nodes?
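
To make the scenario in 3) concrete, here is a small toy model. It is not
Ignite's actual implementation; it only assumes, for illustration, that each
healthy node monitors its immediate successor in the ring. The function name
and the node labels are made up for this sketch.

```python
def local_failure_views(ring, failed):
    """Return {observer: suspect} for a ring in which every healthy node
    monitors only its immediate successor (an assumption of this sketch,
    not a statement about how Ignite works)."""
    views = {}
    n = len(ring)
    for i, node in enumerate(ring):
        if node in failed:
            continue  # a failed node observes nothing
        successor = ring[(i + 1) % n]
        if successor in failed:
            views[node] = successor  # only this neighbor notices the failure
    return views


ring = ["A", "B", "C", "D", "E"]
# B and D are unhealthy; C is a healthy node sitting between them.
print(local_failure_views(ring, failed={"B", "D"}))
# prints {'A': 'B', 'C': 'D'}
```

In this model only A knows about B and only C knows about D, while E has no
direct evidence of either failure. That partial view is what I'm asking
about: how do the nodes agree on who gets ejected?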

I understand there are split-brain resolvers, but given that there are no
out-of-the-box implementations in the OSS version I'm curious what the
expected behavior is here without one.

Thanks!
-Nick

[1]
https://apacheignite.readme.io/docs/cluster-config#section-failure-detection-timeout



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
