Hi, I was wondering if there is any additional documentation on how Ignite handles node failures internally. The failure detection section of the documentation covers this only briefly [1].
I specifically have the following questions:

1) Does each node in the ring send heartbeats to all other nodes in the ring, or only to its neighbor(s)?
2) If each node sends heartbeats to all other nodes, how is the ejection of a node from the cluster coordinated? It is possible that some nodes in the cluster detect the failure while others don't.
3) If a node only sends heartbeats to its neighbor(s) for failure detection, how is cluster ejection determined, given that there could be healthy nodes in the ring between unhealthy nodes?

I understand there are split-brain resolvers, but given that there are no out-of-the-box implementations in the OSS version, I'm curious what the expected behavior is here without one.

Thanks!
-Nick

[1] https://apacheignite.readme.io/docs/cluster-config#section-failure-detection-timeout
