We have just found the root cause of the problem: the VMs are sometimes
frozen.

Our service team backs up the VMs once per day. During the backup, the VMs
that our application servers run on are frozen, usually for a few seconds
but sometimes for more than 40 seconds! When I say a VM is frozen, I mean
it literally: nothing runs on it during that period.

So when one VM is frozen, the other Ignite nodes consider its node down. As
a result, the node on the frozen VM is disconnected from the topology, the
cluster is segmented, and so on.

So the solution seems to be to set the failureDetectionTimeout property to
60 seconds, so that the cluster tolerates a VM freeze even in the worst case.
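
To make the reasoning above concrete, here is a deliberately simplified Java
model of timeout-based failure detection: a peer is declared failed once it
has been silent for longer than failureDetectionTimeout. This is not Ignite's
actual discovery implementation. The PeerMonitor class, the 1-second
heartbeat interval and the 45-second freeze are illustrative assumptions. In
Ignite itself the property is set in milliseconds through
IgniteConfiguration.setFailureDetectionTimeout(long), and its default is
10 seconds.

```java
// Hypothetical, simplified model of timeout-based failure detection.
// It is NOT Ignite's discovery code; it only shows why a VM frozen for
// longer than failureDetectionTimeout gets dropped from the topology.
public class FailureDetectionModel {

    /** Tracks when a heartbeat was last received from one peer node. */
    static final class PeerMonitor {
        private final long failureDetectionTimeoutMs;
        private long lastHeartbeatMs;

        PeerMonitor(long failureDetectionTimeoutMs, long startMs) {
            this.failureDetectionTimeoutMs = failureDetectionTimeoutMs;
            this.lastHeartbeatMs = startMs;
        }

        /** Called whenever the peer manages to send a heartbeat. */
        void onHeartbeat(long nowMs) {
            lastHeartbeatMs = nowMs;
        }

        /** The peer is declared failed once it has been silent longer than the timeout. */
        boolean isDeclaredFailed(long nowMs) {
            return nowMs - lastHeartbeatMs > failureDetectionTimeoutMs;
        }
    }

    /** Returns true if a freeze of freezeMs makes the monitor declare the peer failed. */
    public static boolean droppedDuringFreeze(long timeoutMs, long freezeMs) {
        PeerMonitor monitor = new PeerMonitor(timeoutMs, 0);
        long lastHeartbeatMs = 0;
        // Normal operation: the peer sends a heartbeat every second for 10 seconds.
        for (long t = 0; t <= 10_000; t += 1_000) {
            monitor.onHeartbeat(t);
            lastHeartbeatMs = t;
        }
        // The VM freezes right after its last heartbeat and sends nothing until it
        // resumes, so the silence is longest just before it resumes: check there.
        return monitor.isDeclaredFailed(lastHeartbeatMs + freezeMs);
    }

    public static void main(String[] args) {
        long worstFreezeMs = 45_000;    // hypothetical worst case, a bit over the 40 s reported
        long defaultTimeoutMs = 10_000; // Ignite's default failureDetectionTimeout
        long proposedTimeoutMs = 60_000;
        System.out.println("10 s timeout, 45 s freeze, dropped: "
                + droppedDuringFreeze(defaultTimeoutMs, worstFreezeMs));
        System.out.println("60 s timeout, 45 s freeze, dropped: "
                + droppedDuringFreeze(proposedTimeoutMs, worstFreezeMs));
    }
}
```

Running main prints "dropped: true" for the 10-second timeout and
"dropped: false" for the 60-second one, which is the effect the proposed
setting relies on.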

My question is: would setting failureDetectionTimeout to 60 seconds have any
side effects? Any advice for such a situation? Thank you.



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Local-node-seems-to-be-disconnected-from-topology-failure-detection-timeout-is-reached-tp6797p7347.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.