We have just found the root cause of the problem: the VMs are sometimes frozen.
Our service team backs up the VMs once a day. During the backup, the VMs that our application servers run on are usually frozen for a few seconds, but sometimes for more than 40 seconds! When I say a VM is frozen, I mean it literally: nothing runs on it during that period. So when one VM is frozen, the other Ignite node considers it down, and as a result the node on the frozen VM is disconnected, the topology is segmented, and so on.

So the solution seems to be to set the failureDetectionTimeout property to 60 seconds, so that even the worst-case VM freeze is tolerated.

My question is: are there any side effects of setting failureDetectionTimeout to 60 seconds? Any advice for this situation?

Thank you.

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Local-node-seems-to-be-disconnected-from-topology-failure-detection-timeout-is-reached-tp6797p7347.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
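
To make the trade-off behind the side-effects question concrete, here is a toy model in plain Java. It is only a sketch and not Ignite's actual discovery logic. It assumes a peer declares a node failed once it has heard nothing from it for longer than the timeout. The class name, method names, and the 2-second heartbeat interval are all hypothetical. The 10,000 ms figure is Ignite's documented default for failureDetectionTimeout.

```java
public class FreezeToleranceSketch {

    /**
     * Toy model (assumption): a peer declares the node failed when it has
     * heard nothing from it for longer than timeoutMs.
     */
    static boolean declaredFailed(long freezeMs, long heartbeatMs, long timeoutMs) {
        // Conservative upper bound on the silence the peer observes: the freeze
        // starts right after a heartbeat, and the next heartbeat goes out one
        // full interval after the VM resumes.
        long worstCaseSilenceMs = freezeMs + heartbeatMs;
        return worstCaseSilenceMs > timeoutMs;
    }

    /**
     * The flip side in this model: a node that has really crashed is only
     * noticed once the full timeout has elapsed.
     */
    static long crashDetectionDelayMs(long timeoutMs) {
        return timeoutMs;
    }

    public static void main(String[] args) {
        long heartbeatMs = 2_000; // hypothetical heartbeat interval
        long freezeMs = 40_000;   // worst freeze reported in the post

        // Compare the default timeout with the proposed 60-second one.
        for (long timeoutMs : new long[] {10_000, 60_000}) {
            System.out.println("timeout=" + timeoutMs + " ms"
                + ", frozen node dropped=" + declaredFailed(freezeMs, heartbeatMs, timeoutMs)
                + ", real crash noticed after ~" + crashDetectionDelayMs(timeoutMs) + " ms");
        }
    }
}
```

In this simplified model, the 60-second timeout rides out the 40-second freeze, but a node that has genuinely died could go unnoticed for up to about 60 seconds, and operations waiting on that node may stall for that long. Whether that delay is acceptable depends on the application.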
