Yes, failureDetectionTimeout determines how long Ignite waits before marking a node as failed. But my question is: once a node has been marked as failed, what happens when that node becomes reachable on the network again (less than failureDetectionTimeout)?
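For reference, here is a minimal sketch of the setup I am asking about, assuming Ignite 2.x on the classpath. The class name and the 10-second value are only illustrative, and the listener just logs the events. It does not show what the cluster does once the node is reachable again, which is exactly the part I am unsure about.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.DiscoveryEvent;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

// Hypothetical class name, used only for this illustration.
public class FailureDetectionSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Time (in ms) after which an unresponsive node is considered failed.
        // 10_000 is an illustrative value, not a recommendation.
        cfg.setFailureDetectionTimeout(10_000L);

        // Enable the discovery events we want to observe on this node.
        cfg.setIncludeEventTypes(EventType.EVT_NODE_FAILED, EventType.EVT_NODE_SEGMENTED);

        Ignite ignite = Ignition.start(cfg);

        // Log each failed/segmented event together with the id of the affected node.
        IgnitePredicate<DiscoveryEvent> lsnr = evt -> {
            System.out.println(evt.name() + ": " + evt.eventNode().id());
            return true; // keep the listener registered
        };

        ignite.events().localListen(lsnr, EventType.EVT_NODE_FAILED, EventType.EVT_NODE_SEGMENTED);
    }
}
```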
From: Evgenii Zhuravlev [mailto:[email protected]]
Sent: Monday, July 02, 2018 11:05 AM
To: [email protected]
Subject: Re: How long Ignite retries upon NODE_FAILED events

Hi,

by default, Ignite uses a mechanism that can be configured using failureDetectionTimeout: https://apacheignite.readme.io/v2.5/docs/tcpip-discovery#section-failure-detection-timeout

Evgenii

2018-07-02 16:40 GMT+03:00 HEWA WIDANA GAMAGE, SUBASH <[email protected]>:

Hi team,

For example, let's say one of the nodes is not down (the JVM is up), but the network is not reachable from/to it. Then the rest of the nodes will see NODE_FAILED and start working as normal with a reduced cluster size.

If the network from/to that failed node becomes normal again after X minutes, then:
- Will the other nodes discover it, or will that node be able to figure it out?
- How long can X be at most? Is there a max retry count or a timeout? (I have seen the joinTimeout param in discovery, but that seems to be applicable only at startup, like how long it should pause starting the node to let others join.)
