Re: How long Ignite retries upon NODE_FAILED events

Evgenii Zhuravlev Mon, 02 Jul 2018 23:43:38 -0700

Can you share the logs?

2018-07-02 20:54 GMT+03:00 HEWA WIDANA GAMAGE, SUBASH <
[email protected]>:


> OK, I did the following quick PoC.
>
>
>
> 1. Started two nodes, and they joined. Topology snapshot: servers=2.
>
> 2. On one node, I blocked the Ignite ports (47500, 47100, etc.).
>
> 3. After failureDetectionTimeout, each node logged NODE_FAILED and
> Topology snapshot servers=1.
>
> 4. After 10-15 seconds, I unblocked those ports.
>
> 5. After a few more seconds, both nodes logged "Node joined" and
> Topology snapshot servers=2.
>
>
>
> So it’s the same node with the same ID, because the JVM is still up and
> running. And it looks like it doesn’t forget.
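One way to confirm from the application side that the rejoining node keeps its ID is to log discovery events together with the node ID. A minimal sketch, assuming Ignite 2.x on the classpath and otherwise default discovery settings; the class name TopologyWatcher is hypothetical:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.DiscoveryEvent;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class TopologyWatcher {
    // Discovery events to record. Enabling them explicitly is harmless even
    // if a given Ignite version already records them by default.
    static final int[] DISCOVERY_EVENTS = {
        EventType.EVT_NODE_JOINED, EventType.EVT_NODE_LEFT, EventType.EVT_NODE_FAILED
    };

    static IgniteConfiguration config() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setIncludeEventTypes(DISCOVERY_EVENTS);
        return cfg;
    }

    public static void main(String[] args) {
        Ignite ignite = Ignition.start(config());

        // Print the event name, the ID of the node that joined/left/failed,
        // the topology version, and how many server nodes this node now sees.
        IgnitePredicate<DiscoveryEvent> listener = evt -> {
            System.out.printf("%s node=%s topVer=%d servers=%d%n",
                evt.name(),
                evt.eventNode().id(),
                evt.topologyVersion(),
                ignite.cluster().forServers().nodes().size());
            return true; // returning true keeps the listener registered
        };

        ignite.events().localListen(listener, DISCOVERY_EVENTS);
    }
}
```

Run it on both nodes before blocking the ports; comparing the node IDs printed for NODE_FAILED and NODE_JOINED shows whether the node that came back is the same one.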
>
>
>
> Can this “10-15 seconds” be any length of time? If the node comes back even
> after 1-2 hours, can it rejoin?
>
> *From:* Evgenii Zhuravlev [mailto:[email protected]]
> *Sent:* Monday, July 02, 2018 1:25 PM
> *To:* [email protected]
> *Subject:* Re: How long Ignite retries upon NODE_FAILED events
>
>
>
> If the cluster has already decided that the node failed, the node will be
> stopped when it tries to reconnect to the cluster with the same ID.
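My understanding (an assumption, not something confirmed in this thread) is that this stop is Ignite's segmentation handling: the isolated node finds out it was removed from the topology, records EVT_NODE_SEGMENTED locally, and applies the segmentation policy from IgniteConfiguration, which I believe defaults to STOP. A minimal sketch that sets the policy explicitly and logs the event on the isolated node; SegmentationWatcher is a hypothetical class name:

```java
import java.util.UUID;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;
import org.apache.ignite.plugin.segmentation.SegmentationPolicy;

public class SegmentationWatcher {
    static IgniteConfiguration config() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Assumed to be the default already; set explicitly so the intent is
        // visible. The other values are RESTART_JVM and NOOP.
        cfg.setSegmentationPolicy(SegmentationPolicy.STOP);
        cfg.setIncludeEventTypes(EventType.EVT_NODE_SEGMENTED);
        return cfg;
    }

    public static void main(String[] args) {
        Ignite ignite = Ignition.start(config());

        // Capture the ID up front, since the node may be stopping when the event fires.
        UUID localId = ignite.cluster().localNode().id();

        IgnitePredicate<Event> listener = evt -> {
            System.out.println("Local node " + localId + " was segmented: " + evt.message());
            return true; // keep listening
        };

        ignite.events().localListen(listener, EventType.EVT_NODE_SEGMENTED);
    }
}
```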
>
>
>
> 2018-07-02 18:37 GMT+03:00 HEWA WIDANA GAMAGE, SUBASH <
> [email protected]>:
>
> Yes, failureDetectionTimeout determines how long the cluster waits before
> marking a node as failed. But my question is: after such a node failure has
> happened, what happens when the failed node becomes reachable on the network
> again (less than failureDetectionTimeout)?
>
>
>
> *From:* Evgenii Zhuravlev [mailto:[email protected]]
> *Sent:* Monday, July 02, 2018 11:05 AM
> *To:* [email protected]
> *Subject:* Re: How long Ignite retries upon NODE_FAILED events
>
>
>
> Hi,
>
>
>
> By default, Ignite uses a mechanism that can be configured using
> failureDetectionTimeout:
> https://apacheignite.readme.io/v2.5/docs/tcpip-discovery#section-failure-detection-timeout
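For reference, the same timeout can also be set programmatically. A minimal sketch, assuming Ignite 2.x; the 30-second value and the class name FailureDetectionConfig are arbitrary examples, and when the setter is not called the built-in default applies (10 seconds in the 2.x versions I know of):

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class FailureDetectionConfig {
    static IgniteConfiguration config(long failureDetectionTimeoutMs) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // How long a node can stay unresponsive before the rest of the
        // cluster considers it failed and fires NODE_FAILED.
        cfg.setFailureDetectionTimeout(failureDetectionTimeoutMs);
        return cfg;
    }

    public static void main(String[] args) {
        // 30 seconds: an arbitrary example value, not a recommendation.
        Ignition.start(config(30_000L));
    }
}
```

A longer timeout tolerates longer network hiccups, at the cost of detecting genuinely dead nodes later.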
>
>
>
> Evgenii
>
>
>
> 2018-07-02 16:40 GMT+03:00 HEWA WIDANA GAMAGE, SUBASH <
> [email protected]>:
>
> Hi team,
>
>
>
> For example, let’s say one of the nodes is not down (its JVM is up), but the
> network is not reachable from/to it. Then the rest of the nodes will see
> NODE_FAILED and continue working as normal with a reduced cluster size. If
> the network from/to that failed node becomes normal again after X minutes,
> then:
>
> - Will the other nodes discover it, or will that node be able to figure it
> out?
>
> - What is the maximum value of X? Is there a maximum retry count or timeout?
> (I’ve seen the joinTimeout parameter in discovery, but that seems to apply
> only at startup, i.e., how long a starting node should wait to join the
> others.)
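For comparison, joinTimeout is set on TcpDiscoverySpi, next to the IP finder. A minimal sketch, assuming Ignite 2.x; the address range, the 60-second value and the class name DiscoveryTimeouts are hypothetical examples. As the question suggests, this parameter governs how long a starting node keeps trying to join; as far as I know its default of 0 means wait indefinitely:

```java
import java.util.Arrays;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class DiscoveryTimeouts {
    static IgniteConfiguration config(long joinTimeoutMs) {
        // Hypothetical static address list; 47500..47509 is the default
        // discovery port range.
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Arrays.asList("127.0.0.1:47500..47509"));

        TcpDiscoverySpi disco = new TcpDiscoverySpi();
        disco.setIpFinder(ipFinder);
        // Upper bound on how long the node waits to join the topology at
        // startup (0 is assumed to mean "wait forever").
        disco.setJoinTimeout(joinTimeoutMs);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(disco);
        return cfg;
    }

    public static void main(String[] args) {
        Ignition.start(config(60_000L));
    }
}
```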
