Re: Local node seems to be disconnected from topology (failure detection timeout is reached)

Vladislav Pyatkov Fri, 26 Aug 2016 06:31:06 -0700

Hello,

I recommend using failureDetectionTimeout only for testing, because this
single timeout determines the timeouts for all SPIs (Communication, Discovery).
To prevent segmentation, use the specific timeout of the Discovery SPI instead
(org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi#setSocketTimeout).


Keep in mind that a configuration with such a large timeout makes the topology
slow to change when a node really fails.
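
A minimal sketch of setting the discovery timeout programmatically, in case it
helps. The class name and the 60-second value are only illustrative assumptions
(the value is taken from the question below, not a tuned recommendation). As far
as I know, once a timeout such as the socket timeout is set explicitly on
TcpDiscoverySpi, the discovery SPI stops using failureDetectionTimeout, so
please verify this against the Javadoc of your Ignite version.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

// Hypothetical example class; the name is not from the original thread.
public class DiscoveryTimeoutExample {
    public static void main(String[] args) {
        TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();

        // Assumed placeholder value in milliseconds; choose it to cover the
        // longest VM freeze you expect to tolerate.
        discoverySpi.setSocketTimeout(60_000L);

        IgniteConfiguration cfg = new IgniteConfiguration();
        // Only the discovery SPI is changed here; the Communication SPI
        // keeps its default timeouts.
        cfg.setDiscoverySpi(discoverySpi);

        // Ignite is AutoCloseable, so the node stops when the block exits.
        try (Ignite ignite = Ignition.start(cfg)) {
            System.out.println("Local node id: " + ignite.cluster().localNode().id());
        }
    }
}
```

Other discovery settings may also need adjusting for a freeze of that length,
so treat this as a starting point only.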

On Fri, Aug 26, 2016 at 2:49 PM, yucigou <[email protected]> wrote:

> The root cause of the problem just found is that the VMs are frozen
> sometimes.
>
> Our service team takes backup of the VMs once per day. During the backup,
> the VMs that our application servers are running on would be frozen for a
> few seconds usually, but sometimes more than 40 seconds! When I say a VM is
> frozen here, I mean it is frozen literally, and nothing is going to run
> during this period of time.
>
> So when one VM is frozen, the other Ignite node will consider it is down,
> and as a result, the node on the frozen VM is disconnected with topology
> segmented, etc.
>
> So the solution seems to be to set the failureDetectionTimeout property to
> 60 seconds, to tolerate the VM being frozen in its worst cases.
>
> My question is, would there be any side effects of setting
> failureDetectionTimeout to 60 seconds? Any advice in such a situation? Thank
> you.



-- 
Vladislav Pyatkov
