A follow-up question about recovery.

Node ves-hx-40 was frozen for about a minute during a VM backup and was
considered failed by the cluster.

When ves-hx-40 woke up after the VM backup, it found that it had been
disconnected from the topology (see the logs below), and it then stopped itself.

[23:34:03,752][INFO ][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Local node seems to be disconnected from topology (failure detection timeout is reached) [failureDetectionTimeout=10000, connCheckFreq=3333]
[23:34:03,783][WARN ][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Node is out of topology (probably, due to short-time network problems).
[23:34:03,786][WARN ][disco-event-worker-#44%null%][GridDiscoveryManager] Local node SEGMENTED: TcpDiscoveryNode [id=9a069f70-d49d-472e-9771-7ac2353e751f, addrs=[10.3.0.64, 127.0.0.1], sockAddrs=[ves-hx-40.ebi.ac.uk/10.3.0.64:47500, /10.3.0.64:47500, /127.0.0.1:47500], discPort=47500, order=56, intOrder=29, lastExchangeTime=1470350043783, loc=true, ver=1.6.0#20160518-sha1:0b22c45b, isClient=false]
[23:34:03,819][WARN ][disco-event-worker-#44%null%][GridDiscoveryManager] Stopping local node according to configured segmentation policy.

I understand that in such situations Apache Ignite would stop the local node
according to the segmentation policy.

My question is: why does Apache Ignite not offer an option to try to
reconnect to the cluster, instead of just stopping the local node (or doing
nothing, or restarting the JVM)?

I think this would be a reasonable policy option: treat the disconnected
local node as a new potential member of the cluster, clear all of its local
caches and state, and then rejoin the cluster.
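To make the idea concrete, here is a minimal, self-contained Java model of such a policy. It does not use the Ignite API: `SegmentationSketch`, `Node`, `onSegmented`, `timedOut` and the `REJOIN` value are hypothetical names for illustration only, and `STOP`, `RESTART_JVM` and `NOOP` merely mirror the three options the segmentation policy already offers. The `timedOut` check uses the numbers from the log: a node silent for roughly 60,000 ms exceeds `failureDetectionTimeout=10000`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical model of how a server node could react to being segmented.
// This is NOT the Apache Ignite API; a REJOIN policy does not exist in Ignite 1.6.
class SegmentationSketch {

    // STOP, RESTART_JVM and NOOP mirror the existing policies; REJOIN is the proposed one.
    enum Policy { STOP, RESTART_JVM, NOOP, REJOIN }

    // True when a node has been silent longer than the failure detection timeout,
    // e.g. a ~60,000 ms VM freeze versus failureDetectionTimeout=10000.
    static boolean timedOut(long silentMillis, long failureDetectionTimeoutMillis) {
        return silentMillis > failureDetectionTimeoutMillis;
    }

    static final class Node {
        UUID id = UUID.randomUUID();
        final Map<String, Object> localCache = new HashMap<>();
        boolean running = true;
        boolean inTopology = true;

        // Invoked when the node learns it has been dropped from the topology.
        void onSegmented(Policy policy) {
            inTopology = false;
            switch (policy) {
                case STOP:
                    running = false;          // what happened to ves-hx-40
                    break;
                case RESTART_JVM:
                    running = false;          // the restart itself happens outside this process
                    break;
                case NOOP:
                    break;                    // keeps running, but outside the topology
                case REJOIN:
                    localCache.clear();       // drop possibly stale local data and state
                    id = UUID.randomUUID();   // come back as a brand-new member
                    inTopology = true;        // stands in for a fresh discovery join
                    break;
            }
        }
    }
}
```

Under `STOP` the modelled node ends with `running == false`, which matches the log. Under the hypothetical `REJOIN` it stays up, comes back with an empty cache and a new ID, and so looks to the rest of the cluster like any other newly joining node; discarding local state is the point, since the other nodes may already have moved on while it was frozen.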

Thanks.

Yuci



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Local-node-seems-to-be-disconnected-from-topology-failure-detection-timeout-is-reached-tp6797p10386.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
