Re: Local node SEGMENTED error causing node goes down for no obvious reason

Yakov Zhdanov Mon, 14 Jan 2019 14:11:37 -0800

It seems there were network issues. The logs contain plenty of discovery
warnings like this one (from 231.log):


[2018-11-07T07:44:44,627][WARN ][grid-timeout-worker-#119][TcpDiscoverySpi]
Socket write has timed out (consider increasing
'IgniteConfiguration.failureDetectionTimeout' configuration property)
[failureDetectionTimeout=60000, rmtAddr=/10.29.42.232:49500, rmtPort=49500,
sockTimeout=5000]
[2018-11-07T07:44:44,630][WARN ][tcp-disco-msg-worker-#3][TcpDiscoverySpi]
Failed to send message to next node [msg=TcpDiscoveryClientReconnectMessage
[routerNodeId=9a4ee928-a71d-484b-88cc-2ded8efb7b1d, lastMsgId=null,
super=TcpDiscoveryAbstractMessage [sndNodeId=null,
id=005b10de661-88aac721-2b85-432b-b703-ca6aff5252c6,
verifierNodeId=a7685ff7-78b6-442c-a819-f8a5b2547623, topVer=0,
pendingIdx=0, failedNodes=null, isClient=false]], next=TcpDiscoveryNode
[id=e940c0d9-15f7-46ab-be95-ee2302ccc8f4, addrs=[10.29.42.232], sockAddrs=[/
10.29.42.232:49500], discPort=49500, order=2, intOrder=2,
lastExchangeTime=1541574587361, loc=false,
ver=2.6.0#20180709-sha1:5faffcee, isClient=false], errMsg=Failed to send
message to next node [msg=TcpDiscoveryClientReconnectMessage
[routerNodeId=9a4ee928-a71d-484b-88cc-2ded8efb7b1d, lastMsgId=null,
super=TcpDiscoveryAbstractMessage [sndNodeId=null,
id=005b10de661-88aac721-2b85-432b-b703-ca6aff5252c6,
verifierNodeId=a7685ff7-78b6-442c-a819-f8a5b2547623, topVer=0,
pendingIdx=0, failedNodes=null, isClient=false]], next=ClusterNode
[id=e940c0d9-15f7-46ab-be95-ee2302ccc8f4, order=2, addr=[10.29.42.232],
daemon=false]]]

The node with order = 1 (the 231 one) kicked the other server nodes out of
the topology, for a reason we still need to figure out.

What environment do you run Ignite on?

One more strange thing I see in the logs is this:

[2018-11-07T07:24:55,542][INFO
][tcp-disco-sock-reader-#146][TcpDiscoverySpi] Started serving remote node
connection [rmtAddr=/10.29.42.231:28977, rmtPort=28977]
[2018-11-07T07:24:55,547][INFO ][exchange-worker-#162][time] Started
exchange init [topVer=AffinityTopologyVersion [topVer=20, minorTopVer=0],
crd=true, evt=NODE_JOINED, evtNode=23c738f5-fbbf-44dc-a5fb-5d09933d9c4b,
customEvt=null, allowMerge=true]

But I do not see a "node added" event in the log for the node with this ID.
The message should look like this:
[2018-11-07T07:20:56,050][INFO
][disco-event-worker-#161][GridDiscoveryManager] Added new node to
topology: TcpDiscoveryNode [id=a4acc241-dfa6-44bd-a62b-ebce2f68d199,
addrs=[10.29.42.49, 127.0.0.1], sockAddrs=[/10.29.42.49:0, /127.0.0.1:0],
discPort=0, order=14, intOrder=13, lastExchangeTime=1541575169947,
loc=false, ver=2.6.0#20180710-sha1:669feacc, isClient=true]
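
To make this cross-check repeatable over a whole log file, here is a
minimal sketch in plain Java (no Ignite dependency). It collects the
evtNode IDs from "evt=NODE_JOINED" lines and reports those that have no
matching "Added new node to topology" line. The class and method names are
hypothetical, and the two patterns are assumptions based only on the log
excerpts quoted above; they may need adjusting for other log layouts. It
may also report the local node's own join, so treat the output as a list
of candidates to inspect rather than as proof of a problem.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Ignite.
class MissingNodeAddedCheck {
    // Node IDs in the quoted logs are lowercase UUIDs.
    private static final String UUID =
        "([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})";

    // Assumed shape of the exchange-worker line quoted above.
    private static final Pattern JOINED =
        Pattern.compile("evt=NODE_JOINED, evtNode=" + UUID);

    // Assumed shape of the GridDiscoveryManager line quoted above.
    private static final Pattern ADDED = Pattern.compile(
        "Added new node to topology: TcpDiscoveryNode \\[id=" + UUID);

    // Collects every node ID captured by the given pattern, in order.
    static Set<String> collect(Pattern pattern, String text) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher m = pattern.matcher(text);
        while (m.find())
            ids.add(m.group(1));
        return ids;
    }

    // Returns IDs that appear in NODE_JOINED lines but never in an
    // "Added new node to topology" line.
    static Set<String> joinedWithoutAdded(String log) {
        // Collapse whitespace so entries wrapped across lines still match.
        String text = log.replaceAll("\\s+", " ");
        Set<String> missing = collect(JOINED, text);
        missing.removeAll(collect(ADDED, text));
        return missing;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java MissingNodeAddedCheck <log-file>");
            return;
        }
        String log = Files.readString(Path.of(args[0]));
        for (String id : joinedWithoutAdded(log))
            System.out.println("NODE_JOINED without 'Added new node': " + id);
    }
}
```

Run it against each server node's log separately, since the "added" line
is written by the node that observed the join.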

Ilya, can you please take a look at the logs one more time?

Ray, can you please reproduce the issue with DEBUG turned on for discovery?
Please also fix all warnings of this kind:
[2018-11-07T07:11:37,701][WARN
][disco-event-worker-#161][GridDiscoveryManager] Local node's value of
'java.net.preferIPv4Stack' system property differs from remote node's (all
nodes in topology should have identical value) [locPreferIpV4=null,
rmtPreferIpV4=null, locId8=e940c0d9, rmtId8=7e10c6c4,
rmtAddrs=[sap-zookeeper3/10.29.42.43, /127.0.0.1], rmtNode=ClusterNode
[id=7e10c6c4-3137-4640-a673-71f30d66d0e3, order=8, addr=[10.29.42.43,
127.0.0.1], daemon=false]]
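
The usual way to get rid of this warning is to start every node (servers
and clients) with the same explicit JVM option, for example
-Djava.net.preferIPv4Stack=true. As a rough sketch, a startup check like
the one below can confirm that the option actually reached the JVM. The
class name is hypothetical and this is not an Ignite API; it only reads
the standard system property. Note that setting the property from code
with System.setProperty may be too late if networking classes are already
loaded, so the command-line flag is the safer route.

```java
// Hypothetical pre-start check, not part of Ignite.
class PreferIpv4Check {
    static final String PROP = "java.net.preferIPv4Stack";

    // True only when the property value is explicitly "true".
    // A null value (property not set) counts as not enabled.
    static boolean isExplicitlyEnabled(String value) {
        return Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        String value = System.getProperty(PROP);
        if (isExplicitlyEnabled(value)) {
            System.out.println(PROP + " is set to true");
        } else {
            System.err.println(PROP + " is '" + value
                + "'; start the JVM with -D" + PROP + "=true");
        }
    }
}
```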

--Yakov
