Hi all,
Yesterday we had some network issues, and the server log showed messages
like this:
2020-06-07T21:46:35,368 DEBUG c.c.p.c.c.p.d.CustomTcpDiscoverySpi
[tcp-disco-msg-worker-#2]: Received metrics from unknown node:
8baf933f-e2cc-43be-818c-de2fb1259194
Once we figured out which client node this ID belonged to, we found the
following message in that client node's log. Please note that this is the
last 'ignite' message in the client node's log:
2020-06-07T16:06:32,961 WARN c.c.p.c.c.p.d.CustomTcpDiscoverySpi
[tcp-client-disco-msg-worker-#4%instancename%]: Local node was dropped from
cluster due to network problems, will try to reconnect with new id after
10000ms (reconnect delay can be changed using
IGNITE_DISCO_FAILED_CLIENT_RECONNECT_DELAY system property)
[newId=8baf933f-e2cc-43be-818c-de2fb1259194,
prevId=d7674f40-6112-46a6-83f8-15656b01c66b, locNode=TcpDiscoveryNode
[id=d7674f40-6112-46a6-83f8-15656b01c66b, addrs=[0:0:0:0:0:0:0:1%lo,
a.b.c.d, 127.0.0.1, 192.168.61.150], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0,
/127.0.0.1:0, hostname.companyname.local/a.b.c.d:0,
multicast/192.168.61.150:0], discPort=0, order=193, intOrder=0,
lastExchangeTime=1591527983474, loc=true, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=true], nodeInitiatedFail=c67403fd-812b-46b4-9c76-60f5052b57d7,
msg=Client node considered as unreachable and will be dropped from cluster,
because no metrics update messages received in interval:
TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by
network problems or long GC pause on client node, try to increase this
parameter. [nodeId=d7674f40-6112-46a6-83f8-15656b01c66b,
clientFailureDetectionTimeout=30000]]
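In case it helps anyone doing the same lookup: matching the ID from the
server log against the newId field of the client-side warning can be
scripted. Below is a minimal sketch in plain Java with no Ignite dependency.
The class name ReconnectIdExtractor and its methods are made up for
illustration, and the sketch assumes the warning keeps the newId/prevId
layout shown above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Ignite: pulls newId/prevId out of the
// client-side "Local node was dropped from cluster" warning.
public class ReconnectIdExtractor {
    private static final String UUID_RE =
        "[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}";

    // \s* after the comma tolerates the line wrap seen in the pasted log.
    private static final Pattern IDS = Pattern.compile(
        "newId=(" + UUID_RE + "),\\s*prevId=(" + UUID_RE + ")");

    /** Returns {newId, prevId} if the text contains both fields, else empty. */
    public static Optional<String[]> extractIds(String logText) {
        Matcher m = IDS.matcher(logText);
        return m.find()
            ? Optional.of(new String[] {m.group(1), m.group(2)})
            : Optional.empty();
    }

    /** True if the client log shows the client adopting unknownId as its new ID. */
    public static boolean clientAdoptedId(String clientLogText, String unknownId) {
        return extractIds(clientLogText)
            .map(ids -> ids[0].equalsIgnoreCase(unknownId))
            .orElse(false);
    }

    public static void main(String[] args) {
        // Abbreviated excerpt of the client-side warning quoted above.
        String clientLog =
            "will try to reconnect with new id after 10000ms\n"
            + "[newId=8baf933f-e2cc-43be-818c-de2fb1259194,\n"
            + "prevId=d7674f40-6112-46a6-83f8-15656b01c66b, locNode=TcpDiscoveryNode";

        // ID taken from the server-side "Received metrics from unknown node" line.
        String unknownId = "8baf933f-e2cc-43be-818c-de2fb1259194";

        extractIds(clientLog).ifPresent(ids ->
            System.out.println("newId=" + ids[0] + " prevId=" + ids[1]));
        System.out.println("client adopted unknown id: "
            + clientAdoptedId(clientLog, unknownId));
    }
}
```

Running this over each client's log and comparing against the ID from the
server log is essentially what we did by hand.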
My questions are below:
1. If no metrics are received, shouldn't the node be segmented?
2. This node did not get segmented. It took on a new node ID, but that new
ID does not appear to have been registered in the cluster (judging by the
code in updateMetrics in ServerImpl.java, snippet below). Is that expected?
3. How do we monitor the cluster topology for these kinds of scenarios?
===========================================
private void updateMetrics(UUID nodeId,
    ClusterMetrics metrics,
    Map<Integer, CacheMetrics> cacheMetrics,
    long tsNanos)
{
    assert nodeId != null;
    assert metrics != null;

    TcpDiscoveryNode node = ring.node(nodeId);

    if (node != null) {
        ..........
    }
    else if (log.isDebugEnabled())
        log.debug("Received metrics from unknown node: " + nodeId);
}
regards,
Veena.