As an Ignite user, here are my two cents:

- If you were never able to get the node to join the cluster, check that there are no firewalls/rules blocking the Ignite ports (telnet might be a quick way to do that).
- Check that the IPs printed by TcpDiscoverySpi are the correct ones; if you have virtual network adapters enabled, the wrong IP might be chosen, so the IP discovery will fail. This can happen if you use VirtualBox or Docker, for instance.
- For intermittent issues, you can try increasing the default failure detection timeout, which is 10s, I think. Somewhere in the Ignite doc it's recommended to use 30s if the JVM is on AWS.
- How did you configure IP discovery? In my case, I've always used static IP discovery (TcpDiscoveryVmIpFinder) with shared enabled.

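In case telnet is not available on the box, the first two checks above can be roughly sketched in plain Java. This is only an illustration using the JDK, not anything from Ignite itself: the class and method names (`DiscoveryCheck`, `canConnect`, `localAddresses`) are made up, and the default port 47500 is simply the `discPort` that appears in the quoted log. Pass the other node's address and port as arguments.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.Socket;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Ignite.
class DiscoveryCheck {

    // Telnet-style check: true if a TCP connection to host:port can be opened
    // within timeoutMs, false if it is refused, filtered, or times out.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Lists the addresses of every interface that is up, so virtual adapters
    // (VirtualBox, Docker, ...) that might be picked by mistake are easy to spot.
    static List<String> localAddresses() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // no interfaces reported
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            if (!nic.isUp()) {
                continue;
            }
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                result.add(nic.getName() + " -> " + addr.getHostAddress());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Assumed defaults: local host and the discovery port seen in the log.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 47500;

        System.out.println("Local addresses:");
        for (String line : localAddresses()) {
            System.out.println("  " + line);
        }
        System.out.println(host + ":" + port + " reachable: "
                + canConnect(host, port, 2000));
    }
}
```

Running it on each node against the other node's address should show quickly whether the discovery port is reachable in both directions, and whether the address Ignite prints matches a real adapter rather than a virtual one.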
On Sun, Jan 6, 2019 at 6:04 AM Prasad Bhalerao <[email protected]> wrote:

> Hi,
>
> I am consistently getting "Node is out of topology" message in logs on
> node-1 and in other node, node-2 getting message "Timed out waiting for
> message delivery receipt (most probably, the reason is in long GC pauses on
> remote node; consider tuning GC and increasing '"
>
> I have checked the network bandwidth using iperf and it is 470 Mbit per
> sec. I have also checked the gc logs and max pause time is 140 ms.
>
> If it is really happening because of network issues, it there any way to
> debug it?
>
> If it is happening because of gc, I would have seen it in gc logs.
>
> Can someone please help me out with this?
>
> Log messages on node-1:
> 2019-01-06 13:48:19,036 125016 [tcp-disco-srvr-#3%springDataNode%] INFO
> o.a.i.s.d.tcp.TcpDiscoverySpi - TCP discovery accepted incoming connection
> [rmtAddr=/10.114.113.65, rmtPort=35651]
> 2019-01-06 13:48:19,037 125017 [tcp-disco-srvr-#3%springDataNode%] INFO
> o.a.i.s.d.tcp.TcpDiscoverySpi - TCP discovery spawning a new thread for
> connection [rmtAddr=/10.114.113.65, rmtPort=35651]
> 2019-01-06 13:48:19,037 125017 [tcp-disco-sock-reader-#5%springDataNode%]
> INFO o.a.i.s.d.tcp.TcpDiscoverySpi - Started serving remote node
> connection [rmtAddr=/10.114.113.65:35651, rmtPort=35651]
> *2019-01-06 13:48:19,040 125020 [tcp-disco-msg-worker-#2%springDataNode%]
> WARN o.a.i.s.d.tcp.TcpDiscoverySpi - Node is out of topology (probably,
> due to short-time network problems).*
> 2019-01-06 13:48:19,041 125021 [disco-event-worker-#62%springDataNode%]
> WARN o.a.i.i.m.d.GridDiscoveryManager - Local node SEGMENTED:
> TcpDiscoveryNode [id=a5827f51-096a-4c98-af4f-564d2d3e769d,
> addrs=[10.114.113.53, 127.0.0.1], sockAddrs=[/127.0.0.1:47500,
> qagmscore02.p13.eng.in03.qualys.com/10.114.113.53:47500], discPort=47500,
> order=2, intOrder=2, lastExchangeTime=1546782499034, loc=true,
> ver=2.7.0#20181130-sha1:256ae401, isClient=false]
> 2019-01-06 13:48:19,041 125021 [tcp-disco-sock-reader-#5%springDataNode%]
> INFO o.a.i.s.d.tcp.TcpDiscoverySpi - Finished serving remote node
> connection [rmtAddr=/10.114.113.65:35651, rmtPort=35651
> 2019-01-06 13:48:19,866 125846 [tcp-comm-worker-#1%springDataNode%] INFO
> o.a.i.s.d.tcp.TcpDiscoverySpi - Pinging node:
> cd9803ac-b810-447e-818e-ab51dada59d8
