Re: How to debug network issues in cluster

Loredana Radulescu Ivanoff Mon, 07 Jan 2019 09:29:15 -0800

As an Ignite user, here are my two cents:

- if you were never able to get the node to join the cluster, check that
there are no firewalls/rules blocking the Ignite ports (telnet might be a
quick way to do that)
- check that the IPs printed by TcpDiscoverySpi are the correct ones; if
you have virtual network adapters enabled, the wrong IP might be chosen
and IP discovery will fail. This can happen if you use VirtualBox or
Docker, for instance.
- for intermittent issues, you can try increasing the default failure
detection timeout, which is 10s, I think. Somewhere in the Ignite doc it's
recommended to use 30s if the JVM is on AWS.
- how did you configure IP discovery? In my case, I've always used static
IP discovery (TcpDiscoveryVmIpFinder) with the shared flag enabled
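
For the first two points, if telnet is not installed on the box, a small
stdlib-only Java check can do the same job. This is only a sketch: the class
name IgniteNetCheck and its helpers are made up for illustration, 47500 is the
discovery port shown in your log, and 47100 is, as far as I know, the default
communication port. Adjust both if your configuration differs.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.Socket;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Ignite.
final class IgniteNetCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable: all count as "not reachable".
            return false;
        }
    }

    /** Lists "interface -> address" for every local interface that is up. */
    static List<String> localAddresses() {
        List<String> out = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nifs = NetworkInterface.getNetworkInterfaces();
            if (nifs == null)
                return out;
            for (NetworkInterface nif : Collections.list(nifs)) {
                if (!nif.isUp())
                    continue;
                for (InetAddress addr : Collections.list(nif.getInetAddresses()))
                    out.add(nif.getName() + " -> " + addr.getHostAddress());
            }
        } catch (SocketException e) {
            // Could not enumerate interfaces; return what was collected so far.
        }
        return out;
    }

    public static void main(String[] args) {
        // Remote host to probe; defaults to loopback when no argument is given.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        // Assumed ports: 47500 (discovery, from the log) and 47100 (communication).
        for (int port : new int[] {47500, 47100})
            System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 2000));
        // Compare these with the addrs=[...] list that TcpDiscoverySpi prints.
        localAddresses().forEach(System.out::println);
    }
}
```

Running it from node-1 against node-2 (for example
`java IgniteNetCheck.java 10.114.113.65` on Java 11 or later) and then the other
way round should show whether a port is blocked in one direction. Interfaces
with names like docker0 or vboxnet0 in the output are the usual suspects for a
wrongly chosen IP.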

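For the last two points, this is roughly how I would wire static IP discovery
and a longer failure detection timeout in Java. Treat it as a sketch, not your
actual setup: the class name is made up, the two addresses are just the ones
that appear in your log, and I am assuming both nodes use discovery port 47500.
It needs ignite-core on the classpath.

```java
import java.util.Arrays;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

// Hypothetical launcher class for illustration only.
public class StaticDiscoveryNode {
    public static void main(String[] args) {
        // Static IP finder with the shared flag enabled.
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setShared(true);
        // Placeholder addresses taken from the log; replace with your own nodes.
        ipFinder.setAddresses(Arrays.asList("10.114.113.53:47500", "10.114.113.65:47500"));

        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
        discoSpi.setIpFinder(ipFinder);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(discoSpi);
        // Raise the failure detection timeout from the default to 30 seconds.
        cfg.setFailureDetectionTimeout(30_000L);
        // If the wrong adapter gets picked, binding explicitly may help:
        // cfg.setLocalHost("10.114.113.53");

        Ignite ignite = Ignition.start(cfg);
        System.out.println("Nodes in topology: " + ignite.cluster().nodes().size());
    }
}
```

If you configure Ignite through Spring XML instead, the same properties exist
on the corresponding beans.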

On Sun, Jan 6, 2019 at 6:04 AM Prasad Bhalerao <[email protected]>
wrote:

> Hi,
>
> I am consistently getting the "Node is out of topology" message in the logs
> on node-1, while the other node, node-2, logs "Timed out waiting for
> message delivery receipt (most probably, the reason is in long GC pauses on
> remote node; consider tuning GC and increasing '"
>
> I have checked the network bandwidth using iperf and it is 470 Mbit per
> sec. I have also checked the gc logs and max pause time is 140 ms.
>
> If it is really happening because of network issues, is there any way to
> debug it?
>
> If it is happening because of gc, I would have seen it in gc logs.
>
> Can someone please help me out with this?
>
> Log messages on node-1:
> 2019-01-06 13:48:19,036 125016 [tcp-disco-srvr-#3%springDataNode%] INFO
> o.a.i.s.d.tcp.TcpDiscoverySpi - TCP discovery accepted incoming connection
> [rmtAddr=/10.114.113.65, rmtPort=35651]
> 2019-01-06 13:48:19,037 125017 [tcp-disco-srvr-#3%springDataNode%] INFO
> o.a.i.s.d.tcp.TcpDiscoverySpi - TCP discovery spawning a new thread for
> connection [rmtAddr=/10.114.113.65, rmtPort=35651]
> 2019-01-06 13:48:19,037 125017 [tcp-disco-sock-reader-#5%springDataNode%]
> INFO  o.a.i.s.d.tcp.TcpDiscoverySpi - Started serving remote node
> connection [rmtAddr=/10.114.113.65:35651, rmtPort=35651]
> *2019-01-06 13:48:19,040 125020 [tcp-disco-msg-worker-#2%springDataNode%]
> WARN  o.a.i.s.d.tcp.TcpDiscoverySpi - Node is out of topology (probably,
> due to short-time network problems).*
> 2019-01-06 13:48:19,041 125021 [disco-event-worker-#62%springDataNode%]
> WARN  o.a.i.i.m.d.GridDiscoveryManager - Local node SEGMENTED:
> TcpDiscoveryNode [id=a5827f51-096a-4c98-af4f-564d2d3e769d,
> addrs=[10.114.113.53, 127.0.0.1], sockAddrs=[/127.0.0.1:47500,
> qagmscore02.p13.eng.in03.qualys.com/10.114.113.53:47500], discPort=47500,
> order=2, intOrder=2, lastExchangeTime=1546782499034, loc=true,
> ver=2.7.0#20181130-sha1:256ae401, isClient=false]
> 2019-01-06 13:48:19,041 125021 [tcp-disco-sock-reader-#5%springDataNode%]
> INFO  o.a.i.s.d.tcp.TcpDiscoverySpi - Finished serving remote node
> connection [rmtAddr=/10.114.113.65:35651, rmtPort=35651]
> 2019-01-06 13:48:19,866 125846 [tcp-comm-worker-#1%springDataNode%] INFO
> o.a.i.s.d.tcp.TcpDiscoverySpi - Pinging node:
> cd9803ac-b810-447e-818e-ab51dada59d8
>
>
