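Hi,

Try disabling IPv6 on all nodes with the JVM option -Djava.net.preferIPv4Stack=true [1]. Using both IPv4 and IPv6 can cause grid segmentation, and your log shows the node advertising an IPv6 loopback address (0:0:0:0:0:0:0:1%lo) alongside its IPv4 addresses.
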
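To confirm that the option actually reached the JVM on each node, a small diagnostic along the following lines may help. It is only a sketch: the class name Ipv4StackCheck and its methods are hypothetical and not part of Ignite. It reports whether java.net.preferIPv4Stack is set and lists any IPv6 addresses the JVM sees on local interfaces.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not an Ignite API.
class Ipv4StackCheck {
    /** Returns true when the system property java.net.preferIPv4Stack is "true". */
    static boolean ipv4StackPreferred() {
        return Boolean.getBoolean("java.net.preferIPv4Stack");
    }

    /** Collects the IPv6 addresses the JVM reports on local network interfaces. */
    static List<InetAddress> localIpv6Addresses() throws SocketException {
        List<InetAddress> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // no interfaces reported
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (addr instanceof Inet6Address) {
                    result.add(addr);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws SocketException {
        System.out.println("preferIPv4Stack set: " + ipv4StackPreferred());
        System.out.println("IPv6 addresses seen: " + localIpv6Addresses());
    }
}
```

The property is read by the JVM at startup, so it has to be passed on the command line rather than set from application code. In a Tomcat deployment it is commonly added to CATALINA_OPTS in bin/setenv.sh.
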
[1] https://stackoverflow.com/questions/11850655/how-can-i-disable-ipv6-stack-use-for-ipv4-ips-on-jre

On Fri, Apr 27, 2018 at 8:52 AM, naresh.goty <[email protected]> wrote:

> Hi,
>
> We are running apache ignite (v2.3) in embedded mode in a java based
> application with 9 node cluster in our production environment in AWS cloud
> infrastructure.
>
> Most of the time, we don't see any issue with node communication failure,
> but occasionally we find one of the node failure reporting the below error
> message.
>
> WARNING: Node is out of topology (probably, due to short-time network
> problems).
> Apr 16, 2018 5:19:24 AM org.apache.ignite.logger.java.JavaLogger warning
> WARNING: Local node SEGMENTED: TcpDiscoveryNode
> [id=13b6f3ec-a759-408f-9d3f-62f2381c649b, addrs=[0:0:0:0:0:0:0:1%lo,
> 10.40.173.93, 127.0.0.1], sockAddrs=[/10.40.173.93:47500,
> /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=157,
> intOrder=83, lastExchangeTime=1523855964541, loc=true,
> ver=2.3.0#20171028-sha1:8add7fd5, isClient=false]
>
> Our analysis so far:
> 1) We are constantly monitoring the GC activities of the node, and can
> confirm that there is no long GC pauses occurred during the time frame of
> the node failure.
>
> 2) There is also no abnormal network spikes reported in AWS instance
> monitors as well.
>
> 3) CPU utilization on the affected node is low. No blocked threads reported
> from thread dumps.
>
> Attached Tomcat Logs of two nodes from the cluster of 9
> TomcatLogs_Node1: provided log details of Network Segmentation failure
> TomcatLogs_Node2: other node provided log info of discovery message
> ApplicationLogs_Node1: Detailed logs of Node stopping exceptions
> Two thread dumps
>
> Could some one provide any insights on how to trace the root cause of this
> issue and to prevent this issue from happening again?
>
> Thanks
> Naresh
>
> TomcatLog_Node1.txt
> <http://apache-ignite-users.70518.x6.nabble.com/file/t1286/TomcatLog_Node1.txt>
> TomcatLog_Node2.txt
> <http://apache-ignite-users.70518.x6.nabble.com/file/t1286/TomcatLog_Node2.txt>
> ApplicationLog_Node1.txt
> <http://apache-ignite-users.70518.x6.nabble.com/file/t1286/ApplicationLog_Node1.txt>
> threaddump_1.threaddump_1
> <http://apache-ignite-users.70518.x6.nabble.com/file/t1286/threaddump_1.threaddump_1>
> threaddump_2.threaddump_2
> <http://apache-ignite-users.70518.x6.nabble.com/file/t1286/threaddump_2.threaddump_2>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/

--
Best regards,
Andrey V. Mashenkov
