Hi,

We are running Apache Ignite (v2.3) in embedded mode in a Java-based application, as a 9-node cluster in our production environment on AWS.
Most of the time we see no node communication failures, but occasionally one of the nodes fails and reports the error below.

WARNING: Node is out of topology (probably, due to short-time network problems).
Apr 16, 2018 5:19:24 AM org.apache.ignite.logger.java.JavaLogger warning
WARNING: Local node SEGMENTED: TcpDiscoveryNode [id=13b6f3ec-a759-408f-9d3f-62f2381c649b, addrs=[0:0:0:0:0:0:0:1%lo, 10.40.173.93, 127.0.0.1], sockAddrs=[/10.40.173.93:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=157, intOrder=83, lastExchangeTime=1523855964541, loc=true, ver=2.3.0#20171028-sha1:8add7fd5, isClient=false]

Our analysis so far:

1) We constantly monitor GC activity on the nodes and can confirm that no long GC pauses occurred during the time frame of the node failure.
2) No abnormal network spikes were reported by the AWS instance monitors.
3) CPU utilization on the affected node is low, and the thread dumps show no blocked threads.

Attached are Tomcat logs from two of the 9 cluster nodes, plus the application log and thread dumps:

TomcatLogs_Node1: log details of the network segmentation failure
TomcatLogs_Node2: discovery message log info from another node
ApplicationLogs_Node1: detailed logs of the node-stopping exceptions
Two thread dumps

Could someone provide any insights on how to trace the root cause of this issue and prevent it from happening again?
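To line the segmentation event up against GC logs and AWS metrics, the node ID and lastExchangeTime can be pulled out of the SEGMENTED warning. Below is a minimal Java sketch for doing that; it is not part of our application. The names SegmentedLineParser, SegmentationEvent, and parse are hypothetical, and it assumes lastExchangeTime is an epoch-millisecond value. That assumption is consistent with the 5:19:24 AM log timestamp if the server clock is in UTC.

```java
import java.time.Instant;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SegmentedLineParser {
    // Captures the node ID and the lastExchangeTime field from a
    // "Local node SEGMENTED" warning, based on the log format shown above.
    private static final Pattern SEGMENTED = Pattern.compile(
        "Local node SEGMENTED: TcpDiscoveryNode \\[id=([0-9a-f-]+),.*?lastExchangeTime=(\\d+)");

    /** Fields extracted from one SEGMENTED warning (hypothetical helper type). */
    public static final class SegmentationEvent {
        public final String nodeId;
        public final Instant lastExchange;

        SegmentationEvent(String nodeId, Instant lastExchange) {
            this.nodeId = nodeId;
            this.lastExchange = lastExchange;
        }
    }

    /** Returns the parsed event, or empty if the line is not a SEGMENTED warning. */
    public static Optional<SegmentationEvent> parse(String logLine) {
        Matcher m = SEGMENTED.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        // Assumption: lastExchangeTime is milliseconds since the Unix epoch.
        Instant ts = Instant.ofEpochMilli(Long.parseLong(m.group(2)));
        return Optional.of(new SegmentationEvent(m.group(1), ts));
    }

    public static void main(String[] args) {
        String line = "WARNING: Local node SEGMENTED: TcpDiscoveryNode "
            + "[id=13b6f3ec-a759-408f-9d3f-62f2381c649b, "
            + "addrs=[0:0:0:0:0:0:0:1%lo, 10.40.173.93, 127.0.0.1], "
            + "discPort=47500, order=157, intOrder=83, "
            + "lastExchangeTime=1523855964541, loc=true, isClient=false]";

        parse(line).ifPresent(e ->
            System.out.println(e.nodeId + " last exchange at " + e.lastExchange));
    }
}
```

Running the same parser over the logs of every node would give one timestamp per segmentation event, which can then be compared with the GC log and the AWS network metrics for the same second.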
Thanks,
Naresh

TomcatLog_Node1.txt <http://apache-ignite-users.70518.x6.nabble.com/file/t1286/TomcatLog_Node1.txt>
TomcatLog_Node2.txt <http://apache-ignite-users.70518.x6.nabble.com/file/t1286/TomcatLog_Node2.txt>
ApplicationLog_Node1.txt <http://apache-ignite-users.70518.x6.nabble.com/file/t1286/ApplicationLog_Node1.txt>
threaddump_1.threaddump_1 <http://apache-ignite-users.70518.x6.nabble.com/file/t1286/threaddump_1.threaddump_1>
threaddump_2.threaddump_2 <http://apache-ignite-users.70518.x6.nabble.com/file/t1286/threaddump_2.threaddump_2>

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/