Hi,

We are running Apache Ignite (v2.3) in embedded mode in a Java-based
application, as a 9-node cluster in our production environment on AWS.

Most of the time we see no node communication failures, but occasionally
one of the nodes fails and reports the error message below.

WARNING: Node is out of topology (probably, due to short-time network
problems).
Apr 16, 2018 5:19:24 AM org.apache.ignite.logger.java.JavaLogger warning
WARNING: Local node SEGMENTED: TcpDiscoveryNode
[id=13b6f3ec-a759-408f-9d3f-62f2381c649b, addrs=[0:0:0:0:0:0:0:1%lo,
10.40.173.93, 127.0.0.1], sockAddrs=[/10.40.173.93:47500,
/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=157,
intOrder=83, lastExchangeTime=1523855964541, loc=true,
ver=2.3.0#20171028-sha1:8add7fd5, isClient=false]
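
To line this event up against GC logs and AWS metrics, the lastExchangeTime
field in the message above can be decoded. The sketch below assumes the field
is epoch milliseconds, which appears to be the case but is not confirmed
here. The class and method names are made up for illustration.

```java
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastExchangeTime {
    // Matches the lastExchangeTime field as it appears in the TcpDiscoveryNode log line.
    private static final Pattern FIELD = Pattern.compile("lastExchangeTime=(\\d+)");

    /** Extracts lastExchangeTime from a log line, assuming epoch milliseconds. */
    static Instant parse(String logLine) {
        Matcher m = FIELD.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no lastExchangeTime field in: " + logLine);
        }
        return Instant.ofEpochMilli(Long.parseLong(m.group(1)));
    }

    public static void main(String[] args) {
        String line = "intOrder=83, lastExchangeTime=1523855964541, loc=true,";
        System.out.println(parse(line)); // prints 2018-04-16T05:19:24.541Z
    }
}
```

The decoded value (05:19:24 UTC) matches the "5:19:24 AM" log timestamp, which
suggests the node logs in UTC and that the last exchange happened at the moment
segmentation was reported.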

Our analysis so far: 
1) We constantly monitor GC activity on the node and can confirm that no
long GC pauses occurred during the time frame of the node failure.

2) The AWS instance monitors also report no abnormal network spikes.

3) CPU utilization on the affected node is low, and the thread dumps show no
blocked threads.
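
As an illustration of the kind of check in point 1, accumulated GC time can be
sampled in-process with the standard JMX beans. This is only a sketch and not
necessarily how our monitoring works. The class and method names are
hypothetical. It measures total collection time over a window, not individual
pauses, and it will not show non-GC safepoint pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcSnapshot {
    /** Sums accumulated collection time (ms) across all collectors of this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Returns the GC time (ms) accumulated while the given window runs. */
    static long gcMillisDuring(Runnable window) {
        long before = totalGcMillis();
        window.run();
        return totalGcMillis() - before;
    }

    public static void main(String[] args) {
        long spent = gcMillisDuring(() -> {
            // Allocate some short-lived garbage so the collectors have work to do.
            for (int i = 0; i < 100_000; i++) {
                byte[] junk = new byte[1024];
            }
        });
        System.out.println("GC time during window (ms): " + spent);
    }
}
```

Sampling this value at a fixed interval and comparing the delta against the
discovery failure detection timeout would show whether GC alone could explain
a node dropping out of the topology.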

Attached are logs from two of the nine nodes, plus two thread dumps:
TomcatLog_Node1: log details of the network segmentation failure
TomcatLog_Node2: discovery messages logged by another node
ApplicationLog_Node1: detailed logs of the node-stopping exceptions
threaddump_1, threaddump_2: the two thread dumps

Could someone provide any insight into how to trace the root cause of this
issue and how to prevent it from happening again?

Thanks
Naresh


TomcatLog_Node1.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/t1286/TomcatLog_Node1.txt> 
 
TomcatLog_Node2.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/t1286/TomcatLog_Node2.txt> 
 
ApplicationLog_Node1.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/t1286/ApplicationLog_Node1.txt>
  
threaddump_1.threaddump_1
<http://apache-ignite-users.70518.x6.nabble.com/file/t1286/threaddump_1.threaddump_1>
  
threaddump_2.threaddump_2
<http://apache-ignite-users.70518.x6.nabble.com/file/t1286/threaddump_2.threaddump_2>
  




