Hi!

From this log [1] I see that the local node (id=06b8bcb7) reported 18 node
failed events: 1 for a server node and 17 for clients. The communication
sub-system closes incoming connections from nodes that are not in the topology.

We need to understand:
1. Why did this node fail? It probably started the segmentation process and
should have been shut down shortly afterwards. Can you grep all logs for the
"SEGMENTED" string?

2018/05/24 18:55:21.666 [WARN
][disco-event-worker-#61][GridDiscoveryManager] Node FAILED:
TcpDiscoveryNode [id=a0533414-4d5d-4fa0-9555-9a893e865de9,
addrs=[10.0.0.14, 127.0.0.1], sockAddrs=[/127.0.0.1:47500,
CDNode00000I.hlbdeyzzwm2ujgdsre0nhzw3sg.dx.internal.cloudapp.net/10.0.0.14:47500],
discPort=47500, order=5, intOrder=5, lastExchangeTime=1526495846007,
loc=false, ver=2.5.0#20180511-sha1:89c77573, isClient=false]

2. Why did the clients connected to that server node not reconnect to another
one? Do you see consistent NODE FAILED events and a consistent topology
version among all nodes in the topology?
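
If it helps, both checks can be scripted once the logs are collected. Below is a minimal Python sketch, not an Ignite tool: the function names `summarize` and `inconsistent` are hypothetical, and the patterns are assumptions taken only from the log lines quoted in this thread ("SEGMENTED", "Node FAILED: TcpDiscoveryNode [id=...", "Topology snapshot [ver=..."). It counts SEGMENTED mentions per log and flags logs whose set of failed node IDs or latest topology version differs from the rest.

```python
import re
import sys

# Patterns are assumptions based on the log lines quoted in this thread.
FAILED_RE = re.compile(r"Node FAILED:\s+TcpDiscoveryNode\s+\[id=([0-9a-f-]+)")
TOPVER_RE = re.compile(r"Topology snapshot \[ver=(\d+)")


def summarize(text):
    """Return (segmented_count, failed_node_ids, latest_topology_version)."""
    segmented = text.count("SEGMENTED")
    failed = set(FAILED_RE.findall(text))
    versions = [int(v) for v in TOPVER_RE.findall(text)]
    return segmented, failed, (max(versions) if versions else None)


def inconsistent(summaries):
    """Return names of logs that disagree with the combined view.

    A log is flagged if its failed-node set differs from the union of all
    failed-node sets, or if its latest topology version is not the newest.
    """
    all_failed = set().union(*(s[1] for s in summaries.values()))
    newest = max(
        (s[2] for s in summaries.values() if s[2] is not None), default=None
    )
    return sorted(
        name
        for name, s in summaries.items()
        if s[1] != all_failed or s[2] != newest
    )


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_logs.py node01.log node02.log ...
    summaries = {}
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            summaries[path] = summarize(f.read())
    for path, (seg, failed, ver) in summaries.items():
        print(f"{path}: SEGMENTED={seg} failed_nodes={len(failed)} top_ver={ver}")
    print("Inconsistent logs:", inconsistent(summaries))
```

Logs that cover different time windows will differ legitimately, so a flagged file is only a hint about where to look first.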

Can you share the logs from the entire topology? I think this will help a
lot with the investigation.

[1]
http://apache-ignite-users.70518.x6.nabble.com/file/t1784/ignite-node04.log

--Yakov

2018-05-29 22:18 GMT+03:00 crenique <creni...@gmail.com>:

> Hi,
>  We have been running ignite v2.5.0 snapshot (2018/5/11) fine for about two
> weeks.
> But suddenly, the grid stopped responding with massive infinite log spams,
> 2018/05/24 18:56:14.909 [INFO
> ][grid-nio-worker-tcp-comm-5-#46][TcpCommunicationSpi] Accepted incoming
> communication connection [locAddr=/10.0.0.38:47102,
> rmtAddr=/10.0.0.29:53819]
> 2018/05/24 18:56:14.910 [WARN
> ][grid-nio-worker-tcp-comm-5-#46][TcpCommunicationSpi] Close incoming
> connection, unknown node [nodeId=2f4ed0a7-cf1f-4ad1-a6eb-e4171985eb97,
> ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker
> [super=AbstractNioClientWorker [idx=5, bytesRcvd=39218524780,
> bytesSent=34572936118, bytesRcvd0=159844, bytesSent0=273948, select=true,
> super=GridWorker [name=grid-nio-worker-tcp-comm-5,
> igniteInstanceName=null,
> finished=false, hashCode=1499496401, interrupted=false,
> runner=grid-nio-worker-tcp-comm-5-#46]]],
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
> readBuf=java.nio.DirectByteBuffer[pos=42 lim=42 cap=32768],
> inRecovery=null,
> outRecovery=null, super=GridNioSessionImpl [locAddr=/10.0.0.38:47102,
> rmtAddr=/10.0.0.29:53819, createTime=1527188174903, closeTime=0,
> bytesSent=18, bytesRcvd=42, bytesSent0=18, bytesRcvd0=42,
> sndSchedTime=1527188174903, lastSndTime=1527188174903,
> lastRcvTime=1527188174903, readsPaused=false,
> filterChain=FilterChain[filters=[GridNioCodecFilter
> [parser=o.a.i.i.util.nio.GridDirectParser@60af162b, directMode=true],
> GridConnectionBytesVerifyFilter], accepted=true]]]
> 2018/05/24 18:56:15.006 [WARN
> ][sys-stripe-14-#15][GridDhtPartitionTopologyImpl] Requested topology
> version does not match calculated diff, will require full iteration
> tocalculate mapping [grp=xxxx, topVer=AffinityTopologyVersion [topVer=121,
> minorTopVer=0], diffVer=AffinityTopologyVersion [topVer=138,
> minorTopVer=0]]
>
>
>  It appears that the ignite server on node04 was causing it. After we
> killed that specific ignite server instance, the grid stopped spamming
> errors.
> Please see attached ignite grid configuration and logs.
>
> Can you please provide any tips or information to track down what could
> trigger this problem and how to fix it?
>
>
> *Jvm options*
> "-Duser.timezone=UTC",
> "-DIGNITE_QUIET=false",
> "-Djava.net.preferIPv4Stack=true",
> "-Djava.awt.headless=true",
> "-Xms10g",
> "-Xmx10g",
> "-XX:+AlwaysPreTouch",
> "-XX:+UseG1GC",
> "-XX:+ScavengeBeforeFullGC",
> "-XX:+DisableExplicitGC"
>
>
> *Ignite configs*
> Topology snapshot [ver=120, servers=40, clients=80, CPUs=640,
> offheap=480.0GB, heap=560.0GB]
> Default_Region [initSize=2.0 GiB, maxSize=4.0 GiB,
> persistenceEnabled=false]
>
> ignite-config.xml
> <http://apache-ignite-users.70518.x6.nabble.com/file/
> t1784/ignite-config.xml>
> ignite-cache-config.txt
> <http://apache-ignite-users.70518.x6.nabble.com/file/
> t1784/ignite-cache-config.txt>
>
>
> *Logs*
> ignite-startup-logs.txt
> <http://apache-ignite-users.70518.x6.nabble.com/file/
> t1784/ignite-startup-logs.txt>
> ignite-node04.log
> <http://apache-ignite-users.70518.x6.nabble.com/file/
> t1784/ignite-node04.log>
> ignite-node18.log
> <http://apache-ignite-users.70518.x6.nabble.com/file/
> t1784/ignite-node18.log>
>
>
>
> Thanks
> Sam
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
