Hello! Looks like some TCP communication issues between two nodes. Connection was unexpectedly closed between nodes and could not be established properly.
Maybe there's more, but the first recommendation will be to upgrade to newer version, see if the problem goes away. Regards, -- Ilya Kasnacheev чт, 6 июн. 2019 г. в 19:33, Nikita Kuzin <[email protected]>: > Hi! > Ignite version is 2.4 > We have 2 server nodes and 4 client nodes. > > Suddenly on one client node we have repeating messages: > > 2019.05.31 11:19:57.066 INFO > > [grid-nio-worker-tcp-comm-2-#59%ITLXCLKICP-client6fdfcde3-3e30-406f-8272-cb61a8a4cc80%] > TcpCommunicationSpi - Accepted incoming communication connection > [locAddr=/172.25.202.221:47100, rmtAddr=/172.16.60.25:53119] > 2019.05.31 11:19:57.066 INFO > > [grid-nio-worker-tcp-comm-2-#59%ITLXCLKICP-client6fdfcde3-3e30-406f-8272-cb61a8a4cc80%] > TcpCommunicationSpi - Received incoming connection when already connected > to > this node, rejecting [locNode=bc15d471-24e3-4d06-b57f-8b269770cc73, > rmtNode=fe0dbc6e-4f44-4512-ba44-ddb56a92e644] > 2019.05.31 11:19:58.066 ERROR > > [grid-nio-worker-tcp-comm-0-#57%ITLXCLKICP-client6fdfcde3-3e30-406f-8272-cb61a8a4cc80%] > TcpCommunicationSpi - Failed to process selector key > [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker > [super=AbstractNioClientWorker [idx=0, bytesRcvd=169756864562, > bytesSent=391832525, bytesRcvd0=65536, bytesSent0=10, select=true, > super=GridWorker [name=grid-nio-worker-tcp-comm-0, > igniteInstanceName=ITLXCLKICP-client6fdfcde3-3e30-406f-8272-cb61a8a4cc80, > finished=false, hashCode=1395033804, interrupted=false, > > runner=grid-nio-worker-tcp-comm-0-#57%ITLXCLKICP-client6fdfcde3-3e30-406f-8272-cb61a8a4cc80%]]], > writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], > readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], > inRecovery=GridNioRecoveryDescriptor [acked=2326912, resendCnt=0, > rcvCnt=609263944, sentCnt=2326926, reserved=true, lastAck=609263936, > nodeLeft=false, node=TcpDiscoveryNode > [id=fe0dbc6e-4f44-4512-ba44-ddb56a92e644, addrs=[0:0:0:0:0:0:0:1, > 127.0.0.1, > 172.16.60.25], > sockAddrs=[itlxclkapr.moscow.alfaintra.net/172.16.60.25:47500, > /0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500], discPort=47500, order=14, > intOrder=9, lastExchangeTime=1556085893170, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, > connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], > outRecovery=GridNioRecoveryDescriptor [acked=2326912, resendCnt=0, > rcvCnt=609263944, sentCnt=2326926, reserved=true, lastAck=609263936, > nodeLeft=false, node=TcpDiscoveryNode > [id=fe0dbc6e-4f44-4512-ba44-ddb56a92e644, addrs=[0:0:0:0:0:0:0:1, > 127.0.0.1, > 172.16.60.25], > sockAddrs=[itlxclkapr.moscow.alfaintra.net/172.16.60.25:47500, > /0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500], discPort=47500, order=14, > intOrder=9, lastExchangeTime=1556085893170, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, > connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], > super=GridNioSessionImpl [locAddr=/172.25.202.221:51014, > rmtAddr=itlxclkapr.moscow.alfaintra.net/172.16.60.25:47100, > createTime=1556085893592, closeTime=0, bytesSent=391832525, > bytesRcvd=169756864562, bytesSent0=10, bytesRcvd0=65536, > sndSchedTime=1556085893592, lastSndTime=1559290797066, > lastRcvTime=1559290797066, readsPaused=false, > filterChain=FilterChain[filters=[GridNioCodecFilter > [parser=o.a.i.i.util.nio.GridDirectParser@607def96, directMode=true], > GridConnectionBytesVerifyFilter], accepted=false]]] > java.io.IOException: An established connection was aborted by the software > in your host machine > at sun.nio.ch.SocketDispatcher.write0(Native Method) > at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:51) > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) > at sun.nio.ch.IOUtil.write(IOUtil.java:51) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) > at > > org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite0(GridNioServer.java:1636) > at > > org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1293) > at > > org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2307) > at > > org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2080) > at > > org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1749) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > at java.lang.Thread.run(Thread.java:748) > > > and on only one server node we have this repeating messages: > > > [11:19:32,035][INFO][grid-timeout-worker-#55%ITLXCLKAPR-server75f5b5e1-b80c-4592-82cd-9748aa693ebd%][IgniteKernal%ITLXCLKAPR-server75f5b5e1-b80c-4592-82cd-9748aa693ebd] > FreeList [name=ITLXCLKAPR-server75f5b5e1-b80c-4592-82cd-9748aa693ebd, > buckets=256, dataPages=2714287, reusePages=3] > > [11:19:53,458][WARNING][grid-nio-worker-tcp-comm-7-#64%ITLXCLKAPR-server75f5b5e1-b80c-4592-82cd-9748aa693ebd%][TcpCommunicationSpi] > Communication SPI session write timed out (consider increasing > 'socketWriteTimeout' configuration property) > [remoteAddr=/172.25.202.221:51014, writeTimeout=2000] > > [11:19:57,740][INFO][grid-nio-worker-tcp-comm-1-#58%ITLXCLKAPR-server75f5b5e1-b80c-4592-82cd-9748aa693ebd%][TcpCommunicationSpi] > Accepted incoming communication connection [locAddr=/172.16.60.25:47100, > rmtAddr=/172.25.202.221:59201] > > [11:19:57,740][INFO][grid-nio-worker-tcp-comm-1-#58%ITLXCLKAPR-server75f5b5e1-b80c-4592-82cd-9748aa693ebd%][TcpCommunicationSpi] > Received incoming connection from remote node while connecting to this > node, > rejecting [locNode=fe0dbc6e-4f44-4512-ba44-ddb56a92e644, locNodeOrder=14, > rmtNode=bc15d471-24e3-4d06-b57f-8b269770cc73, rmtNodeOrder=22] > > [11:19:57,756][INFO][grid-nio-worker-tcp-comm-2-#59%ITLXCLKAPR-server75f5b5e1-b80c-4592-82cd-9748aa693ebd%][TcpCommunicationSpi] > Established outgoing communication connection [locAddr=/172.16.60.25:53124 > , > rmtAddr=itlxclkicp.moscow.alfaintra.net/172.25.202.221:47100] > [11:20:32,039][INFO][grid-timeout-worker-#55%ITLXCLKAPR-server75f5b5e1-b80c-4592-82cd-9748aa693ebd%][IgniteKernal%ITLXCLKAPR-server75f5b5e1-b80c-4592-82cd-9748aa693ebd] > > > after ~12 minutes > > server node print: > > > [11:35:57,434][WARNING][grid-timeout-worker-#55%ITLXCLKAPR-server75f5b5e1-b80c-4592-82cd-9748aa693ebd%][G] > >>> Possible starvation in striped pool. > Thread name: > sys-stripe-16-#17%ITLXCLKAPR-server75f5b5e1-b80c-4592-82cd-9748aa693ebd% > .....(very long message) > > and then cluster is totally broken and operable > > What can cause this? And how we can prevent it? > > > > > -- > Sent from: http://apache-ignite-users.70518.x6.nabble.com/ >
