Hello! Well, you have an awful lot of pool starvation messages in your server logs. This may suggest problems with network or the load.
You also hit a long GC: [2019-11-22T21:19:45,964][WARN ][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 15252 milliseconds. Which was accompanied by a lot of connection re-establishment: [2019-11-22T21:19:49,142][WARN ][grid-nio-worker-tcp-comm-2-#202][TcpCommunicationSpi] Communication SPI session write timed out (consider increasing 'socketWriteTimeout' configuration property) [remoteAddr=/192.168.199.122:9508, writeTimeout=2000] [2019-11-22T21:19:49,352][INFO ][grid-nio-worker-tcp-comm-2-#202][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/192.168.199.60:47100, rmtAddr=/ 192.168.199.116:43306] [2019-11-22T21:19:49,353][INFO ][grid-nio-worker-tcp-comm-2-#202][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/192.168.199.60:47100, rmtAddr=/ 192.168.199.107:44274] [2019-11-22T21:19:49,353][INFO ][grid-nio-worker-tcp-comm-2-#202][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/192.168.199.60:47100, rmtAddr=/ 192.168.199.57:5130] [2019-11-22T21:19:49,353][INFO ][grid-nio-worker-tcp-comm-2-#202][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/192.168.199.60:47100, rmtAddr=/ 192.168.199.112:5766] [2019-11-22T21:19:49,353][INFO ][grid-nio-worker-tcp-comm-2-#202][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/192.168.199.60:47100, rmtAddr=/ 192.168.199.114:64992] [2019-11-22T21:19:49,353][INFO ][grid-nio-worker-tcp-comm-2-#202][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/192.168.199.60:47100, rmtAddr=/ 192.168.199.108:28644] [2019-11-22T21:19:49,353][INFO ][grid-nio-worker-tcp-comm-2-#202][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/192.168.199.60:47100, rmtAddr=/ 192.168.199.124:47200] [2019-11-22T21:19:49,140][INFO ][grid-nio-worker-tcp-comm-1-#201][TcpCommunicationSpi] Established outgoing communication connection [locAddr=/192.168.199.60:52484, rmtAddr=/ 192.168.199.65:47100] [2019-11-22T21:19:49,124][INFO ][grid-nio-worker-tcp-comm-0-#200][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/192.168.199.60:47100, rmtAddr=/ 192.168.199.106:13940] It is recommended to adjust your timeouts (or heap sizes) so that long GC does not cause all communication connections to be severed. Regards, -- Ilya Kasnacheev пн, 25 нояб. 2019 г. в 16:57, ihalilaltun <[email protected]>: > Hi Igniters, > > We had a strange node-down incident after getting following log (we've been > using ignite in production for almost 1 year and we're getting this error > for the first time) > > [2019-11-22T21:19:54,222][INFO > ][grid-nio-worker-tcp-comm-3-#203][TcpCommunicationSpi] Established > outgoing > communication connection [locAddr=/192.168.199.60:43720, > rmtAddr=/192.168.199.222:47100] > > [2019-11-22T21:19:54,230][ERROR][grid-nio-worker-tcp-comm-0-#200][TcpCommunicationSpi] > Caught unhandled exception in NIO worker thread (restart the node). > java.nio.channels.CancelledKeyException: null > at > sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) > ~[?:1.8.0_201] > at > sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:82) > ~[?:1.8.0_201] > at > > java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:204) > ~[?:1.8.0_201] > at > > org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:1997) > ~[ignite-core-2.7.6.jar:2.7.6] > at > > org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1794) > [ignite-core-2.7.6.jar:2.7.6] > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) > [ignite-core-2.7.6.jar:2.7.6] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201] > > [2019-11-22T21:19:54,343][ERROR][grid-nio-worker-tcp-comm-2-#202][TcpCommunicationSpi] > Failed to process selector key [ses=GridSelectorNioSessionImpl > [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=2, > bytesRcvd=617063277634, bytesSent=8878293076427, bytesRcvd0=107695, > bytesSent0=727192, select=true, super=GridWorker > [name=grid-nio-worker-tcp-comm-2, igniteInstanceName=null, finished=false, > heartbeatTs=1574457593322, hashCode=1772114147, interrupted=false, > runner=grid-nio-worker-tcp-comm-2-#202]]], > writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], > readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], > inRecovery=null, outRecovery=null, super=GridNioSessionImpl > [locAddr=/192.168.199.60:47100, rmtAddr=/192.168.199.68:62054, > createTime=1574457593307, closeTime=0, bytesSent=38, bytesRcvd=42, > bytesSent0=38, bytesRcvd0=42, sndSchedTime=1574457593322, > lastSndTime=1574457593322, lastRcvTime=1574457593322, readsPaused=false, > filterChain=FilterChain[filters=[GridNioCodecFilter > [parser=o.a.i.i.util.nio.GridDirectParser@2f6039d0, directMode=true], > GridConnectionBytesVerifyFilter], accepted=true, markedForClose=false]]] > java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > ~[?:1.8.0_201] > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > ~[?:1.8.0_201] > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > ~[?:1.8.0_201] > at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_201] > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) > ~[?:1.8.0_201] > at > > org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processRead(GridNioServer.java:1282) > ~[ignite-core-2.7.6.jar:2.7.6] > at > > org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2386) > [ignite-core-2.7.6.jar:2.7.6] > at > > org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2153) > [ignite-core-2.7.6.jar:2.7.6] > at > > org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1794) > [ignite-core-2.7.6.jar:2.7.6] > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) > [ignite-core-2.7.6.jar:2.7.6] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201] > [2019-11-22T21:19:54,624][WARN > ][grid-nio-worker-tcp-comm-2-#202][TcpCommunicationSpi] Closing NIO session > because of unhandled exception [cls=class > o.a.i.i.util.nio.GridNioException, > msg=Connection reset by peer] > > > - there was no network issues in any way > - here is the node logs; > ignite.zip > <http://apache-ignite-users.70518.x6.nabble.com/file/t2515/ignite.zip> > - here is the client node; > proto-20191122.log > < > http://apache-ignite-users.70518.x6.nabble.com/file/t2515/proto-20191122.log> > > > any thougs? > > Regards > > > > > ----- > İbrahim Halil Altun > Senior Software Engineer @ Segmentify > -- > Sent from: http://apache-ignite-users.70518.x6.nabble.com/ >
