Hi Ilya,

It looks like that error corresponds to restarts of the particular pods we're running. We're currently running in Kubernetes as a StatefulSet.
I think it has to do with the node coming back up with the same address and hostname but a different identifier. I see this in the logs:

Caused by: org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$HandshakeException: Remote node ID is not as expected [expected=704fa7c2-bb6a-44bb-89c6-06722a3abac8, rcvd=922d993c-6b08-4bee-92f2-130e108e3657]

After manually setting consistentId in the configuration, it seems that I can bounce the pods at will without hitting this issue. I'll follow up if we see it again.
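In case it's useful to anyone else hitting this, here's a rough sketch of the kind of change I mean, in plain Java rather than our actual config. The class name and the HOSTNAME-based value are just illustrative; any value that stays stable across pod restarts would do, and StatefulSet pod hostnames (ignite-0, ignite-1, ...) happen to be stable:

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class IgniteNodeStarter {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Pin the node's consistent ID to something that survives pod restarts.
        // A StatefulSet pod hostname is stable, so it makes a reasonable value;
        // HOSTNAME is the env var Kubernetes sets inside the pod.
        // (Illustrative only -- any stable per-pod value works.)
        cfg.setConsistentId(System.getenv("HOSTNAME"));

        Ignition.start(cfg);
    }
}

With a stable consistentId each pod keeps the same node identity when it comes back, which matches what we're seeing now that bouncing pods no longer triggers the handshake error.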
Thanks,
Bryan

On Tue, Feb 27, 2018 at 11:14 AM, Ilya Kasnacheev <[email protected]> wrote:

> Hello Bryan!
>
> 2nd attempt to send this mail.
>
> Can you search in the log prior to the first problematic "Accepted incoming communication connection"? I assume there was a communication connection already back when the node was started, and you should look why it was closed in the first place. That might provide clues.
>
> Also, logs from remote node (one that makes those connections) at the same time might provide clues.
>
> Don't hesitate to provide full node logs.
>
> Regards,
>
> --
> Ilya Kasnacheev
>
> 2018-02-27 18:32 GMT+03:00 Ilya Kasnacheev <[email protected]>:
>
>> Hello Bryan!
>>
>> Can you search in the log prior to the first problematic "Accepted incoming communication connection"? I assume there was a communication connection already back when the node was started, and you should look why it was closed in the first place. That might provide clues.
>>
>> Also, logs from remote node (one that makes those connections) at the same time might provide clues.
>>
>> Don't hesitate to provide full node logs.
>>
>> Regards,
>>
>> --
>> Ilya Kasnacheev
>>
>> 2018-02-27 18:03 GMT+03:00 Bryan Rosander <[email protected]>:
>>
>>> Also, this is ignite 2.3.0, please let me know if there's any more information I can provide.
>>>
>>> On Tue, Feb 27, 2018 at 9:59 AM, Bryan Rosander <[email protected]> wrote:
>>>
>>>> We're using ignite in a 3-node grid with SSL and just hit an issue where after a period of time (hours after starting), 2 of the 3 nodes seem to have lost connectivity and we see the following stack trace over and over.
>>>>
>>>> The cluster starts up fine so I doubt it's an issue with the certificates or keystores. Also bouncing the ignite instances seems to have "fixed" it. Any ideas as to what could have happened?
>>>>
>>>> Thanks,
>>>> Bryan
>>>>
>>>> 2018-02-27 14:52:36,071 INFO [grid-nio-worker-tcp-comm-2-#27] o.a.i.s.c.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/100.96.3.72:47100, rmtAddr=/100.96.6.183:45484]
>>>> 2018-02-27 14:52:37,072 ERROR [grid-nio-worker-tcp-comm-2-#27] o.a.i.s.c.tcp.TcpCommunicationSpi - Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=2, bytesRcvd=17479234, bytesSent=0, bytesRcvd0=2536, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-2, igniteInstanceName=null, finished=false, hashCode=1854311052, interrupted=false, runner=grid-nio-worker-tcp-comm-2-#27]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/100.96.3.72:47100, rmtAddr=/100.96.6.183:45484, createTime=1519743156030, closeTime=0, bytesSent=2448, bytesRcvd=2536, bytesSent0=2448, bytesRcvd0=2536, sndSchedTime=1519743156071, lastSndTime=1519743156071, lastRcvTime=1519743156071, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser@497350a6, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=true]]]
>>>> javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=2, bytesRcvd=17479234, bytesSent=0, bytesRcvd0=2536, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-2, igniteInstanceName=null, finished=false, hashCode=1854311052, interrupted=false, runner=grid-nio-worker-tcp-comm-2-#27]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/100.96.3.72:47100, rmtAddr=/100.96.6.183:45484, createTime=1519743156030, closeTime=0, bytesSent=2448, bytesRcvd=2536, bytesSent0=2448, bytesRcvd0=2536, sndSchedTime=1519743156071, lastSndTime=1519743156071, lastRcvTime=1519743156071, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser@497350a6, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=true]]]
>>>> at org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:379)
>>>> at org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
>>>> at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1418)
>>>> at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1287)
>>>> at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2275)
>>>> at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2048)
>>>> at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1717)
>>>> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>>>> at java.lang.Thread.run(Thread.java:748)
>>>> 2018-02-27 14:52:37,072 WARN [grid-nio-worker-tcp-comm-2-#27] o.a.i.s.c.tcp.TcpCommunicationSpi - Closing NIO session because of unhandled exception [cls=class o.a.i.i.util.nio.GridNioException, msg=Failed to encrypt data (SSL engine error) [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=2, bytesRcvd=17479234, bytesSent=0, bytesRcvd0=2536, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-2, igniteInstanceName=null, finished=false, hashCode=1854311052, interrupted=false, runner=grid-nio-worker-tcp-comm-2-#27]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/100.96.3.72:47100, rmtAddr=/100.96.6.183:45484, createTime=1519743156030, closeTime=0, bytesSent=2448, bytesRcvd=2536, bytesSent0=2448, bytesRcvd0=2536, sndSchedTime=1519743156071, lastSndTime=1519743156071, lastRcvTime=1519743156071, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser@497350a6, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=true]]]]
>>>> 2018-02-27 14:52:37,321 INFO [grid-nio-worker-tcp-comm-3-#28] o.a.i.s.c.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/100.96.3.72:47100, rmtAddr=/100.96.6.183:45490]
