Hi Ilya,

It looks like that error corresponds to restarts of the particular pods
we're running. We're currently running Ignite in Kubernetes as a StatefulSet.

I think it has to do with a node coming back up with the same address and
hostname but a different node ID. I see this in the logs:
Caused by: org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$HandshakeException:
Remote node ID is not as expected [expected=704fa7c2-bb6a-44bb-89c6-06722a3abac8, rcvd=922d993c-6b08-4bee-92f2-130e108e3657]
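
To check how often this happens across pod restarts, the mismatched IDs can be pulled out of a log with a small scan. This is only a minimal sketch, assuming the log text has already been read into a string. `HandshakeMismatchScanner` and `findMismatches` are hypothetical names for illustration, not part of Ignite:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Ignite): extracts node ID mismatches from log text. */
final class HandshakeMismatchScanner {
    // Textual form of a UUID, e.g. 704fa7c2-bb6a-44bb-89c6-06722a3abac8.
    private static final String UUID_RE =
        "[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}";

    // Matches "Remote node ID is not as expected [expected=<uuid>, rcvd=<uuid>]",
    // tolerating whitespace or line wraps between the tokens.
    private static final Pattern MISMATCH = Pattern.compile(
        "Remote node ID is not as expected\\s*\\[expected=(" + UUID_RE + "),\\s*rcvd=("
            + UUID_RE + ")\\]");

    /** Returns one "expected -> rcvd" entry per mismatch found in the given log text. */
    static List<String> findMismatches(String logText) {
        List<String> result = new ArrayList<>();
        Matcher m = MISMATCH.matcher(logText);
        while (m.find()) {
            result.add(m.group(1) + " -> " + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample taken from the log excerpt in this message.
        String sample = "Remote node ID is not as expected\n"
            + "[expected=704fa7c2-bb6a-44bb-89c6-06722a3abac8,\n"
            + "rcvd=922d993c-6b08-4bee-92f2-130e108e3657]";
        findMismatches(sample).forEach(System.out::println);
    }
}
```

Each entry pairs the ID the local node expected with the ID the remote node actually presented, which makes it easy to line the mismatches up against pod restart times.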

After manually setting consistentId in the configuration, it seems that I
can bounce the pods at will without hitting this issue. I'll follow up if
we see it again.
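
For reference, here is a minimal sketch of one way to derive a stable value for consistentId in a StatefulSet, where pod names (and therefore hostnames) stay the same across restarts. It assumes the HOSTNAME environment variable is populated in the container. `PodConsistentId` and the `IGNITE_CONSISTENT_ID` override variable are hypothetical names for illustration, not necessarily what we run:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;

/** Hypothetical helper (not part of Ignite): picks a stable consistent ID for a pod. */
final class PodConsistentId {
    /**
     * Resolves the ID from the given environment: an explicit IGNITE_CONSISTENT_ID
     * (an assumed variable name) wins, then HOSTNAME, then the supplied fallback.
     */
    static String resolve(Map<String, String> env, String fallback) {
        String explicit = env.get("IGNITE_CONSISTENT_ID");
        if (explicit != null && !explicit.trim().isEmpty()) {
            return explicit.trim();
        }
        String host = env.get("HOSTNAME");
        if (host != null && !host.trim().isEmpty()) {
            return host.trim();
        }
        return fallback;
    }

    public static void main(String[] args) throws UnknownHostException {
        String id = resolve(System.getenv(), InetAddress.getLocalHost().getHostName());
        System.out.println("consistentId=" + id);
        // With ignite-core on the classpath, the value would then be applied as:
        //   IgniteConfiguration cfg = new IgniteConfiguration();
        //   cfg.setConsistentId(id);
        //   Ignition.start(cfg);
    }
}
```

Taking the environment as a parameter keeps the lookup order easy to test outside a pod. Whether a stable consistentId fully prevents the handshake mismatch is still only an observation from our restarts so far.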

Thanks,
Bryan

On Tue, Feb 27, 2018 at 11:14 AM, Ilya Kasnacheev <[email protected]> wrote:

> Hello Bryan!
>
> 2nd attempt to send this mail.
>
>
> Can you search the log prior to the first problematic "Accepted
> incoming communication connection"? I assume a communication connection
> already existed back when the node was started, and you should look into
> why it was closed in the first place. That might provide clues.
>
> Also, logs from the remote node (the one that makes those connections)
> from the same time might provide clues.
>
> Don't hesitate to provide full node logs.
>
> Regards,
>
>
> --
> Ilya Kasnacheev
>
> 2018-02-27 18:32 GMT+03:00 Ilya Kasnacheev <[email protected]>:
>
>> 2018-02-27 18:03 GMT+03:00 Bryan Rosander <[email protected]>:
>>
>>> Also, this is Ignite 2.3.0. Please let me know if there's any more
>>> information I can provide.
>>>
>>> On Tue, Feb 27, 2018 at 9:59 AM, Bryan Rosander <[email protected]> wrote:
>>>
>>>> We're using Ignite in a 3-node grid with SSL and just hit an issue
>>>> where, after a period of time (hours after starting), 2 of the 3 nodes
>>>> seem to have lost connectivity, and we see the following stack trace
>>>> over and over.
>>>>
>>>> The cluster starts up fine, so I doubt it's an issue with the
>>>> certificates or keystores. Also, bouncing the Ignite instances seems to
>>>> have "fixed" it. Any ideas as to what could have happened?
>>>>
>>>> Thanks,
>>>> Bryan
>>>>
>>>> 2018-02-27 14:52:36,071 INFO  [grid-nio-worker-tcp-comm-2-#27]
>>>> o.a.i.s.c.tcp.TcpCommunicationSpi - Accepted incoming communication
>>>> connection [locAddr=/100.96.3.72:47100, rmtAddr=/100.96.6.183:45484]
>>>> 2018-02-27 14:52:37,072 ERROR [grid-nio-worker-tcp-comm-2-#27]
>>>> o.a.i.s.c.tcp.TcpCommunicationSpi - Failed to process selector key
>>>> [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker
>>>> [super=AbstractNioClientWorker [idx=2, bytesRcvd=17479234, bytesSent=0,
>>>> bytesRcvd0=2536, bytesSent0=0, select=true, super=GridWorker
>>>> [name=grid-nio-worker-tcp-comm-2, igniteInstanceName=null,
>>>> finished=false, hashCode=1854311052, interrupted=false,
>>>> runner=grid-nio-worker-tcp-comm-2-#27]]],
>>>> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768],
>>>> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
>>>> inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/
>>>> 100.96.3.72:47100, rmtAddr=/100.96.6.183:45484,
>>>> createTime=1519743156030, closeTime=0, bytesSent=2448, bytesRcvd=2536,
>>>> bytesSent0=2448, bytesRcvd0=2536, sndSchedTime=1519743156071,
>>>> lastSndTime=1519743156071, lastRcvTime=1519743156071, readsPaused=false,
>>>> filterChain=FilterChain[filters=[GridNioCodecFilter
>>>> [parser=o.a.i.i.util.nio.GridDirectParser@497350a6, directMode=true],
>>>> GridConnectionBytesVerifyFilter, SSL filter], accepted=true]]]
>>>> javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error)
>>>> [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl
>>>> [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=2,
>>>> bytesRcvd=17479234, bytesSent=0, bytesRcvd0=2536, bytesSent0=0,
>>>> select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-2,
>>>> igniteInstanceName=null, finished=false, hashCode=1854311052,
>>>> interrupted=false, runner=grid-nio-worker-tcp-comm-2-#27]]],
>>>> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768],
>>>> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
>>>> inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/
>>>> 100.96.3.72:47100, rmtAddr=/100.96.6.183:45484,
>>>> createTime=1519743156030, closeTime=0, bytesSent=2448, bytesRcvd=2536,
>>>> bytesSent0=2448, bytesRcvd0=2536, sndSchedTime=1519743156071,
>>>> lastSndTime=1519743156071, lastRcvTime=1519743156071, readsPaused=false,
>>>> filterChain=FilterChain[filters=[GridNioCodecFilter
>>>> [parser=org.apache.ignite.internal.util.nio.GridDirectParser@497350a6,
>>>> directMode=true], GridConnectionBytesVerifyFilter, SSL filter],
>>>> accepted=true]]]
>>>>         at org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:379)
>>>>         at org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
>>>>         at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1418)
>>>>         at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1287)
>>>>         at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2275)
>>>>         at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2048)
>>>>         at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1717)
>>>>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>>>>         at java.lang.Thread.run(Thread.java:748)
>>>> 2018-02-27 14:52:37,072 WARN  [grid-nio-worker-tcp-comm-2-#27]
>>>> o.a.i.s.c.tcp.TcpCommunicationSpi - Closing NIO session because of
>>>> unhandled exception [cls=class o.a.i.i.util.nio.GridNioException,
>>>> msg=Failed to encrypt data (SSL engine error) [status=CLOSED,
>>>> handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl
>>>> [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=2,
>>>> bytesRcvd=17479234, bytesSent=0, bytesRcvd0=2536, bytesSent0=0,
>>>> select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-2,
>>>> igniteInstanceName=null, finished=false, hashCode=1854311052,
>>>> interrupted=false, runner=grid-nio-worker-tcp-comm-2-#27]]],
>>>> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768],
>>>> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
>>>> inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/
>>>> 100.96.3.72:47100, rmtAddr=/100.96.6.183:45484,
>>>> createTime=1519743156030, closeTime=0, bytesSent=2448, bytesRcvd=2536,
>>>> bytesSent0=2448, bytesRcvd0=2536, sndSchedTime=1519743156071,
>>>> lastSndTime=1519743156071, lastRcvTime=1519743156071, readsPaused=false,
>>>> filterChain=FilterChain[filters=[GridNioCodecFilter
>>>> [parser=o.a.i.i.util.nio.GridDirectParser@497350a6, directMode=true],
>>>> GridConnectionBytesVerifyFilter, SSL filter], accepted=true]]]]
>>>> 2018-02-27 14:52:37,321 INFO  [grid-nio-worker-tcp-comm-3-#28]
>>>> o.a.i.s.c.tcp.TcpCommunicationSpi - Accepted incoming communication
>>>> connection [locAddr=/100.96.3.72:47100, rmtAddr=/100.96.6.183:45490]
>>>>
>>>
>>>
>>
>
