Yes TLS is enabled. But that's a must for our prod env. Will try renew certificates and see how it goes. Thanks !
---Original--- From: "Stephen Darlington"<sdarling...@apache.org> Date: Sat, Oct 26, 2024 00:12 AM To: "user"<user@ignite.apache.org>; Subject: Re: rejecting communication connection & Failed to processselector key You've not shared much information about your cluster (or Kubernetes) configuration, but from what I can gather you have TLS enabled. I would suggest you disable TLS and see if you get the same error. The most common problem here is your certificates not exactly matching your topology. On Thu, 10 Oct 2024 at 00:50, MJ <6733...@qq.com> wrote: Added log details of another “Broken pipe” excpetion. Per the log the two nodes appear connecting to each other for multiple times though I did not set “usePairedConnections” . Any idea how below activities can be explained , or some special configs are needed for container environment ? Thanks. Below both are the server nodes and running in OpenShift platform, and using kubernetes discovery spi . Node #1 - 10.254.28.57 Node #2 - 10.254.14.43 --- log in Node #1 (10.254.28.57) 24-10-08 18:34:27.574 [INFO ] data-streamer-stripe-5-#14%app_primary% o.a.i.s.c.t.TcpCommunicationSpi:117 - TCP client created [client=GridTcpNioCommunicationClient [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1, bytesRcvd=21728499480, bytesSent=0, bytesRcvd0=86653, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=TcpCommunicationSpi, finished=false, heartbeatTs=9223372036854775807, hashCode=1168063678, interrupted=false, runner=grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=15680575, resendCnt=0, rcvCnt=14114194, sentCnt=15680575, reserved=true, lastAck=14114194, nodeLeft=false, node=TcpDiscoveryNode [id=3884769a-aa73-47de-91e7-65a35e63c92f, consistentId=10.254.14.43,127.0.0.1:47500, addrs=ArrayList [10.254.14.43, 127.0.0.1], sockAddrs=HashSet [/10.254.14.43:47500, /127.0.0.1:47500], discPort=47500, order=4, intOrder=4, lastExchangeTime=1727333363909, loc=false, ver=2.16.0#20231215-sha1:7bde6a42, isClient=false], connected=false, connectCnt=102, queueLimit=131072, reserveCnt=138, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=15680575, resendCnt=0, rcvCnt=14114194, sentCnt=15680575, reserved=true, lastAck=14114194, nodeLeft=false, node=TcpDiscoveryNode [id=3884769a-aa73-47de-91e7-65a35e63c92f, consistentId=10.254.14.43,127.0.0.1:47500, addrs=ArrayList [10.254.14.43, 127.0.0.1], sockAddrs=HashSet [/10.254.14.43:47500, /127.0.0.1:47500], discPort=47500, order=4, intOrder=4, lastExchangeTime=1727333363909, loc=false, ver=2.16.0#20231215-sha1:7bde6a42, isClient=false], connected=false, connectCnt=102, queueLimit=131072, reserveCnt=138, pairedConnections=false], closeSocket=true, outboundMessagesQueueSizeMetric=org.apache.ignite.internal.processors.metric.impl.LongAdderMetric@69a257d1, super=GridNioSessionImpl [locAddr=/10.254.28.57:34768, rmtAddr=/10.254.14.43:47100, createTime=1728426867569, closeTime=0, bytesSent=0, bytesRcvd=0, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1728426867569, lastSndTime=1728426867569, lastRcvTime=1728426867569, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser@10f22b66, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=false, markedForClose=false]], super=GridAbstractCommunicationClient [lastUsed=1728426867569, closed=false, connIdx=0]], duration=171ms] 24-10-08 18:34:27.853 [INFO ] grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.28.57:47100, rmtAddr=/10.254.14.43:41546] 24-10-08 18:34:27.854 [INFO ] grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection when already connected to this node, rejecting [locNode=7b5f6ac3-3bfa-49b3-9580-b185912fb17e, rmtNode=3884769a-aa73-47de-91e7-65a35e63c92f] 24-10-08 18:34:28.064 [INFO ] grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.28.57:47100, rmtAddr=/10.254.14.43:41558] 24-10-08 18:34:28.065 [INFO ] grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection when already connected to this node, rejecting [locNode=7b5f6ac3-3bfa-49b3-9580-b185912fb17e, rmtNode=3884769a-aa73-47de-91e7-65a35e63c92f] 24-10-08 18:34:28.275 [INFO ] grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.28.57:47100, rmtAddr=/10.254.14.43:41566] 24-10-08 18:34:28.276 [INFO ] grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection when already connected to this node, rejecting [locNode=7b5f6ac3-3bfa-49b3-9580-b185912fb17e, rmtNode=3884769a-aa73-47de-91e7-65a35e63c92f] 24-10-08 18:34:28.717 [ERROR] grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:137 - Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1, bytesRcvd=21728523888, bytesSent=0, bytesRcvd0=111061, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=TcpCommunicationSpi, finished=false, heartbeatTs=1728426867710, hashCode=1168063678, interrupted=false, runner=grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=15681792, resendCnt=0, rcvCnt=14114195, sentCnt=15681955, reserved=true, lastAck=14114195, nodeLeft=false, node=TcpDiscoveryNode [id=3884769a-aa73-47de-91e7-65a35e63c92f, consistentId=10.254.14.43,127.0.0.1:47500, addrs=ArrayList [10.254.14.43, 127.0.0.1], sockAddrs=HashSet [/10.254.14.43:47500, /127.0.0.1:47500], discPort=47500, order=4, intOrder=4, lastExchangeTime=1727333363909, loc=false, ver=2.16.0#20231215-sha1:7bde6a42, isClient=false], connected=false, connectCnt=102, queueLimit=131072, reserveCnt=138, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=15681792, resendCnt=0, rcvCnt=14114195, sentCnt=15681955, reserved=true, lastAck=14114195, nodeLeft=false, node=TcpDiscoveryNode [id=3884769a-aa73-47de-91e7-65a35e63c92f, consistentId=10.254.14.43,127.0.0.1:47500, addrs=ArrayList [10.254.14.43, 127.0.0.1], sockAddrs=HashSet [/10.254.14.43:47500, /127.0.0.1:47500], discPort=47500, order=4, intOrder=4, lastExchangeTime=1727333363909, loc=false, ver=2.16.0#20231215-sha1:7bde6a42, isClient=false], connected=false, connectCnt=102, queueLimit=131072, reserveCnt=138, pairedConnections=false], closeSocket=true, outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric@69a257d1, super=GridNioSessionImpl [locAddr=/10.254.28.57:34768, rmtAddr=/10.254.14.43:47100, createTime=1728426867569, closeTime=0, bytesSent=216640, bytesRcvd=3326, bytesSent0=216640, bytesRcvd0=3326, sndSchedTime=1728426867569, lastSndTime=1728426867710, lastRcvTime=1728426867710, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser@10f22b66, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=false, markedForClose=true]]] java.io.IOException: Broken pipe at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:?] at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:62) ~[?:?] at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:132) ~[?:?] at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:76) ~[?:?] at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:53) ~[?:?] at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:532) ~[?:?] at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1524) ~[ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1405) ~[ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2530) [ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2282) [ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1911) [ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) [ignite-core-2.16.0.jar:2.16.0] at java.base/java.lang.Thread.run(Thread.java:842) [?:?] 24-10-08 18:34:28.718 [WARN ] grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:127 - Client disconnected abruptly due to network connection loss or because the connection was left open on application shutdown. [cls=class o.a.i.i.util.nio.GridNioException, msg=Broken pipe] 24-10-08 18:34:28.733 [INFO ] grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.28.57:47100, rmtAddr=/10.254.14.43:41578] 24-10-08 18:34:28.734 [INFO ] grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection from remote node while connecting to this node, rejecting [locNode=7b5f6ac3-3bfa-49b3-9580-b185912fb17e, locNodeOrder=3, rmtNode=3884769a-aa73-47de-91e7-65a35e63c92f, rmtNodeOrder=4] --- log in Node #2 (10.254.14.43) 24-10-08 18:34:27.572 [INFO ] grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.14.43:47100, rmtAddr=/10.254.28.57:34768] 24-10-08 18:34:27.642 [INFO ] data-streamer-stripe-3-#12%app_primary% o.a.i.s.c.t.TcpCommunicationSpi:117 - TCP client created [client=null, node addrs=[/10.254.28.57:47100, /127.0.0.1:47100], duration=211ms] 24-10-08 18:34:28.728 [INFO ] grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.14.43:47100, rmtAddr=/10.254.28.57:34780] 24-10-08 18:34:28.734 [INFO ] data-streamer-stripe-1-#10%app_primary% o.a.i.s.c.t.TcpCommunicationSpi:117 - TCP client created [client=null, node addrs=[/10.254.28.57:47100, /127.0.0.1:47100], duration=257ms] Thanks, -MJ ---Original--- From: "MJ"<6733...@qq.com> Date: Fri, Sep 20, 2024 14:24 PM To: "user"<user@ignite.apache.org>; Subject: Re: rejecting communication connection & Failed to process selector key Yes - kubernetes discovery spi. No client nodes are out of the cluster. and below setting for communication spi. ---- config of communicationSpi <property name="communicationSpi"> <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi"> <!-- <property name="localPort" value="48100" /> --> <property name="messageQueueLimit" value="1024"/> <property name="socketWriteTimeout" value="25000"/> <property name="idleConnectionTimeout" value="60000"/> </bean> </property> ---Original--- From: "Humphrey Lopez"<hmmlo...@gmail.com> Date: Fri, Sep 20, 2024 13:27 PM To: "user"<user@ignite.apache.org>; Subject: Re: rejecting communication connection & Failed to process selector key Are you using the kubernetes discovery SPI? Humphrey On 16 Sep 2024, at 02:58, MJ <6733...@qq.com> wrote: Hi Igniters, I am experiencing the “Failed to process selector key” error once every one or two days. Every time it appears received and rejected multiple communication connections and then threw the exception. Below logging is about “Broken pipe” original exception but not only “Broken pipe”, occasionally the “Failed to process selector key” wraps “Connection Reset”, “javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) [status=CLOSED, handshakeStatus=NOT_HANDSHAKING”. Is there any solution to fix it ? or its configuration can be improved ? Ignite 2.16.0 / 4 data nodes, running in openshift 4 ---- config of communicationSpi <property name="communicationSpi"> <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi"> <!-- <property name="localPort" value="48100" /> --> <property name="messageQueueLimit" value="1024"/> <property name="socketWriteTimeout" value="25000"/> <property name="idleConnectionTimeout" value="60000"/> </bean> </property> 24-09-15 17:18:35.146 [INFO ] grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.32.162:47100, rmtAddr=/10.254.13.83:35160] 24-09-15 17:18:35.147 [INFO ] grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection when already connected to this node, rejecting [locNode=52437bc3-3dfe-4f76-bec6-d2f22f8a5d40, rmtNode=7c28b6bc-8991-47a2-b69c-6adba0482713] 24-09-15 17:18:35.357 [INFO ] grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.32.162:47100, rmtAddr=/10.254.13.83:35162] 24-09-15 17:18:35.358 [INFO ] grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection when already connected to this node, rejecting [locNode=52437bc3-3dfe-4f76-bec6-d2f22f8a5d40, rmtNode=7c28b6bc-8991-47a2-b69c-6adba0482713] 24-09-15 17:18:35.568 [INFO ] grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.32.162:47100, rmtAddr=/10.254.13.83:35164] 24-09-15 17:18:35.569 [INFO ] grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection when already connected to this node, rejecting [locNode=52437bc3-3dfe-4f76-bec6-d2f22f8a5d40, rmtNode=7c28b6bc-8991-47a2-b69c-6adba0482713] 24-09-15 17:18:35.975 [ERROR] grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:137 - Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1, bytesRcvd=29406013584, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=TcpCommunicationSpi, finished=false, heartbeatTs=1726435114873, hashCode=1144648384, interrupted=false, runner=grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=20129536, resendCnt=0, rcvCnt=19533551, sentCnt=20129879, reserved=true, lastAck=19533551, nodeLeft=false, node=TcpDiscoveryNode [id=7c28b6bc-8991-47a2-b69c-6adba0482713, consistentId=10.254.13.83,127.0.0.1:47500, addrs=ArrayList [10.254.13.83, 127.0.0.1], sockAddrs=HashSet [/10.254.13.83:47500, /127.0.0.1:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1724822271382, loc=false, ver=2.16.0#20231215-sha1:7bde6a42, isClient=false], connected=false, connectCnt=205, queueLimit=131072, reserveCnt=260, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=20129536, resendCnt=0, rcvCnt=19533551, sentCnt=20129879, reserved=true, lastAck=19533551, nodeLeft=false, node=TcpDiscoveryNode [id=7c28b6bc-8991-47a2-b69c-6adba0482713, consistentId=10.254.13.83,127.0.0.1:47500, addrs=ArrayList [10.254.13.83, 127.0.0.1], sockAddrs=HashSet [/10.254.13.83:47500, /127.0.0.1:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1724822271382, loc=false, ver=2.16.0#20231215-sha1:7bde6a42, isClient=false], connected=false, connectCnt=205, queueLimit=131072, reserveCnt=260, pairedConnections=false], closeSocket=true, outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric@69a257d1, super=GridNioSessionImpl [locAddr=/10.254.32.162:52542, rmtAddr=/10.254.13.83:47100, createTime=1726435114863, closeTime=0, bytesSent=164200, bytesRcvd=468, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1726435114863, lastSndTime=1726435114972, lastRcvTime=1726435114972, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser@5196c6f7, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=false, markedForClose=true]]] java.io.IOException: Broken pipe at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:?] at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:62) ~[?:?] at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:132) ~[?:?] at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:76) ~[?:?] at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:53) ~[?:?] at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:532) ~[?:?] at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1524) ~[ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1405) ~[ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2530) [ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2282) [ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1911) [ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) [ignite-core-2.16.0.jar:2.16.0] at java.base/java.lang.Thread.run(Thread.java:842) [?:?] Thanks, -MJ <logs.txt>