Added log details of another “Broken pipe” excpetion. Per the log the two nodes appear connecting to each other for multiple times though I did not set “usePairedConnections” . Any idea how below activities can be explained , or some special configs are needed for container environment ? Thanks.
Below both are the server nodes and running in OpenShift platform, and using kubernetes discovery spi . Node #1 - 10.254.28.57 Node #2 - 10.254.14.43 --- log in Node #1 (10.254.28.57) 24-10-08 18:34:27.574 [INFO ] data-streamer-stripe-5-#14%app_primary% o.a.i.s.c.t.TcpCommunicationSpi:117 - TCP client created [client=GridTcpNioCommunicationClient [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1, bytesRcvd=21728499480, bytesSent=0, bytesRcvd0=86653, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=TcpCommunicationSpi, finished=false, heartbeatTs=9223372036854775807, hashCode=1168063678, interrupted=false, runner=grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=15680575, resendCnt=0, rcvCnt=14114194, sentCnt=15680575, reserved=true, lastAck=14114194, nodeLeft=false, node=TcpDiscoveryNode [id=3884769a-aa73-47de-91e7-65a35e63c92f, consistentId=10.254.14.43,127.0.0.1:47500, addrs=ArrayList [10.254.14.43, 127.0.0.1], sockAddrs=HashSet [/10.254.14.43:47500, /127.0.0.1:47500], discPort=47500, order=4, intOrder=4, lastExchangeTime=1727333363909, loc=false, ver=2.16.0#20231215-sha1:7bde6a42, isClient=false], connected=false, connectCnt=102, queueLimit=131072, reserveCnt=138, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=15680575, resendCnt=0, rcvCnt=14114194, sentCnt=15680575, reserved=true, lastAck=14114194, nodeLeft=false, node=TcpDiscoveryNode [id=3884769a-aa73-47de-91e7-65a35e63c92f, consistentId=10.254.14.43,127.0.0.1:47500, addrs=ArrayList [10.254.14.43, 127.0.0.1], sockAddrs=HashSet [/10.254.14.43:47500, /127.0.0.1:47500], discPort=47500, order=4, intOrder=4, lastExchangeTime=1727333363909, loc=false, ver=2.16.0#20231215-sha1:7bde6a42, isClient=false], connected=false, connectCnt=102, queueLimit=131072, reserveCnt=138, pairedConnections=false], closeSocket=true, outboundMessagesQueueSizeMetric=org.apache.ignite.internal.processors.metric.impl.LongAdderMetric@69a257d1, super=GridNioSessionImpl [locAddr=/10.254.28.57:34768, rmtAddr=/10.254.14.43:47100, createTime=1728426867569, closeTime=0, bytesSent=0, bytesRcvd=0, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1728426867569, lastSndTime=1728426867569, lastRcvTime=1728426867569, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser@10f22b66, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=false, markedForClose=false]], super=GridAbstractCommunicationClient [lastUsed=1728426867569, closed=false, connIdx=0]], duration=171ms] 24-10-08 18:34:27.853 [INFO ] grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.28.57:47100, rmtAddr=/10.254.14.43:41546] 24-10-08 18:34:27.854 [INFO ] grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection when already connected to this node, rejecting [locNode=7b5f6ac3-3bfa-49b3-9580-b185912fb17e, rmtNode=3884769a-aa73-47de-91e7-65a35e63c92f] 24-10-08 18:34:28.064 [INFO ] grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.28.57:47100, rmtAddr=/10.254.14.43:41558] 24-10-08 18:34:28.065 [INFO ] grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection when already connected to this node, rejecting [locNode=7b5f6ac3-3bfa-49b3-9580-b185912fb17e, rmtNode=3884769a-aa73-47de-91e7-65a35e63c92f] 24-10-08 18:34:28.275 [INFO ] grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.28.57:47100, rmtAddr=/10.254.14.43:41566] 24-10-08 18:34:28.276 [INFO ] grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection when already connected to this node, rejecting [locNode=7b5f6ac3-3bfa-49b3-9580-b185912fb17e, rmtNode=3884769a-aa73-47de-91e7-65a35e63c92f] 24-10-08 18:34:28.717 [ERROR] grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:137 - Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1, bytesRcvd=21728523888, bytesSent=0, bytesRcvd0=111061, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=TcpCommunicationSpi, finished=false, heartbeatTs=1728426867710, hashCode=1168063678, interrupted=false, runner=grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=15681792, resendCnt=0, rcvCnt=14114195, sentCnt=15681955, reserved=true, lastAck=14114195, nodeLeft=false, node=TcpDiscoveryNode [id=3884769a-aa73-47de-91e7-65a35e63c92f, consistentId=10.254.14.43,127.0.0.1:47500, addrs=ArrayList [10.254.14.43, 127.0.0.1], sockAddrs=HashSet [/10.254.14.43:47500, /127.0.0.1:47500], discPort=47500, order=4, intOrder=4, lastExchangeTime=1727333363909, loc=false, ver=2.16.0#20231215-sha1:7bde6a42, isClient=false], connected=false, connectCnt=102, queueLimit=131072, reserveCnt=138, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=15681792, resendCnt=0, rcvCnt=14114195, sentCnt=15681955, reserved=true, lastAck=14114195, nodeLeft=false, node=TcpDiscoveryNode [id=3884769a-aa73-47de-91e7-65a35e63c92f, consistentId=10.254.14.43,127.0.0.1:47500, addrs=ArrayList [10.254.14.43, 127.0.0.1], sockAddrs=HashSet [/10.254.14.43:47500, /127.0.0.1:47500], discPort=47500, order=4, intOrder=4, lastExchangeTime=1727333363909, loc=false, ver=2.16.0#20231215-sha1:7bde6a42, isClient=false], connected=false, connectCnt=102, queueLimit=131072, reserveCnt=138, pairedConnections=false], closeSocket=true, outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric@69a257d1, super=GridNioSessionImpl [locAddr=/10.254.28.57:34768, rmtAddr=/10.254.14.43:47100, createTime=1728426867569, closeTime=0, bytesSent=216640, bytesRcvd=3326, bytesSent0=216640, bytesRcvd0=3326, sndSchedTime=1728426867569, lastSndTime=1728426867710, lastRcvTime=1728426867710, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser@10f22b66, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=false, markedForClose=true]]] java.io.IOException: Broken pipe at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:?] at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:62) ~[?:?] at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:132) ~[?:?] at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:76) ~[?:?] at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:53) ~[?:?] at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:532) ~[?:?] at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1524) ~[ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1405) ~[ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2530) [ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2282) [ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1911) [ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) [ignite-core-2.16.0.jar:2.16.0] at java.base/java.lang.Thread.run(Thread.java:842) [?:?] 24-10-08 18:34:28.718 [WARN ] grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:127 - Client disconnected abruptly due to network connection loss or because the connection was left open on application shutdown. [cls=class o.a.i.i.util.nio.GridNioException, msg=Broken pipe] 24-10-08 18:34:28.733 [INFO ] grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.28.57:47100, rmtAddr=/10.254.14.43:41578] 24-10-08 18:34:28.734 [INFO ] grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection from remote node while connecting to this node, rejecting [locNode=7b5f6ac3-3bfa-49b3-9580-b185912fb17e, locNodeOrder=3, rmtNode=3884769a-aa73-47de-91e7-65a35e63c92f, rmtNodeOrder=4] --- log in Node #2 (10.254.14.43) 24-10-08 18:34:27.572 [INFO ] grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.14.43:47100, rmtAddr=/10.254.28.57:34768] 24-10-08 18:34:27.642 [INFO ] data-streamer-stripe-3-#12%app_primary% o.a.i.s.c.t.TcpCommunicationSpi:117 - TCP client created [client=null, node addrs=[/10.254.28.57:47100, /127.0.0.1:47100], duration=211ms] 24-10-08 18:34:28.728 [INFO ] grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.14.43:47100, rmtAddr=/10.254.28.57:34780] 24-10-08 18:34:28.734 [INFO ] data-streamer-stripe-1-#10%app_primary% o.a.i.s.c.t.TcpCommunicationSpi:117 - TCP client created [client=null, node addrs=[/10.254.28.57:47100, /127.0.0.1:47100], duration=257ms] Thanks, -MJ ---Original--- From: "MJ"<6733...@qq.com> Date: Fri, Sep 20, 2024 14:24 PM To: "user"<user@ignite.apache.org>; Subject: Re: rejecting communication connection & Failed to process selector key Yes - kubernetes discovery spi. No client nodes are out of the cluster. and below setting for communication spi. ---- config of communicationSpi <property name="communicationSpi"> <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi"> <!-- <property name="localPort" value="48100" /> --> <property name="messageQueueLimit" value="1024"/> <property name="socketWriteTimeout" value="25000"/> <property name="idleConnectionTimeout" value="60000"/> </bean> </property> ---Original--- From: "Humphrey Lopez"<hmmlo...@gmail.com> Date: Fri, Sep 20, 2024 13:27 PM To: "user"<user@ignite.apache.org>; Subject: Re: rejecting communication connection & Failed to process selector key Are you using the kubernetes discovery SPI? Humphrey On 16 Sep 2024, at 02:58, MJ <6733...@qq.com> wrote: Hi Igniters, I am experiencing the “Failed to process selector key” error once every one or two days. Every time it appears received and rejected multiple communication connections and then threw the exception. Below logging is about “Broken pipe” original exception but not only “Broken pipe”, occasionally the “Failed to process selector key” wraps “Connection Reset”, “javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) [status=CLOSED, handshakeStatus=NOT_HANDSHAKING”. Is there any solution to fix it ? or its configuration can be improved ? Ignite 2.16.0 / 4 data nodes, running in openshift 4 ---- config of communicationSpi <property name="communicationSpi"> <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi"> <!-- <property name="localPort" value="48100" /> --> <property name="messageQueueLimit" value="1024"/> <property name="socketWriteTimeout" value="25000"/> <property name="idleConnectionTimeout" value="60000"/> </bean> </property> 24-09-15 17:18:35.146 [INFO ] grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.32.162:47100, rmtAddr=/10.254.13.83:35160] 24-09-15 17:18:35.147 [INFO ] grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection when already connected to this node, rejecting [locNode=52437bc3-3dfe-4f76-bec6-d2f22f8a5d40, rmtNode=7c28b6bc-8991-47a2-b69c-6adba0482713] 24-09-15 17:18:35.357 [INFO ] grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.32.162:47100, rmtAddr=/10.254.13.83:35162] 24-09-15 17:18:35.358 [INFO ] grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection when already connected to this node, rejecting [locNode=52437bc3-3dfe-4f76-bec6-d2f22f8a5d40, rmtNode=7c28b6bc-8991-47a2-b69c-6adba0482713] 24-09-15 17:18:35.568 [INFO ] grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication connection [locAddr=/10.254.32.162:47100, rmtAddr=/10.254.13.83:35164] 24-09-15 17:18:35.569 [INFO ] grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection when already connected to this node, rejecting [locNode=52437bc3-3dfe-4f76-bec6-d2f22f8a5d40, rmtNode=7c28b6bc-8991-47a2-b69c-6adba0482713] 24-09-15 17:18:35.975 [ERROR] grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi% o.a.i.s.c.t.TcpCommunicationSpi:137 - Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1, bytesRcvd=29406013584, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=TcpCommunicationSpi, finished=false, heartbeatTs=1726435114873, hashCode=1144648384, interrupted=false, runner=grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=20129536, resendCnt=0, rcvCnt=19533551, sentCnt=20129879, reserved=true, lastAck=19533551, nodeLeft=false, node=TcpDiscoveryNode [id=7c28b6bc-8991-47a2-b69c-6adba0482713, consistentId=10.254.13.83,127.0.0.1:47500, addrs=ArrayList [10.254.13.83, 127.0.0.1], sockAddrs=HashSet [/10.254.13.83:47500, /127.0.0.1:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1724822271382, loc=false, ver=2.16.0#20231215-sha1:7bde6a42, isClient=false], connected=false, connectCnt=205, queueLimit=131072, reserveCnt=260, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=20129536, resendCnt=0, rcvCnt=19533551, sentCnt=20129879, reserved=true, lastAck=19533551, nodeLeft=false, node=TcpDiscoveryNode [id=7c28b6bc-8991-47a2-b69c-6adba0482713, consistentId=10.254.13.83,127.0.0.1:47500, addrs=ArrayList [10.254.13.83, 127.0.0.1], sockAddrs=HashSet [/10.254.13.83:47500, /127.0.0.1:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1724822271382, loc=false, ver=2.16.0#20231215-sha1:7bde6a42, isClient=false], connected=false, connectCnt=205, queueLimit=131072, reserveCnt=260, pairedConnections=false], closeSocket=true, outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric@69a257d1, super=GridNioSessionImpl [locAddr=/10.254.32.162:52542, rmtAddr=/10.254.13.83:47100, createTime=1726435114863, closeTime=0, bytesSent=164200, bytesRcvd=468, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1726435114863, lastSndTime=1726435114972, lastRcvTime=1726435114972, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser@5196c6f7, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=false, markedForClose=true]]] java.io.IOException: Broken pipe at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:?] at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:62) ~[?:?] at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:132) ~[?:?] at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:76) ~[?:?] at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:53) ~[?:?] at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:532) ~[?:?] at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1524) ~[ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1405) ~[ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2530) [ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2282) [ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1911) [ignite-core-2.16.0.jar:2.16.0] at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) [ignite-core-2.16.0.jar:2.16.0] at java.base/java.lang.Thread.run(Thread.java:842) [?:?] Thanks, -MJ <logs.txt>