Link to Stack Overflow post:
<https://stackoverflow.com/questions/69587418/ignite-cluster-becomes-unresponsive-when-relaunching-client-nodes>
We are intermittently seeing the following error on our Kubernetes setup.
The issue happens after we relaunch our Tomcat pod, which launches new Ignite
client nodes.
I understand that the first stack trace shows Ignite detecting that the TCP
communication SPI worker (tcp-comm-worker) has become unresponsive, but I do
not see how this has anything to do with the second stack trace. These seem
like two totally unrelated errors, yet the second one says the thread dump was
taken at the same timestamp as the first: `Thread dump at 2021/10/12 15:57:17`
The issue can be resolved by bringing down all the Ignite pods and relaunching
them, but a better understanding of this issue and a way to avoid restarting
Ignite would be appreciated.
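As a side check on the timestamps, the heartbeatTs value in the first warning
can be decoded and compared with the time the warning was logged. The sketch
below is only an illustration: the class and method names are hypothetical,
it assumes heartbeatTs is epoch milliseconds, and it assumes the Tomcat log
timestamps are in GMT like the thread dump header.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not part of Ignite: compares a worker's last
// heartbeat with the time the SYSTEM_WORKER_BLOCKED warning was logged.
public class HeartbeatGap {
    // Format of the log line prefix, e.g. "12-Oct-2021 15:57:17.139".
    private static final DateTimeFormatter LOG_FORMAT =
        DateTimeFormatter.ofPattern("dd-MMM-yyyy HH:mm:ss.SSS", Locale.ENGLISH);

    // Assumption: heartbeatTs is milliseconds since the Unix epoch.
    static Instant heartbeatInstant(long heartbeatTs) {
        return Instant.ofEpochMilli(heartbeatTs);
    }

    // Assumption: the log timestamp is in GMT/UTC.
    static long gapMillis(long heartbeatTs, String logTimestamp) {
        Instant logged = LocalDateTime.parse(logTimestamp, LOG_FORMAT)
            .toInstant(ZoneOffset.UTC);
        return Duration.between(heartbeatInstant(heartbeatTs), logged).toMillis();
    }

    public static void main(String[] args) {
        long heartbeatTs = 1634054222218L;            // from the first warning
        String loggedAt = "12-Oct-2021 15:57:17.139"; // prefix of that warning

        System.out.println("Last heartbeat: " + heartbeatInstant(heartbeatTs));
        System.out.println("Gap in ms:      " + gapMillis(heartbeatTs, loggedAt));
    }
}
```

Under those assumptions, the last heartbeat of tcp-comm-worker was at
15:57:02.218 GMT, roughly 15 seconds before the warning was written.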
12-Oct-2021 15:57:17.139 WARNING
[grid-timeout-worker-#134%igniteClientInstance%]
org.apache.ignite.logger.java.JavaLogger.warning Possible failure suppressed
accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler
[tryStop=false, timeout=0, super=AbstractFailureHandler
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=tcp-comm-worker, igniteInstanceName=igniteClientInstance, finished=false,
heartbeatTs=1634054222218]]]
class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker,
igniteInstanceName=igniteClientInstance, finished=false,
heartbeatTs=1634054222218]
at java.base/sun.nio.ch.Net.poll(Native Method)
at
java.base/sun.nio.ch.SocketChannelImpl.pollConnected(SocketChannelImpl.java:991)
at java.base/sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:119)
at
org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createNioSession(GridNioServerWrapper.java:465)
at
org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createTcpClient(GridNioServerWrapper.java:691)
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:1255)
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$$Lambda$389/0x0000000012e5ffc0.apply(Unknown
Source)
at
org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createTcpClient(GridNioServerWrapper.java:689)
at
org.apache.ignite.spi.communication.tcp.internal.ConnectionClientPool.createCommunicationClient(ConnectionClientPool.java:453)
at
org.apache.ignite.spi.communication.tcp.internal.ConnectionClientPool.reserveClient(ConnectionClientPool.java:228)
at
org.apache.ignite.spi.communication.tcp.internal.CommunicationWorker.processDisconnect(CommunicationWorker.java:374)
at
org.apache.ignite.spi.communication.tcp.internal.CommunicationWorker.body(CommunicationWorker.java:174)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$6.body(TcpCommunicationSpi.java:923)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
12-Oct-2021 15:57:17.141 WARNING
[grid-timeout-worker-#134%igniteClientInstance%]
org.apache.ignite.logger.java.JavaLogger.warning No deadlocked threads detected.
12-Oct-2021 15:57:17.170 WARNING
[grid-timeout-worker-#134%igniteClientInstance%]
org.apache.ignite.logger.java.JavaLogger.warning Thread dump at 2021/10/12
15:57:17 GMT
Thread [name="main", id=1, state=RUNNABLE, blockCnt=19, waitCnt=416]
at java.base/java.net.SocketInputStream.socketRead0(Native Method)
at
java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
at
java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
at
java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:271)
- locked java.io.BufferedInputStream@263909ea
at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:256)
at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1163)
at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
- locked org.postgresql.core.v3.QueryExecutorImpl@1b338a37
at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:437)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:353)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:257)
at
com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeQuery(NewProxyPreparedStatement.java:116)
at
org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:70)
at org.hibernate.loader.Loader.getResultSet(Loader.java:2123)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1911)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1887)
at org.hibernate.loader.Loader.doQuery(Loader.java:932)
at
org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:349)
at org.hibernate.loader.Loader.doList(Loader.java:2615)
at org.hibernate.loader.Loader.doList(Loader.java:2598)
at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2430)
at org.hibernate.loader.Loader.list(Loader.java:2425)
at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:502)
at
org.hibernate.hql.internal.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:370)
at
org.hibernate.engine.query.spi.HQLQueryPlan.performList(HQLQueryPlan.java:216)
at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1481)
at
org.hibernate.query.internal.AbstractProducedQuery.doList(AbstractProducedQuery.java:1441)
at
org.hibernate.query.internal.AbstractProducedQuery.list(AbstractProducedQuery.java:1410)
at org.hibernate.Query.getResultList(Query.java:427)
at
com.foo.dao.hibernate.report.FooBarImpl.retrieveFoo(FooBarImpl.java:61)
at jdk.internal.reflect.GeneratedMethodAccessor513.invoke(Unknown
Source)
at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)