Hello! This looks like network problems, a long GC pause on the server node, or some kind of deadlock on the server node that prevents it from responding.
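To help tell the GC and deadlock cases apart, here is a minimal sketch that uses only the standard JDK management beans. The class name `NodeHealthCheck` and its method names are made up for illustration and are not part of Ignite. It assumes you can run this code inside the server node's JVM (for example from a scheduled task); run standalone, it only inspects its own JVM. It does not cover the network case.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not an Ignite API: reports GC time and deadlocked threads
// for the JVM it runs in.
class NodeHealthCheck {
    /** Cumulative GC time in milliseconds since JVM start, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Number of threads currently deadlocked on monitors or ownable synchronizers. */
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is found
        return ids == null ? 0 : ids.length;
    }

    /** Prints the name and awaited lock of every deadlocked thread, if any. */
    static void printDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            System.out.println("No deadlocked threads found");
            return;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) { // a thread may have exited between the two calls
                System.out.println(info.getThreadName() + " waits for " + info.getLockName());
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Total GC time, ms: " + totalGcMillis());
        printDeadlocks();
    }
}
```

Sampling `totalGcMillis()` periodically and comparing consecutive values shows how much time went to GC in each interval; a jump of several seconds around the time of the failure would point at a long pause. A thread dump of the server node taken while the client is failing to connect gives the same deadlock information without any code changes.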
Regards,
--
Ilya Kasnacheev

Wed, 24 Feb 2021 at 13:09, oguzhan <[email protected]>:

> Hello,
>
> We have 1 client node and 1 server node, and we are using Ignite version
> 2.9.1.
>
> Our application is scheduled to run the same jobs every day. It runs
> without errors for about 2 weeks, and then we get the error shown below
> (this happens roughly every 2 weeks):
>
> I hope you can help me solve this problem. Thanks and best regards...
>
>
> 2021-02-14 02:07:34 WARN tcp-client-disco-reconnector-#7-#77756
> TcpDiscoverySpi:576 - Failed to connect to any address from IP finder (will
> retry to join topology every 2000 ms; change 'reconnectDelay' to configure
> the frequency of retries): [/127.0.0.1:47500, /127.0.0.1:47501,
> /127.0.0.1:47502, /127.0.0.1:47503, /127.0.0.1:47504, /127.0.0.1:47505,
> /127.0.0.1:47506, /127.0.0.1:47507, /127.0.0.1:47508, /127.0.0.1:47509]
> 2021-02-14 02:07:37 INFO grid-timeout-worker-#206 IgniteKernal:566 -
> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
>     ^-- Node [id=2fefd66f, uptime=4 days, 13:33:34.341]
>     ^-- Cluster [hosts=1, CPUs=16, servers=1, clients=1, topVer=2, minorTopVer=18985]
>     ^-- Network [addrs=[10.86.26.180, 127.0.0.1], discoPort=0, commPort=47101]
>     ^-- CPU [CPUs=16, curLoad=1.07%, avgLoad=0.05%, GC=0.1%]
>     ^-- Heap [used=865MB, free=92.96%, comm=12274MB]
>     ^-- Off-heap memory [used=0MB, free=100%, allocated=0MB]
>     ^-- Page memory [pages=0]
>     ^-- sysMemPlc region [type=internal, persistence=false, lazyAlloc=false,
>       ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
>     ^-- TxLog region [type=internal, persistence=false, lazyAlloc=false,
>       ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
>     ^-- Default_Region region [type=default, persistence=false, lazyAlloc=true,
>       ... initCfg=256MB, maxCfg=32768MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
>     ^-- Outbound messages queue [size=0]
>     ^-- Public thread pool [active=0, idle=0, qSize=0]
>     ^-- System thread pool [active=0, idle=81, qSize=0]
> 2021-02-14 02:07:38 ERROR tcp-client-disco-sock-writer-#2-#230
> TcpDiscoverySpi:586 - Failed to send message: null
> java.io.IOException: Failed to get acknowledge for message:
> TcpDiscoveryClientMetricsUpdateMessage [super=TcpDiscoveryAbstractMessage
> [sndNodeId=null, id=1d467368771-2fefd66f-0954-45dd-aa32-a33e58567950,
> verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null,
> isClient=true]]
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter.body(ClientImpl.java:1471)
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
> 2021-02-14 02:07:44 WARN tcp-comm-worker-#1-#216 TcpCommunicationSpi:576 -
> Handshake timed out (will stop attempts to perform the handshake)
> [node=6953d599-d606-4781-a6ba-43de7aff59e4,
> connTimeoutStrategy=ExponentialBackoffTimeoutStrategy [maxTimeout=600000,
> totalTimeout=10000, startNanos=1671033974906026, currTimeout=600000],
> err=Operation timed out [timeoutStrategy= ExponentialBackoffTimeoutStrategy
> [maxTimeout=600000, totalTimeout=10000, startNanos=1671033974906026,
> currTimeout=600000]], addr=/127.0.0.1:47100,
> failureDetectionTimeoutEnabled=true, timeout=0]
> 2021-02-14 02:07:54 WARN tcp-comm-worker-#1-#216 TcpCommunicationSpi:576 -
> Handshake timed out (will stop attempts to perform the handshake)
> [node=6953d599-d606-4781-a6ba-43de7aff59e4,
> connTimeoutStrategy=ExponentialBackoffTimeoutStrategy [maxTimeout=600000,
> totalTimeout=10000, startNanos=1671044002786218, currTimeout=600000],
> err=Operation timed out [timeoutStrategy= ExponentialBackoffTimeoutStrategy
> [maxTimeout=600000, totalTimeout=10000, startNanos=1671044002786218,
> currTimeout=600000]], addr=dwccatp01/10.86.26.180:47100,
> failureDetectionTimeoutEnabled=true, timeout=0]
> 2021-02-14 02:08:06 ERROR grid-timeout-worker-#206 G:581 - Blocked
> system-critical thread has been detected. This can lead to cluster-wide
> undefined behaviour [workerName=tcp-comm-worker,
> threadName=tcp-comm-worker-#1-#216, blockedFor=11s]
> 2021-02-14 02:08:06 WARN grid-timeout-worker-#206 root:576 - Possible
> failure suppressed accordingly to a configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl.pingNode(ClientImpl.java:449)
>     at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.pingNode(TcpDiscoverySpi.java:493)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.pingNode(GridDiscoveryManager.java:1688)
>     at org.apache.ignite.internal.managers.GridManagerAdapter$1.pingNode(GridManagerAdapter.java:409)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:5165)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:4951)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$5.body(TcpCommunicationSpi.java:2503)
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
> [02:08:06] Possible failure suppressed accordingly to a configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
> 2021-02-14 02:08:07 WARN grid-timeout-worker-#206
> CacheDiagnosticManager:571 - Page locks dump:
>
>
> 2021-02-14 02:08:16 ERROR grid-timeout-worker-#206 G:581 - Blocked
> system-critical thread has been detected. This can lead to cluster-wide
> undefined behaviour [workerName=tcp-comm-worker,
> threadName=tcp-comm-worker-#1-#216, blockedFor=21s]
> 2021-02-14 02:08:16 WARN grid-timeout-worker-#206 root:576 - Possible
> failure suppressed accordingly to a configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl.pingNode(ClientImpl.java:449)
>     at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.pingNode(TcpDiscoverySpi.java:493)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.pingNode(GridDiscoveryManager.java:1688)
>     at org.apache.ignite.internal.managers.GridManagerAdapter$1.pingNode(GridManagerAdapter.java:409)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:5165)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:4951)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$5.body(TcpCommunicationSpi.java:2503)
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
> [02:08:16] Possible failure suppressed accordingly to a configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
> 2021-02-14 02:08:16 WARN grid-timeout-worker-#206
> CacheDiagnosticManager:571 - Page locks dump:
>
>
> 2021-02-14 02:08:28 ERROR grid-timeout-worker-#206 G:581 - Blocked
> system-critical thread has been detected. This can lead to cluster-wide
> undefined behaviour [workerName=tcp-comm-worker,
> threadName=tcp-comm-worker-#1-#216, blockedFor=33s]
> 2021-02-14 02:08:28 WARN grid-timeout-worker-#206 root:576 - Possible
> failure suppressed accordingly to a configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl.pingNode(ClientImpl.java:449)
>     at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.pingNode(TcpDiscoverySpi.java:493)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.pingNode(GridDiscoveryManager.java:1688)
>     at org.apache.ignite.internal.managers.GridManagerAdapter$1.pingNode(GridManagerAdapter.java:409)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:5165)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:4951)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$5.body(TcpCommunicationSpi.java:2503)
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
> [02:08:28] Possible failure suppressed accordingly to a configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
> 2021-02-14 02:08:28 WARN grid-timeout-worker-#206
> CacheDiagnosticManager:571 - Page locks dump:
>
>
> 2021-02-14 02:08:32 WARN http-nio-8082-exec-5 TcpCommunicationSpi:576 -
> Handshake timed out (will stop attempts to perform the handshake)
> [node=6953d599-d606-4781-a6ba-43de7aff59e4,
> connTimeoutStrategy=ExponentialBackoffTimeoutStrategy [maxTimeout=600000,
> totalTimeout=10000, startNanos=1671081715938786, currTimeout=600000],
> err=Operation timed out [timeoutStrategy= ExponentialBackoffTimeoutStrategy
> [maxTimeout=600000, totalTimeout=10000, startNanos=1671081715938786,
> currTimeout=600000]], addr=/127.0.0.1:47100,
> failureDetectionTimeoutEnabled=true, timeout=0]
> 2021-02-14 02:08:37 ERROR grid-timeout-worker-#206 G:581 - Blocked
> system-critical thread has been detected. This can lead to cluster-wide
> undefined behaviour [workerName=tcp-comm-worker,
> threadName=tcp-comm-worker-#1-#216, blockedFor=42s]
> 2021-02-14 02:08:37 WARN grid-timeout-worker-#206 root:576 - Possible
> failure suppressed accordingly to a configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl.pingNode(ClientImpl.java:449)
>     at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.pingNode(TcpDiscoverySpi.java:493)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.pingNode(GridDiscoveryManager.java:1688)
>     at org.apache.ignite.internal.managers.GridManagerAdapter$1.pingNode(GridManagerAdapter.java:409)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:5165)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:4951)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$5.body(TcpCommunicationSpi.java:2503)
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
> [02:08:37] Possible failure suppressed accordingly to a configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
> 2021-02-14 02:08:37 WARN grid-timeout-worker-#206
> CacheDiagnosticManager:571 - Page locks dump:
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
