Thank you. So what should I do? The client node disconnected after this error and cannot reconnect to the cluster until I restart my application, the client node, and the server node. How can the client node reconnect to the cluster?
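On the application side, Ignite's client reconnection documentation describes this pattern. While a client node is disconnected, API calls throw `IgniteClientDisconnectedException` (cache operations wrap it in a `CacheException`), and its `reconnectFuture()` completes once the client has rejoined. Automatic reconnect happens only if it has not been turned off with `TcpDiscoverySpi.setClientReconnectDisabled(true)`. The sketch below shows only that control flow. `DisconnectedException` and `callWithReconnect` are hypothetical stand-ins built on `CompletableFuture`, so the block runs without Ignite on the classpath. It helps only once the client can actually rejoin; it does not address why the server stopped responding.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class ReconnectRetry {

    /** Hypothetical stand-in for IgniteClientDisconnectedException: carries a future completed on reconnect. */
    static class DisconnectedException extends RuntimeException {
        final CompletableFuture<Void> reconnectFuture;

        DisconnectedException(CompletableFuture<Void> reconnectFuture) {
            super("client node disconnected");
            this.reconnectFuture = reconnectFuture;
        }
    }

    /**
     * Runs the operation. On a disconnect, waits (bounded by waitSec) for the reconnect future
     * and retries. Gives up after maxAttempts so a client that never rejoins does not hang forever.
     */
    static <T> T callWithReconnect(Supplier<T> op, int maxAttempts, long waitSec)
            throws InterruptedException, ExecutionException, TimeoutException {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (DisconnectedException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                // Block until the reconnect future reports that the client has rejoined.
                e.reconnectFuture.get(waitSec, TimeUnit.SECONDS);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<Void> reconnected = new CompletableFuture<>();
        int[] calls = {0};

        // Simulated operation: the first call fails with a disconnect, later calls succeed.
        Supplier<String> op = () -> {
            calls[0]++;
            if (calls[0] == 1) {
                // Simulate the reconnect completing shortly after the failure.
                CompletableFuture.runAsync(() -> reconnected.complete(null));
                throw new DisconnectedException(reconnected);
            }
            return "ok";
        };

        String result = callWithReconnect(op, 3, 5);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

The bounded wait and attempt limit are deliberate. In the situation described here, where the client never rejoins on its own, an unbounded `get()` would simply hang the calling thread.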
On Wed, 24 Feb 2021 at 13:57, Ilya Kasnacheev <[email protected]> wrote:
> Hello!
>
> Looks like network problems, a long GC pause on the server node, or some
> kind of deadlock on the server node which prevents it from responding.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> On Wed, 24 Feb 2021 at 13:09, oguzhan <[email protected]> wrote:
>
>> Hello,
>>
>> We have 1 client node and 1 server node, and we are using Ignite version
>> 2.9.1.
>>
>> Our application is scheduled to run the same jobs every day. It ran for
>> 2 weeks without any errors, but after that we got the error shown below
>> (we get this error about every 2 weeks):
>>
>> I hope you can help me solve this problem. Thanks and best regards...
>>
>>
>> 2021-02-14 02:07:34 WARN tcp-client-disco-reconnector-#7-#77756 TcpDiscoverySpi:576 - Failed to connect to any address from IP finder (will retry to join topology every 2000 ms; change 'reconnectDelay' to configure the frequency of retries): [/127.0.0.1:47500, /127.0.0.1:47501, /127.0.0.1:47502, /127.0.0.1:47503, /127.0.0.1:47504, /127.0.0.1:47505, /127.0.0.1:47506, /127.0.0.1:47507, /127.0.0.1:47508, /127.0.0.1:47509]
>> 2021-02-14 02:07:37 INFO grid-timeout-worker-#206 IgniteKernal:566 - Metrics for local node (to disable set 'metricsLogFrequency' to 0)
>>     ^-- Node [id=2fefd66f, uptime=4 days, 13:33:34.341]
>>     ^-- Cluster [hosts=1, CPUs=16, servers=1, clients=1, topVer=2, minorTopVer=18985]
>>     ^-- Network [addrs=[10.86.26.180, 127.0.0.1], discoPort=0, commPort=47101]
>>     ^-- CPU [CPUs=16, curLoad=1.07%, avgLoad=0.05%, GC=0.1%]
>>     ^-- Heap [used=865MB, free=92.96%, comm=12274MB]
>>     ^-- Off-heap memory [used=0MB, free=100%, allocated=0MB]
>>     ^-- Page memory [pages=0]
>>     ^-- sysMemPlc region [type=internal, persistence=false, lazyAlloc=false,
>>       ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
>>     ^-- TxLog region [type=internal, persistence=false, lazyAlloc=false,
>>       ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
>>     ^-- Default_Region region [type=default, persistence=false, lazyAlloc=true,
>>       ... initCfg=256MB, maxCfg=32768MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
>>     ^-- Outbound messages queue [size=0]
>>     ^-- Public thread pool [active=0, idle=0, qSize=0]
>>     ^-- System thread pool [active=0, idle=81, qSize=0]
>> 2021-02-14 02:07:38 ERROR tcp-client-disco-sock-writer-#2-#230 TcpDiscoverySpi:586 - Failed to send message: null
>> java.io.IOException: Failed to get acknowledge for message: TcpDiscoveryClientMetricsUpdateMessage [super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=1d467368771-2fefd66f-0954-45dd-aa32-a33e58567950, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]]
>>     at org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter.body(ClientImpl.java:1471)
>>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
>> 2021-02-14 02:07:44 WARN tcp-comm-worker-#1-#216 TcpCommunicationSpi:576 - Handshake timed out (will stop attempts to perform the handshake) [node=6953d599-d606-4781-a6ba-43de7aff59e4, connTimeoutStrategy=ExponentialBackoffTimeoutStrategy [maxTimeout=600000, totalTimeout=10000, startNanos=1671033974906026, currTimeout=600000], err=Operation timed out [timeoutStrategy=ExponentialBackoffTimeoutStrategy [maxTimeout=600000, totalTimeout=10000, startNanos=1671033974906026, currTimeout=600000]], addr=/127.0.0.1:47100, failureDetectionTimeoutEnabled=true, timeout=0]
>> 2021-02-14 02:07:54 WARN tcp-comm-worker-#1-#216 TcpCommunicationSpi:576 - Handshake timed out (will stop attempts to perform the handshake) [node=6953d599-d606-4781-a6ba-43de7aff59e4, connTimeoutStrategy=ExponentialBackoffTimeoutStrategy [maxTimeout=600000, totalTimeout=10000, startNanos=1671044002786218, currTimeout=600000], err=Operation timed out [timeoutStrategy=ExponentialBackoffTimeoutStrategy [maxTimeout=600000, totalTimeout=10000, startNanos=1671044002786218, currTimeout=600000]], addr=dwccatp01/10.86.26.180:47100, failureDetectionTimeoutEnabled=true, timeout=0]
>> 2021-02-14 02:08:06 ERROR grid-timeout-worker-#206 G:581 - Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=tcp-comm-worker, threadName=tcp-comm-worker-#1-#216, blockedFor=11s]
>> 2021-02-14 02:08:06 WARN grid-timeout-worker-#206 root:576 - Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
>> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]
>>     at sun.misc.Unsafe.park(Native Method)
>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>>     at org.apache.ignite.spi.discovery.tcp.ClientImpl.pingNode(ClientImpl.java:449)
>>     at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.pingNode(TcpDiscoverySpi.java:493)
>>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.pingNode(GridDiscoveryManager.java:1688)
>>     at org.apache.ignite.internal.managers.GridManagerAdapter$1.pingNode(GridManagerAdapter.java:409)
>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:5165)
>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:4951)
>>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$5.body(TcpCommunicationSpi.java:2503)
>>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
>> [02:08:06] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
>> 2021-02-14 02:08:07 WARN grid-timeout-worker-#206 CacheDiagnosticManager:571 - Page locks dump:
>>
>> 2021-02-14 02:08:16 ERROR grid-timeout-worker-#206 G:581 - Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=tcp-comm-worker, threadName=tcp-comm-worker-#1-#216, blockedFor=21s]
>> 2021-02-14 02:08:16 WARN grid-timeout-worker-#206 root:576 - Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
>> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]
>>     at sun.misc.Unsafe.park(Native Method)
>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>>     at org.apache.ignite.spi.discovery.tcp.ClientImpl.pingNode(ClientImpl.java:449)
>>     at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.pingNode(TcpDiscoverySpi.java:493)
>>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.pingNode(GridDiscoveryManager.java:1688)
>>     at org.apache.ignite.internal.managers.GridManagerAdapter$1.pingNode(GridManagerAdapter.java:409)
>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:5165)
>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:4951)
>>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$5.body(TcpCommunicationSpi.java:2503)
>>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
>> [02:08:16] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
>> 2021-02-14 02:08:16 WARN grid-timeout-worker-#206 CacheDiagnosticManager:571 - Page locks dump:
>>
>> 2021-02-14 02:08:28 ERROR grid-timeout-worker-#206 G:581 - Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=tcp-comm-worker, threadName=tcp-comm-worker-#1-#216, blockedFor=33s]
>> 2021-02-14 02:08:28 WARN grid-timeout-worker-#206 root:576 - Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
>> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]
>>     at sun.misc.Unsafe.park(Native Method)
>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>>     at org.apache.ignite.spi.discovery.tcp.ClientImpl.pingNode(ClientImpl.java:449)
>>     at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.pingNode(TcpDiscoverySpi.java:493)
>>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.pingNode(GridDiscoveryManager.java:1688)
>>     at org.apache.ignite.internal.managers.GridManagerAdapter$1.pingNode(GridManagerAdapter.java:409)
>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:5165)
>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:4951)
>>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$5.body(TcpCommunicationSpi.java:2503)
>>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
>> [02:08:28] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
>> 2021-02-14 02:08:28 WARN grid-timeout-worker-#206 CacheDiagnosticManager:571 - Page locks dump:
>>
>> 2021-02-14 02:08:32 WARN http-nio-8082-exec-5 TcpCommunicationSpi:576 - Handshake timed out (will stop attempts to perform the handshake) [node=6953d599-d606-4781-a6ba-43de7aff59e4, connTimeoutStrategy=ExponentialBackoffTimeoutStrategy [maxTimeout=600000, totalTimeout=10000, startNanos=1671081715938786, currTimeout=600000], err=Operation timed out [timeoutStrategy=ExponentialBackoffTimeoutStrategy [maxTimeout=600000, totalTimeout=10000, startNanos=1671081715938786, currTimeout=600000]], addr=/127.0.0.1:47100, failureDetectionTimeoutEnabled=true, timeout=0]
>> 2021-02-14 02:08:37 ERROR grid-timeout-worker-#206 G:581 - Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=tcp-comm-worker, threadName=tcp-comm-worker-#1-#216, blockedFor=42s]
>> 2021-02-14 02:08:37 WARN grid-timeout-worker-#206 root:576 - Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
>> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]
>>     at sun.misc.Unsafe.park(Native Method)
>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>>     at org.apache.ignite.spi.discovery.tcp.ClientImpl.pingNode(ClientImpl.java:449)
>>     at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.pingNode(TcpDiscoverySpi.java:493)
>>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.pingNode(GridDiscoveryManager.java:1688)
>>     at org.apache.ignite.internal.managers.GridManagerAdapter$1.pingNode(GridManagerAdapter.java:409)
>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:5165)
>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:4951)
>>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$5.body(TcpCommunicationSpi.java:2503)
>>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
>> [02:08:37] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
>> 2021-02-14 02:08:37 WARN grid-timeout-worker-#206 CacheDiagnosticManager:571 - Page locks dump:
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
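Ilya's reply lists a deadlock on the server node as one possible cause. A quick programmatic check is `ThreadMXBean.findDeadlockedThreads()` from `java.lang.management`, run inside the server JVM or through a JMX connection to it. The usual manual alternative is a thread dump taken with `jstack <pid>`. One caveat: this check reports only cycles on object monitors and ownable synchronizers such as `ReentrantLock`. A thread parked on a future, like `tcp-comm-worker` in the stack trace above (waiting in `GridFutureAdapter.get0` during `pingNode`), will not be reported, so a full thread dump is still needed in that case. The demo deliberately deadlocks two daemon threads so there is something to detect. `DeadlockCheck` and its methods are illustrative names, not Ignite API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class DeadlockCheck {

    /** Returns "name: state on lock" for every thread in a deadlock cycle, or an empty list. */
    static List<String> findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        List<String> out = new ArrayList<>();
        if (ids == null) {
            return out;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // a thread may have terminated since the check
                out.add(info.getThreadName() + ": " + info.getThreadState() + " on " + info.getLockName());
            }
        }
        return out;
    }

    /** Starts two daemon threads that take locks a and b in opposite order, producing a deadlock. */
    static void createDemoDeadlock() throws InterruptedException {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockBoth(a, b, bothHoldFirstLock), "demo-1");
        Thread t2 = new Thread(() -> lockBoth(b, a, bothHoldFirstLock), "demo-2");
        t1.setDaemon(true); // daemon threads, so the JVM can still exit
        t2.setDaemon(true);
        t1.start();
        t2.start();
        bothHoldFirstLock.await();
    }

    private static void lockBoth(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: the other thread holds 'second' and waits for 'first'.
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        createDemoDeadlock();
        List<String> found = findDeadlocks();
        // The threads may need a moment to become BLOCKED on their second lock.
        for (int i = 0; i < 50 && found.isEmpty(); i++) {
            Thread.sleep(100);
            found = findDeadlocks();
        }
        found.forEach(System.out::println);
    }
}
```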

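For the other suspect in Ilya's reply, a long GC pause on the server node, the JVM exposes accumulated collection time through `GarbageCollectorMXBean.getCollectionTime()`. This is approximate and in milliseconds, and it returns -1 when undefined. A minimal sketch, assuming you can run code in the server JVM or read the same MXBeans over JMX: sample the total twice and divide the difference by the wall-clock interval. The result is only comparable in spirit to the `GC=` figure in Ignite's metrics line. The exact meaning of collection time varies by collector, and `GcShare` and its methods are illustrative names, not Ignite API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcShare {

    /** Sum of accumulated collection time (ms) over all collectors; unsupported (-1) values are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of a wall-clock interval spent in GC, given two samples of totalGcMillis(). */
    static double gcFraction(long gcMillisBefore, long gcMillisAfter, long wallMillis) {
        if (wallMillis <= 0) {
            throw new IllegalArgumentException("wallMillis must be positive");
        }
        return (double) (gcMillisAfter - gcMillisBefore) / wallMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long wallStart = System.nanoTime();
        long gcStart = totalGcMillis();
        Thread.sleep(1000); // sampling interval; a real monitor would repeat this periodically
        long gcEnd = totalGcMillis();
        long wallMillis = (System.nanoTime() - wallStart) / 1_000_000;
        System.out.printf("GC share over %d ms: %.2f%%%n", wallMillis, 100 * gcFraction(gcStart, gcEnd, wallMillis));
    }
}
```

An average share can hide a single long stop-the-world pause, and a single pause is what would stall the server's responses. Per-pause durations from GC logging on the server are therefore more telling: `-Xloggc:<file> -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime` on JDK 8, or `-Xlog:gc*` on JDK 9 and later.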