Hello!

From this log:
[17:19:09,949][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 1405 milliseconds.
[17:19:12,237][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 1983 milliseconds.
[17:19:14,416][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 2029 milliseconds.
[17:19:16,619][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 2103 milliseconds.
[17:19:18,948][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 2279 milliseconds.
[17:19:21,217][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 2219 milliseconds.
[17:19:23,268][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 2001 milliseconds.
[17:19:25,028][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 1710 milliseconds.
[17:19:28,814][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 3736 milliseconds.
[17:19:30,962][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 2098 milliseconds.
[17:19:32,553][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 1541 milliseconds.
[17:19:37,938][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 3837 milliseconds.
[17:19:51,271][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 13200 milliseconds.
[17:19:57,222][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 7482 milliseconds.
[17:20:17,384][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 5832 milliseconds.
[17:20:17,384][SEVERE][exchange-worker-#43][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=grid-timeout-worker, blockedFor=10s]
[17:20:36,342][WARNING][tcp-disco-msg-worker-#2][TcpDiscoverySpi] Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout [currentTimeout=10000, rmtAddr=server: 2016/redacted_ip:47500, rmtPort=47500]
[17:20:36,342][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/redacted_ip, rmtPort=56925]
[17:20:36,342][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 30741 milliseconds.
[17:20:42,276][SEVERE][nio-acceptor-tcp-rest-#39][GridTcpRestProtocol] Runtime error caught during grid runnable execution: GridWorker [name=nio-acceptor-tcp-rest, igniteInstanceName=null, finished=false, heartbeatTs=1581322824712, hashCode=328613569, interrupted=false, runner=nio-acceptor-tcp-rest-#39]
*java.lang.OutOfMemoryError: GC overhead limit exceeded*

So, you have plainly run out of heap: the JVM pauses grow from about 1.4 seconds to over 30 seconds before the OutOfMemoryError is thrown. Ignite is likely not to blame, since Ignite itself does not use a lot of heap.

I recommend collecting heap dumps and searching for leaks in your own code / usage patterns.

Regards,
-- 
Ilya Kasnacheev

Wed, 19 Feb 2020 at 07:01, wentat <[email protected]>:

> Hi Ilya,
>
> Thank you for your reply. I have done this test a few times and I
> consistently get stalling grids during failover/scaling/server swapping.
>
> I have tried tuning some parameters, according to ignite production prep
> docs <https://apacheignite.readme.io/docs/preparing-for-production> . I
> have increased the heap size to max of 10GB, removed logging of metrics and
> set igcfg.setFailureDetectionTimeout(60000); - one minute! However, this was
> done after the 2 tries in this thread.
>
> I will try to run one time and get logs for whole cluster including GC if
> problem persists but it will take some time as I have moved on to other
> tests. Meanwhile, here is the original log from my first experiment. Maybe
> you can have a clue.
>
> Once again, thank you for your time in this issue
>
> crash.log
> <http://apache-ignite-users.70518.x6.nabble.com/file/t2779/crash.log>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
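
P.S. On collecting the heap dumps recommended above: the usual route on a HotSpot JVM is to start the node with -XX:+HeapDumpOnOutOfMemoryError (optionally with -XX:HeapDumpPath set to a directory with enough free space), so a dump is written at the moment of failure. If you would rather take a dump on demand while the pauses are still growing, something like the sketch below may help. It uses only the JDK's HotSpotDiagnosticMXBean, not any Ignite API; the class name HeapDumper and the default file name are my own placeholders, and I am assuming a HotSpot-based JVM.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Ignite): dumps the heap of the JVM it runs in.
public class HeapDumper {

    /**
     * Writes a heap dump of the current JVM to the given path.
     * The file name should end in ".hprof"; newer JDKs reject other extensions.
     *
     * @param target   where to write the dump
     * @param liveOnly if true, only objects reachable at dump time are included
     */
    public static Path dump(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

        // dumpHeap fails if the file already exists, so remove any stale dump first.
        Files.deleteIfExists(target);

        bean.dumpHeap(target.toAbsolutePath().toString(), liveOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder default file name; pass a real path as the first argument.
        Path out = Paths.get(args.length > 0 ? args[0] : "node-heap.hprof");
        System.out.println("Heap dump written to " + dump(out, true));
    }
}
```

You could call HeapDumper.dump(...) from your own code (for example when a pause warning shows up) and then open the resulting .hprof file in a heap analyzer to see which of your objects dominate the heap.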
