There are a few issues here (the blocked thread, the communication error), but 
possibly the key one is the JVM pause:

[2020-07-03T18:17:21,793][WARN ][jvm-pause-detector-worker][IgniteKernal%CustomerCC] Possible too long JVM pause: 10133 milliseconds.

A pause like this is usually due to garbage collection, but there are a number 
of other possibilities, such as slow I/O. I suggest you start with the 
recommendations on the GC tuning documentation page: 
https://apacheignite.readme.io/docs/jvm-and-system-tuning
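
To illustrate what a warning like this measures, here is a minimal sketch of 
the general technique behind a JVM pause detector: a thread sleeps for a short, 
fixed interval and reports when the sleep took far longer than requested. This 
is not Ignite's actual implementation. The class name, method names and the 
50 ms / 500 ms values are assumptions chosen for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a sleep-overshoot pause detector; not Ignite code.
public class JvmPauseSketch {
    // Illustrative values only (assumptions, not taken from the thread).
    public static final long PRECISION_MS = 50;   // how long each sleep should take
    public static final long THRESHOLD_MS = 500;  // overshoot that counts as a long pause

    /** Returns how much longer than requested a sleep took, in ms (never negative). */
    public static long overshootMillis(long startNanos, long endNanos, long requestedSleepMs) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return Math.max(0, elapsedMs - requestedSleepMs);
    }

    /** True if the overshoot is large enough to report. */
    public static boolean isLongPause(long overshootMs) {
        return overshootMs >= THRESHOLD_MS;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread detector = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    long start = System.nanoTime();
                    Thread.sleep(PRECISION_MS);
                    long overshoot = overshootMillis(start, System.nanoTime(), PRECISION_MS);
                    // If the whole JVM was stopped (GC, swapping, slow I/O),
                    // this thread wakes up late and the overshoot is large.
                    if (isLongPause(overshoot))
                        System.out.println("Possible long JVM pause: " + overshoot + " ms");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and exit
            }
        }, "pause-detector-sketch");
        detector.setDaemon(true);
        detector.start();

        Thread.sleep(2000);   // let the detector run briefly
        detector.interrupt();
    }
}
```

A detector of this kind cannot tell why the JVM stopped, only that it did, which 
is why GC logs are the usual next place to look.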

Regards,
Stephen

> On 4 Jul 2020, at 12:44, Kamlesh Joshi <[email protected]> wrote:
> 
> Hi Team,
>  
> We encountered the following defect in our PROD environment, after which all 
> traffic halted for around 10 minutes. We recently upgraded our cluster from 
> Ignite 2.6.0 to 2.7.6. 
> Is this related to any existing open defect in this version? Has anyone 
> observed the same defect before?
>  
> Any help or pointers around this will be appreciated.
>  
>  
> [2020-07-03T18:17:11,613][ERROR][sys-stripe-36-#37%CustomerCC%][G] Blocked 
> system-critical thread has been detected. This can lead to cluster-wide 
> undefined behaviour
> [threadName=partition-exchanger, blockedFor=480s]
> [2020-07-03T18:17:11,613][WARN ][sys-stripe-36-#37%CustomerCC%][G] Thread 
> [name="exchange-worker-#344%CustomerCC%", id=391, state=TIMED_WAITING, 
> blockCnt=1, waitCnt=2049782]
>     Lock 
> [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6bf9f3a4,
>  ownerName=null, ownerId=-1]
>  
> [2020-07-03T18:17:11,620][ERROR][sys-stripe-36-#37%CustomerCC%][] Critical 
> system error detected. Will be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
> [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker 
> [name=partition-exchanger, igniteInstanceName=CustomerCC, finished=false, 
> heartbeatTs=1593780431612]]]
> org.apache.ignite.IgniteException: GridWorker [name=partition-exchanger, 
> igniteInstanceName=CustomerCC, finished=false, heartbeatTs=1593780431612]
>     at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
>  [ignite-core-2.7.6.jar:2.7.6]
>     at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
>  [ignite-core-2.7.6.jar:2.7.6]
>     at 
> org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
>  [ignite-core-2.7.6.jar:2.7.6]
>     at 
> org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) 
> [ignite-core-2.7.6.jar:2.7.6]
>     at 
> org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:513)
>  [ignite-core-2.7.6.jar:2.7.6]
>     at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) 
> [ignite-core-2.7.6.jar:2.7.6]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
> [2020-07-03T18:17:11,625][WARN 
> ][sys-stripe-36-#37%CustomerCC%][FailureProcessor] No deadlocked threads 
> detected.
> [2020-07-03T18:17:21,790][INFO 
> ][tcp-disco-sock-reader-#201%CustomerCC%][TcpDiscoverySpi] Finished serving 
> remote node connection [rmtAddr=/xx.xx.xx.xx:46416, rmtPort=46416
> [2020-07-03T18:17:21,793][WARN 
> ][jvm-pause-detector-worker][IgniteKernal%CustomerCC] Possible too long JVM 
> pause: 10133 milliseconds.
> [2020-07-03T18:17:21,794][WARN 
> ][grid-nio-worker-tcp-comm-31-#295%CustomerCC%][TcpCommunicationSpi] 
> Communication SPI session write timed out (consider increasing 
> 'socketWriteTimeout' configuration property) [remoteAddr=/xx.xx.xx.xx:11764, 
> writeTimeout=2000]
> [2020-07-03T18:17:21,794][WARN 
> ][grid-nio-worker-tcp-comm-57-#321%CustomerCC%][TcpCommunicationSpi] 
> Communication SPI session write timed out (consider increasing 
> 'socketWriteTimeout' configuration property) [remoteAddr=/xx.xx.xx.xx:38500, 
> writeTimeout=2000]
> [2020-07-03T18:17:21,794][WARN 
> ][grid-nio-worker-tcp-comm-5-#269%CustomerCC%][TcpCommunicationSpi] 
> Communication SPI session write timed out (consider increasing 
> 'socketWriteTimeout' configuration property) [remoteAddr=/xx.xx.xx.xx:41442, 
> writeTimeout=2000]
> [2020-07-03T18:17:21,794][WARN 
> ][grid-nio-worker-tcp-comm-53-#317%CustomerCC%][TcpCommunicationSpi] 
> Communication SPI session write timed out (consider increasing 
> 'socketWriteTimeout' configuration property) [remoteAddr=/xx.xx.xx.xx:44178, 
> writeTimeout=2000]
> [2020-07-03T18:17:21,794][WARN 
> ][grid-nio-worker-tcp-comm-59-#323%CustomerCC%][TcpCommunicationSpi] 
> Communication SPI session write timed out (consider increasing 
> 'socketWriteTimeout' configuration property) [remoteAddr=/xx.xx.xx.xx:11884, 
> writeTimeout=2000]
> [2020-07-03T18:17:21,795][WARN 
> ][grid-nio-worker-tcp-comm-59-#323%CustomerCC%][TcpCommunicationSpi] 
> Communication SPI session write timed out (consider increasing 
> 'socketWriteTimeout' configuration property) [remoteAddr=/xx.xx.xx.xx:39044, 
> writeTimeout=2000]
> [2020-07-03T18:17:21,795][WARN 
> ][grid-nio-worker-tcp-comm-53-#317%CustomerCC%][TcpCommunicationSpi] 
> Communication SPI session write timed out (consider increasing 
> 'socketWriteTimeout' configuration property) [remoteAddr=/xx.xx.xx.xx:48756, 
> writeTimeout=2000]
> [2020-07-03T18:17:21,795][WARN 
> ][grid-nio-worker-tcp-comm-59-#323%CustomerCC%][TcpCommunicationSpi] 
> Communication SPI session write timed out (consider increasing 
> 'socketWriteTimeout' configuration property) [remoteAddr=/xx.xx.xx.xx:42190, 
> writeTimeout=2000]
>  
>  
>  
>  
>  
> Thanks and Regards,
> Kamlesh Joshi
>  
> 

