Hi Igniters,

Sometimes 'Blocked system-critical thread' errors are getting printed in prod 
cluster logs. As per below logs, it's saying exchange-worker blocked for 60s. 
There is no node join/left or cluster activation or cache creation, still why 
partition exchange is triggered. Even it is triggered,  it is blocked for 60s 
which is huge time from prod perspective.

Below are error details,
[2020-11-09T10:55:18,888][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][G] 
Blocked system-critical thread has been detected. This can lead to cluster-wide 
undefined behaviour [threadName=partition-exchanger, blockedFor=60s]
[2020-11-09T10:55:18,888][WARN ][sys-stripe-118-#119%EDIFCustomerCC%][G] Thread 
[name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=TIMED_WAITING, 
blockCnt=16, waitCnt=2469961]

Cluster is responding but these errors priting, we are not understanding what's 
the cause, could you please help us.

Below is log snippet,

[2020-11-09T10:51:21,458][INFO 
][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager] 
Checkpoint started [checkpointId=3bd28c8f-d9c9-4110-ab8c-24cf3a5c44a3, 
startPtr=FileWALPointer [idx=1279664, fileOff=28148165, len=49557], 
checkpointLockWait=0ms, checkpointLockHoldTime=34ms, 
walCpRecordFsyncDuration=7ms, pages=84152, reason='timeout']
[2020-11-09T10:51:23,499][INFO 
][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager] 
Checkpoint finished [cpId=3bd28c8f-d9c9-4110-ab8c-24cf3a5c44a3, pages=84152, 
markPos=FileWALPointer [idx=1279664, fileOff=28148165, len=49557], 
walSegmentsCleared=5, walSegmentsCovered=[1279658 - 1279663], 
markDuration=79ms, pagesWrite=1195ms, fsync=845ms, total=2119ms]
[2020-11-09T10:51:36,788][INFO 
][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager]
 Starting to copy WAL segment [absIdx=1279664, segIdx=4, 
origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000004.wal,
 
dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279664.wal]
[2020-11-09T10:51:36,954][INFO 
][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager]
 Copied file 
[src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000004.wal,
 
dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279664.wal]
[2020-11-09T10:52:02,018][INFO 
][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager]
 Starting to copy WAL segment [absIdx=1279665, segIdx=5, 
origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000005.wal,
 
dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279665.wal]
[2020-11-09T10:52:02,200][INFO 
][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager]
 Copied file 
[src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000005.wal,
 
dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279665.wal]
[2020-11-09T10:52:36,541][INFO 
][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager]
 Starting to copy WAL segment [absIdx=1279666, segIdx=6, 
origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000006.wal,
 
dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279666.wal]
[2020-11-09T10:52:36,703][INFO 
][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager]
 Copied file 
[src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000006.wal,
 
dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279666.wal]
[2020-11-09T10:53:11,068][INFO 
][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager]
 Starting to copy WAL segment [absIdx=1279667, segIdx=7, 
origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000007.wal,
 
dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279667.wal]
[2020-11-09T10:53:11,239][INFO 
][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager]
 Copied file 
[src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000007.wal,
 
dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279667.wal]
[2020-11-09T10:53:45,694][INFO 
][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager]
 Starting to copy WAL segment [absIdx=1279668, segIdx=8, 
origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000008.wal,
 
dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279668.wal]
[2020-11-09T10:53:45,868][INFO 
][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager]
 Copied file 
[src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000008.wal,
 
dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279668.wal]
[2020-11-09T10:54:21,411][INFO 
][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager]
 Starting to copy WAL segment [absIdx=1279669, segIdx=9, 
origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000009.wal,
 
dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279669.wal]
[2020-11-09T10:54:21,480][INFO 
][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager] 
Checkpoint started [checkpointId=33221fc2-5ab0-4929-9d23-8940e0552272, 
startPtr=FileWALPointer [idx=1279670, fileOff=5561792, len=49557], 
checkpointLockWait=0ms, checkpointLockHoldTime=51ms, 
walCpRecordFsyncDuration=5ms, pages=86602, reason='timeout']
[2020-11-09T10:54:21,578][INFO 
][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager]
 Copied file 
[src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000009.wal,
 
dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279669.wal]
[2020-11-09T10:54:23,624][INFO 
][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager] 
Checkpoint finished [cpId=33221fc2-5ab0-4929-9d23-8940e0552272, pages=86602, 
markPos=FileWALPointer [idx=1279670, fileOff=5561792, len=49557], 
walSegmentsCleared=6, walSegmentsCovered=[1279664 - 1279669], 
markDuration=94ms, pagesWrite=1219ms, fsync=925ms, total=2238ms]
[2020-11-09T10:54:45,727][INFO 
][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager]
 Starting to copy WAL segment [absIdx=1279670, segIdx=0, 
origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000000.wal,
 
dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279670.wal]
[2020-11-09T10:54:45,886][INFO 
][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager]
 Copied file 
[src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000000.wal,
 
dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279670.wal]
[2020-11-09T10:55:17,146][INFO 
][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager]
 Starting to copy WAL segment [absIdx=1279671, segIdx=1, 
origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000001.wal,
 
dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279671.wal]
[2020-11-09T10:55:17,343][INFO 
][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager]
 Copied file 
[src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000001.wal,
 
dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279671.wal]
[2020-11-09T10:55:18,888][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][G] 
Blocked system-critical thread has been detected. This can lead to cluster-wide 
undefined behaviour [threadName=partition-exchanger, blockedFor=60s]
[2020-11-09T10:55:18,888][WARN ][sys-stripe-118-#119%EDIFCustomerCC%][G] Thread 
[name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=TIMED_WAITING, 
blockCnt=16, waitCnt=2469961]
    Lock 
[object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@223507fb,
 ownerName=null, ownerId=-1]

[2020-11-09T10:55:18,889][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][] 
Critical system error detected. Will be handled accordingly to configured 
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, 
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker 
[name=partition-exchanger, igniteInstanceName=EDIFCustomerCC, finished=false, 
heartbeatTs=1604899458881]]]
org.apache.ignite.IgniteException: GridWorker [name=partition-exchanger, 
igniteInstanceName=EDIFCustomerCC, finished=false, heartbeatTs=1604899458881]
                at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
 [ignite-core-2.7.6.jar:2.7.6]
                at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
 [ignite-core-2.7.6.jar:2.7.6]
                at 
org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
 [ignite-core-2.7.6.jar:2.7.6]
                at 
org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) 
[ignite-core-2.7.6.jar:2.7.6]
                at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:513)
 [ignite-core-2.7.6.jar:2.7.6]
                at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) 
[ignite-core-2.7.6.jar:2.7.6]
                at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2020-11-09T10:55:18,894][WARN 
][sys-stripe-118-#119%EDIFCustomerCC%][FailureProcessor] No deadlocked threads 
detected.
[2020-11-09T10:55:21,404][WARN 
][jvm-pause-detector-worker][IgniteKernal%EDIFCustomerCC] Possible too long JVM 
pause: 2477 milliseconds.
[2020-11-09T10:55:21,422][WARN 
][sys-stripe-118-#119%EDIFCustomerCC%][FailureProcessor] Thread dump at 
2020/11/09 10:55:21 IST
Thread [name="sys-#213675%EDIFCustomerCC%", id=369652, state=TIMED_WAITING, 
blockCnt=0, waitCnt=1]
    Lock 
[object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@42c969fd,
 ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at 
java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Thread [name="sys-#213674%EDIFCustomerCC%", id=369651, state=TIMED_WAITING, 
blockCnt=0, waitCnt=1]
    Lock 
[object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@42c969fd,
 ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at 
java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


Thanks and Regards,
Kamlesh Joshi

"Confidentiality Warning: This message and any attachments are intended only 
for the use of the intended recipient(s). 
are confidential and may be privileged. If you are not the intended recipient. 
you are hereby notified that any 
review. re-transmission. conversion to hard copy. copying. circulation or other 
use of this message and any attachments is 
strictly prohibited. If you are not the intended recipient. please notify the 
sender immediately by return email. 
and delete this message and any attachments from your system.

Virus Warning: Although the company has taken reasonable precautions to ensure 
no viruses are present in this email. 
The company cannot accept responsibility for any loss or damage arising from 
the use of this email or attachment."

Reply via email to