Hello! Can you please find an actual stack trace from partition-exchanger thread in that log?
One that starts with Thread [name="partition-exchanger" ? Regards, -- Ilya Kasnacheev вт, 10 нояб. 2020 г. в 15:54, Kamlesh Joshi <[email protected]>: > Hi Igniters, > > > > Sometimes 'Blocked system-critical thread' errors are getting printed in > prod cluster logs. As per below logs, it's saying exchange-worker blocked > for 60s. There is no node join/left or cluster activation or cache > creation, still why partition exchange is triggered. Even it is triggered, > it is blocked for *60s *which is huge time from prod perspective. > > > > Below are error details, > > [2020-11-09T10:55:18,888][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][G] > Blocked system-critical thread has been detected. This can lead to > cluster-wide undefined behaviour [threadName=partition-exchanger, > *blockedFor=60s*] > > [2020-11-09T10:55:18,888][WARN ][sys-stripe-118-#119%EDIFCustomerCC%][G] > Thread [name="exchange-worker-#344%EDIFCustomerCC%", id=391, > state=TIMED_WAITING, blockCnt=16, waitCnt=2469961] > > > > Cluster is responding but these errors priting, we are not understanding > what's the cause, could you please help us. > > > > Below is log snippet, > > > > [2020-11-09T10:51:21,458][INFO > ][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager] > Checkpoint started [checkpointId=3bd28c8f-d9c9-4110-ab8c-24cf3a5c44a3, > startPtr=FileWALPointer [idx=1279664, fileOff=28148165, len=49557], > checkpointLockWait=0ms, checkpointLockHoldTime=34ms, > walCpRecordFsyncDuration=7ms, pages=84152, reason='timeout'] > > [2020-11-09T10:51:23,499][INFO > ][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager] > Checkpoint finished [cpId=3bd28c8f-d9c9-4110-ab8c-24cf3a5c44a3, > pages=84152, markPos=FileWALPointer [idx=1279664, fileOff=28148165, > len=49557], walSegmentsCleared=5, walSegmentsCovered=[1279658 - 1279663], > markDuration=79ms, pagesWrite=1195ms, fsync=845ms, total=2119ms] > > [2020-11-09T10:51:36,788][INFO > ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] > Starting to copy WAL segment [absIdx=1279664, segIdx=4, > origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000004.wal, > dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279664.wal] > > [2020-11-09T10:51:36,954][INFO > ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] > Copied file > [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000004.wal, > dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279664.wal] > > [2020-11-09T10:52:02,018][INFO > ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] > Starting to copy WAL segment [absIdx=1279665, segIdx=5, > origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000005.wal, > dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279665.wal] > > [2020-11-09T10:52:02,200][INFO > ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] > Copied file > [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000005.wal, > dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279665.wal] > > [2020-11-09T10:52:36,541][INFO > ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] > Starting to copy WAL segment [absIdx=1279666, segIdx=6, > origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000006.wal, > dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279666.wal] > > [2020-11-09T10:52:36,703][INFO > ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] > Copied file > [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000006.wal, > dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279666.wal] > > [2020-11-09T10:53:11,068][INFO > ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] > Starting to copy WAL segment [absIdx=1279667, segIdx=7, > origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000007.wal, > dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279667.wal] > > [2020-11-09T10:53:11,239][INFO > ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] > Copied file > [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000007.wal, > dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279667.wal] > > [2020-11-09T10:53:45,694][INFO > ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] > Starting to copy WAL segment [absIdx=1279668, segIdx=8, > origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000008.wal, > dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279668.wal] > > [2020-11-09T10:53:45,868][INFO > ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] > Copied file > [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000008.wal, > dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279668.wal] > > [2020-11-09T10:54:21,411][INFO > ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] > Starting to copy WAL segment [absIdx=1279669, segIdx=9, > origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000009.wal, > dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279669.wal] > > [2020-11-09T10:54:21,480][INFO > ][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager] > Checkpoint started [checkpointId=33221fc2-5ab0-4929-9d23-8940e0552272, > startPtr=FileWALPointer [idx=1279670, fileOff=5561792, len=49557], > checkpointLockWait=0ms, checkpointLockHoldTime=51ms, > walCpRecordFsyncDuration=5ms, pages=86602, reason='timeout'] > > [2020-11-09T10:54:21,578][INFO > ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] > Copied file > [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000009.wal, > dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279669.wal] > > [2020-11-09T10:54:23,624][INFO > ][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager] > Checkpoint finished [cpId=33221fc2-5ab0-4929-9d23-8940e0552272, > pages=86602, markPos=FileWALPointer [idx=1279670, fileOff=5561792, > len=49557], walSegmentsCleared=6, walSegmentsCovered=[1279664 - 1279669], > markDuration=94ms, pagesWrite=1219ms, fsync=925ms, total=2238ms] > > [2020-11-09T10:54:45,727][INFO > ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] > Starting to copy WAL segment [absIdx=1279670, segIdx=0, > origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000000.wal, > dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279670.wal] > > [2020-11-09T10:54:45,886][INFO > ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] > Copied file > [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000000.wal, > dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279670.wal] > > [2020-11-09T10:55:17,146][INFO > ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] > Starting to copy WAL segment [absIdx=1279671, segIdx=1, > origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000001.wal, > dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279671.wal] > > [2020-11-09T10:55:17,343][INFO > ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] > Copied file > [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000001.wal, > dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279671.wal] > > [2020-11-09T10:55:18,888][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][G] > Blocked system-critical thread has been detected. This can lead to > cluster-wide undefined behaviour [threadName=partition-exchanger, > blockedFor=60s] > > [2020-11-09T10:55:18,888][WARN ][sys-stripe-118-#119%EDIFCustomerCC%][G] > Thread [name="exchange-worker-#344%EDIFCustomerCC%", id=391, > state=TIMED_WAITING, blockCnt=16, waitCnt=2469961] > > Lock > [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@223507fb, > ownerName=null, ownerId=-1] > > > > [2020-11-09T10:55:18,889][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][] > Critical system error detected. Will be handled accordingly to configured > handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, > super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, > SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext > [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker > [name=partition-exchanger, igniteInstanceName=EDIFCustomerCC, > finished=false, heartbeatTs=1604899458881]]] > > org.apache.ignite.IgniteException: GridWorker [name=partition-exchanger, > igniteInstanceName=EDIFCustomerCC, finished=false, > heartbeatTs=1604899458881] > > at > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831) > [ignite-core-2.7.6.jar:2.7.6] > > at > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826) > [ignite-core-2.7.6.jar:2.7.6] > > at > org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233) > [ignite-core-2.7.6.jar:2.7.6] > > at > org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) > [ignite-core-2.7.6.jar:2.7.6] > > at > org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:513) > [ignite-core-2.7.6.jar:2.7.6] > > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) > [ignite-core-2.7.6.jar:2.7.6] > > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151] > > [2020-11-09T10:55:18,894][WARN > ][sys-stripe-118-#119%EDIFCustomerCC%][FailureProcessor] No deadlocked > threads detected. > > [2020-11-09T10:55:21,404][WARN > ][jvm-pause-detector-worker][IgniteKernal%EDIFCustomerCC] Possible too long > JVM pause: 2477 milliseconds. > > [2020-11-09T10:55:21,422][WARN > ][sys-stripe-118-#119%EDIFCustomerCC%][FailureProcessor] Thread dump at > 2020/11/09 10:55:21 IST > > Thread [name="sys-#213675%EDIFCustomerCC%", id=369652, > state=TIMED_WAITING, blockCnt=0, waitCnt=1] > > Lock > [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@42c969fd, > ownerName=null, ownerId=-1] > > at sun.misc.Unsafe.park(Native Method) > > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) > > at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467) > > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073) > > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748) > > > > Thread [name="sys-#213674%EDIFCustomerCC%", id=369651, > state=TIMED_WAITING, blockCnt=0, waitCnt=1] > > Lock > [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@42c969fd, > ownerName=null, ownerId=-1] > > at sun.misc.Unsafe.park(Native Method) > > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) > > at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467) > > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073) > > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748) > > > > > > *Thanks and Regards,* > > *Kamlesh Joshi* > > > > > "*Confidentiality Warning*: This message and any attachments are intended > only for the use of the intended recipient(s), are confidential and may be > privileged. If you are not the intended recipient, you are hereby notified > that any review, re-transmission, conversion to hard copy, copying, > circulation or other use of this message and any attachments is strictly > prohibited. If you are not the intended recipient, please notify the sender > immediately by return email and delete this message and any attachments > from your system. > > *Virus Warning:* Although the company has taken reasonable precautions to > ensure no viruses are present in this email. The company cannot accept > responsibility for any loss or damage arising from the use of this email or > attachment." >
