Sadayuki Furuhashi created IGNITE-7696: ------------------------------------------
Summary: Deadlock at GridDhtAtomicCache.lockEntries called through GridDhtAtomicCache.updateAllAsyncInternal Key: IGNITE-7696 URL: https://issues.apache.org/jira/browse/IGNITE-7696 Project: Ignite Issue Type: Bug Components: cache Affects Versions: 2.3 Environment: * Ignite 2.3 * OpenJDK version "1.8.0_151" * Linux 4.4.0 Reporter: Sadayuki Furuhashi We observed that all nodes in a cluster completely stalls and put/get/remove operations to a cache blocks for ever. When it happens, we can see following log in thread dump: {code:java} 2018-02-14_04:21:33.84410 Found one Java-level deadlock: 2018-02-14_04:21:33.84410 ============================= 2018-02-14_04:21:33.84411 "sys-#41%IgniteManager%": 2018-02-14_04:21:33.84411 waiting to lock monitor 0x00007f6d5e41a558 (object 0x0000000781083ef0, a org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry), 2018-02-14_04:21:33.84411 which is held by "sys-stripe-5-#6%IgniteManager%" 2018-02-14_04:21:33.84412 "sys-stripe-5-#6%IgniteManager%": 2018-02-14_04:21:33.84412 waiting to lock monitor 0x00007f6d5e41de68 (object 0x0000000781083e70, a org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry) 2018-02-14_04:21:33.84412 in JNI, which is held by "sys-stripe-2-#3%IgniteManager%" 2018-02-14_04:21:33.84412 "sys-stripe-2-#3%IgniteManager%": 2018-02-14_04:21:33.84413 waiting to lock monitor 0x00007f6d5e41a558 (object 0x0000000781083ef0, a org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry) 2018-02-14_04:21:33.84413 in JNI, which is held by "sys-stripe-5-#6%IgniteManager%" 2018-02-14_04:21:33.84413 2018-02-14_04:21:33.84414 Java stack information for the threads listed above: 2018-02-14_04:21:33.84414 =================================================== 2018-02-14_04:21:33.84416 "sys-#41%IgniteManager%": 2018-02-14_04:21:33.84416 at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.markObsoleteVersion(GridCacheMapEntry.java:2153) 2018-02-14_04:21:33.84417 - waiting to lock <0x0000000781083ef0> (a org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry) 2018-02-14_04:21:33.84417 at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.removeVersionedEntry(GridDhtLocalPartition.java:368) 2018-02-14_04:21:33.84418 at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.cleanupRemoveQueue(GridDhtLocalPartition.java:392) 2018-02-14_04:21:33.84418 at org.apache.ignite.internal.processors.cache.GridCacheProcessor$RemovedItemsCleanupTask$1.run(GridCacheProcessor.java:4051) 2018-02-14_04:21:33.84418 at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6687) 2018-02-14_04:21:33.84419 at org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827) 2018-02-14_04:21:33.84419 at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) 2018-02-14_04:21:33.84419 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 2018-02-14_04:21:33.84420 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 2018-02-14_04:21:33.84421 at java.lang.Thread.run(Thread.java:748) 2018-02-14_04:21:33.84421 "sys-stripe-5-#6%IgniteManager%": 2018-02-14_04:21:33.84421 at sun.misc.Unsafe.monitorEnter(Native Method) 2018-02-14_04:21:33.84421 at org.apache.ignite.internal.util.GridUnsafe.monitorEnter(GridUnsafe.java:1207) 2018-02-14_04:21:33.84422 at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.lockEntries(GridDhtAtomicCache.java:2848) 2018-02-14_04:21:33.84422 at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1707) 2018-02-14_04:21:33.84423 at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1629) 2018-02-14_04:21:33.84423 at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3056) 2018-02-14_04:21:33.84424 at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:131) 2018-02-14_04:21:33.84424 at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:267) 2018-02-14_04:21:33.84425 at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:262) 2018-02-14_04:21:33.84425 at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060) 2018-02-14_04:21:33.84425 at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579) 2018-02-14_04:21:33.84426 at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378) 2018-02-14_04:21:33.84426 at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304) 2018-02-14_04:21:33.84426 at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99) 2018-02-14_04:21:33.84427 at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293) 2018-02-14_04:21:33.84427 at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555) 2018-02-14_04:21:33.84427 at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183) 2018-02-14_04:21:33.84428 at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126) 2018-02-14_04:21:33.84429 at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1090) 2018-02-14_04:21:33.84429 at org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:505) 2018-02-14_04:21:33.84429 at java.lang.Thread.run(Thread.java:748) 2018-02-14_04:21:33.84429 "sys-stripe-2-#3%IgniteManager%": 2018-02-14_04:21:33.84430 at sun.misc.Unsafe.monitorEnter(Native Method) 2018-02-14_04:21:33.84430 at org.apache.ignite.internal.util.GridUnsafe.monitorEnter(GridUnsafe.java:1207) 2018-02-14_04:21:33.84430 at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.lockEntries(GridDhtAtomicCache.java:2848) 2018-02-14_04:21:33.84431 at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1707) 2018-02-14_04:21:33.84431 at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1629) 2018-02-14_04:21:33.84431 at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3056) 2018-02-14_04:21:33.84433 at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:131) 2018-02-14_04:21:33.84433 at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:267) 2018-02-14_04:21:33.84433 at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:262) 2018-02-14_04:21:33.84434 at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060) 2018-02-14_04:21:33.84434 at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579) 2018-02-14_04:21:33.84434 at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378) 2018-02-14_04:21:33.84435 at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304) 2018-02-14_04:21:33.84435 at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99) 2018-02-14_04:21:33.84436 at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293) 2018-02-14_04:21:33.84437 at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555) 2018-02-14_04:21:33.84437 at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183) 2018-02-14_04:21:33.84437 at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126) 2018-02-14_04:21:33.84438 at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1090) 2018-02-14_04:21:33.84438 at org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:505) 2018-02-14_04:21:33.84438 at java.lang.Thread.run(Thread.java:748) 2018-02-14_04:21:33.84439 2018-02-14_04:21:33.84439 Found 1 deadlock. {code} This indicates that GridDhtAtomicCache.lockEntries (threads sys-stripe-5-... and sys-stripe-7-...) is causing deadlock, and GridCacheMapEntry.markObsoleteVersion (sys-#41...) is involved in it: * Thread "sys-stripe-5-..." locked 0x...3ef0, waits for 0x...3e70 * Thread "sys-stripe-7-..." locked 0x...3e70, waits for 0x...3ef0 * Thread "sys-#41..." waits for 0x...3ef0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)