[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
[ https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216803#comment-17216803 ] Mirza Aliev commented on IGNITE-12451: -- [~alex_pl] changes looks good, I would merge them > Introduce deadlock detection for cache entry reentrant locks > > > Key: IGNITE-12451 > URL: https://issues.apache.org/jira/browse/IGNITE-12451 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.6 >Reporter: Ivan Rakov >Assignee: Mirza Aliev >Priority: Major > Fix For: 2.10 > > Time Spent: 20m > Remaining Estimate: 0h > > Aside from IGNITE-12365, we still have possible threat of cache-entry-level > deadlock in case of careless usage of JCache mass operations (putAll, > removeAll): > 1. If two different user threads will perform putAll on the same two keys in > reverse order (primary node for which is the same), there's a chance that > sys-stripe threads will be deadlocked. > 2. Even without direct contract violation from user side, HashMap can be > passed as argument for putAll. Even if user threads have called mass > operations with two keys in the same order, HashMap iteration order is not > strictly defined, which may cause the same deadlock. > Local deadlock detection should mitigate this issue. We can create a wrapper > for ReentrantLock with logic that performs cycle detection in wait-for graph > in case we are waiting for lock acquisition for too long. Exception will be > thrown from one of the threads in such case, failing user operation, but > letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
[ https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216710#comment-17216710 ] Ignite TC Bot commented on IGNITE-12451: {panel:title=Branch: [pull/8268/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel} {panel:title=Branch: [pull/8268/head] Base: [master] : New Tests (6)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1} {color:#8b}Cache 1{color} [[tests 6|https://ci.ignite.apache.org/viewLog.html?buildId=5667005]] * {color:#013220}IgniteBinaryCacheTestSuite: IgniteCacheAtomicConcurrentUnorderedUpdateAllTest.testConcurrentUpdateAll[cacheMode=LOCAL, writeThrough=false] - PASSED{color} * {color:#013220}IgniteBinaryCacheTestSuite: IgniteCacheAtomicConcurrentUnorderedUpdateAllTest.testConcurrentUpdateAll[cacheMode=REPLICATED, writeThrough=true] - PASSED{color} * {color:#013220}IgniteBinaryCacheTestSuite: IgniteCacheAtomicConcurrentUnorderedUpdateAllTest.testConcurrentUpdateAll[cacheMode=LOCAL, writeThrough=true] - PASSED{color} * {color:#013220}IgniteBinaryCacheTestSuite: IgniteCacheAtomicConcurrentUnorderedUpdateAllTest.testConcurrentUpdateAll[cacheMode=PARTITIONED, writeThrough=false] - PASSED{color} * {color:#013220}IgniteBinaryCacheTestSuite: IgniteCacheAtomicConcurrentUnorderedUpdateAllTest.testConcurrentUpdateAll[cacheMode=REPLICATED, writeThrough=false] - PASSED{color} * {color:#013220}IgniteBinaryCacheTestSuite: IgniteCacheAtomicConcurrentUnorderedUpdateAllTest.testConcurrentUpdateAll[cacheMode=PARTITIONED, writeThrough=true] - PASSED{color} {panel} [TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=5667062buildTypeId=IgniteTests24Java8_RunAll] > Introduce deadlock detection for cache entry reentrant locks > > > Key: IGNITE-12451 > URL: https://issues.apache.org/jira/browse/IGNITE-12451 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.6 >Reporter: Ivan Rakov >Assignee: Mirza Aliev >Priority: Major > Fix For: 2.10 > > Time Spent: 20m > Remaining Estimate: 0h > > Aside from IGNITE-12365, we still have possible threat of cache-entry-level > deadlock in case of careless usage of JCache mass operations (putAll, > removeAll): > 1. If two different user threads will perform putAll on the same two keys in > reverse order (primary node for which is the same), there's a chance that > sys-stripe threads will be deadlocked. > 2. Even without direct contract violation from user side, HashMap can be > passed as argument for putAll. Even if user threads have called mass > operations with two keys in the same order, HashMap iteration order is not > strictly defined, which may cause the same deadlock. > Local deadlock detection should mitigate this issue. We can create a wrapper > for ReentrantLock with logic that performs cycle detection in wait-for graph > in case we are waiting for lock acquisition for too long. Exception will be > thrown from one of the threads in such case, failing user operation, but > letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
[ https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17206092#comment-17206092 ] Aleksey Plekhanov commented on IGNITE-12451: [~maliev], we've benchmarked by yardstick on the following configuration: in-memory cluster, 6 servers, 3 clients (drivers), using IgnitePutAllBenchmark, cache with 1 backup, batch size 2 and 1: {noformat} CONFIGS="\ -cfg ${SCRIPT_DIR}/../config/ignite-remote-config.xml -cwd -cl -nn ${nodesNum} -b ${b} -w ${w} -d ${d} -t ${t} -sm ${sm} -wm ${wm} -bs 2 -dn IgnitePutAllBenchmark -sn IgniteNode -ds ${ver}-atomic-putAll-bs2_iter1,\ -cfg ${SCRIPT_DIR}/../config/ignite-remote-config.xml -cwd -cl -nn ${nodesNum} -b ${b} -w ${w} -d ${d} -t ${t} -sm ${sm} -wm ${wm} -bs 2 -dn IgnitePutAllBenchmark -sn IgniteNode -ds ${ver}-atomic-putAll-bs2_iter2,\ -cfg ${SCRIPT_DIR}/../config/ignite-remote-config.xml -cwd -cl -nn ${nodesNum} -b ${b} -w ${w} -d ${d} -t ${t} -sm ${sm} -wm ${wm} -bs 2 -dn IgnitePutAllBenchmark -sn IgniteNode -ds ${ver}-atomic-putAll-bs2_iter3,\ -cfg ${SCRIPT_DIR}/../config/ignite-remote-config.xml -cwd -cl -nn ${nodesNum} -b ${b} -w ${w} -d ${d} -t ${t} -sm ${sm} -wm ${wm} -bs 1 -dn IgnitePutAllBenchmark -sn IgniteNode -ds ${ver}-atomic-putAll-bs1_iter1,\ -cfg ${SCRIPT_DIR}/../config/ignite-remote-config.xml -cwd -cl -nn ${nodesNum} -b ${b} -w ${w} -d ${d} -t ${t} -sm ${sm} -wm ${wm} -bs 1 -dn IgnitePutAllBenchmark -sn IgniteNode -ds ${ver}-atomic-putAll-bs1_iter2,\ -cfg ${SCRIPT_DIR}/../config/ignite-remote-config.xml -cwd -cl -nn ${nodesNum} -b ${b} -w ${w} -d ${d} -t ${t} -sm ${sm} -wm ${wm} -bs 1 -dn IgnitePutAllBenchmark -sn IgniteNode -ds ${ver}-atomic-putAll-bs1_iter3\ " {noformat} Results: ||Iteration||Before, -bs2||Before, -bs1||After, -bs2||After, -bs1|| |1| 307433 | 135,91 | 307558 | 135,95 | |2|308135|135,64|305367|127,89| |3|303215|134,07|309356|135,96| |avg|306261|135,21|307427|133,27| Average comparison: || ||Before||After||Compare|| |IgnitePutAllBenchmark -bs2|306261|307427|0,38%| |IgnitePutAllBenchmark -bs1|135,2|133,2|-1,50%| > Introduce deadlock detection for cache entry reentrant locks > > > Key: IGNITE-12451 > URL: https://issues.apache.org/jira/browse/IGNITE-12451 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.6 >Reporter: Ivan Rakov >Assignee: Mirza Aliev >Priority: Major > Fix For: 2.10 > > Time Spent: 20m > Remaining Estimate: 0h > > Aside from IGNITE-12365, we still have possible threat of cache-entry-level > deadlock in case of careless usage of JCache mass operations (putAll, > removeAll): > 1. If two different user threads will perform putAll on the same two keys in > reverse order (primary node for which is the same), there's a chance that > sys-stripe threads will be deadlocked. > 2. Even without direct contract violation from user side, HashMap can be > passed as argument for putAll. Even if user threads have called mass > operations with two keys in the same order, HashMap iteration order is not > strictly defined, which may cause the same deadlock. > Local deadlock detection should mitigate this issue. We can create a wrapper > for ReentrantLock with logic that performs cycle detection in wait-for graph > in case we are waiting for lock acquisition for too long. Exception will be > thrown from one of the threads in such case, failing user operation, but > letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
[ https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203137#comment-17203137 ] Mirza Aliev commented on IGNITE-12451: -- [~alex_pl] Thank you for the detailed explanation! I tend to agree that this solution is better. Also, I've discussed the solution with [~agoncharuk] and we agreed that this solution definitely should be benchmarked as far as atomic cache operation is lightweight, and even additional adding to HashMap might bring some degradation on huge batch sizes. Also, note that _removeAll(Set m)_ operation is also affected > Introduce deadlock detection for cache entry reentrant locks > > > Key: IGNITE-12451 > URL: https://issues.apache.org/jira/browse/IGNITE-12451 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.6 >Reporter: Ivan Rakov >Assignee: Mirza Aliev >Priority: Major > Fix For: 2.10 > > Time Spent: 20m > Remaining Estimate: 0h > > Aside from IGNITE-12365, we still have possible threat of cache-entry-level > deadlock in case of careless usage of JCache mass operations (putAll, > removeAll): > 1. If two different user threads will perform putAll on the same two keys in > reverse order (primary node for which is the same), there's a chance that > sys-stripe threads will be deadlocked. > 2. Even without direct contract violation from user side, HashMap can be > passed as argument for putAll. Even if user threads have called mass > operations with two keys in the same order, HashMap iteration order is not > strictly defined, which may cause the same deadlock. > Local deadlock detection should mitigate this issue. We can create a wrapper > for ReentrantLock with logic that performs cycle detection in wait-for graph > in case we are waiting for lock acquisition for too long. Exception will be > thrown from one of the threads in such case, failing user operation, but > letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
[ https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200262#comment-17200262 ] Aleksey Plekhanov commented on IGNITE-12451: [~maliev], when an old thread can't lock entry, {{hasLockCollisions}} returns {{false}} for this thread (since there isn't another older thread exists which holds this lock) and an old thread tries to lock this entry again. When a new thread can't lock entry, {{hasLockCollisions}} returns {{true}} for this thread (since there is another older thread exists which holds this lock) and a new thread release all acquired locks and retry the whole operation. After locks have been released by a new thread - an old thread can acquire the lock it was waited for and proceed further. So, all deadlocks will be eventually resolved. > Introduce deadlock detection for cache entry reentrant locks > > > Key: IGNITE-12451 > URL: https://issues.apache.org/jira/browse/IGNITE-12451 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.6 >Reporter: Ivan Rakov >Assignee: Mirza Aliev >Priority: Major > Fix For: 2.10 > > Time Spent: 10m > Remaining Estimate: 0h > > Aside from IGNITE-12365, we still have possible threat of cache-entry-level > deadlock in case of careless usage of JCache mass operations (putAll, > removeAll): > 1. If two different user threads will perform putAll on the same two keys in > reverse order (primary node for which is the same), there's a chance that > sys-stripe threads will be deadlocked. > 2. Even without direct contract violation from user side, HashMap can be > passed as argument for putAll. Even if user threads have called mass > operations with two keys in the same order, HashMap iteration order is not > strictly defined, which may cause the same deadlock. > Local deadlock detection should mitigate this issue. We can create a wrapper > for ReentrantLock with logic that performs cycle detection in wait-for graph > in case we are waiting for lock acquisition for too long. Exception will be > thrown from one of the threads in such case, failing user operation, but > letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
[ https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200206#comment-17200206 ] Mirza Aliev commented on IGNITE-12451: -- > Since only younger threads retry all locks Could you please explain in more detail this phrase? As far as I can understand, only _older_ threads retry all locks, because the timeout has passed for old threads, while young threads still try to lock entry because their timeout has not happened. This means that there is a chance that the older thread will be deadlocked again with a new putAll thread that comes at the moment when the old thread started to take locks for its entries, and so on, this means that the older thread won't do any progress. > Introduce deadlock detection for cache entry reentrant locks > > > Key: IGNITE-12451 > URL: https://issues.apache.org/jira/browse/IGNITE-12451 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.6 >Reporter: Ivan Rakov >Assignee: Mirza Aliev >Priority: Major > Fix For: 2.10 > > Time Spent: 10m > Remaining Estimate: 0h > > Aside from IGNITE-12365, we still have possible threat of cache-entry-level > deadlock in case of careless usage of JCache mass operations (putAll, > removeAll): > 1. If two different user threads will perform putAll on the same two keys in > reverse order (primary node for which is the same), there's a chance that > sys-stripe threads will be deadlocked. > 2. Even without direct contract violation from user side, HashMap can be > passed as argument for putAll. Even if user threads have called mass > operations with two keys in the same order, HashMap iteration order is not > strictly defined, which may cause the same deadlock. > Local deadlock detection should mitigate this issue. We can create a wrapper > for ReentrantLock with logic that performs cycle detection in wait-for graph > in case we are waiting for lock acquisition for too long. Exception will be > thrown from one of the threads in such case, failing user operation, but > letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
[ https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200168#comment-17200168 ] Aleksey Plekhanov commented on IGNITE-12451: {{hasLockCollisions}} gives sometimes false-positive results, but I think it's not very often. If we already waited some time for lock most likely this thread already in deadlock and most likely thread that holds the lock we are trying to acquire blocking us. If not, we just retry and nothing bad happens, user operation is not affected by this retry. Fair cycles searching is more resource consuming (affects performance and footprint). This solution doesn't add any footprint and has no performance impact, but I see no advantages of fair cycles searching here. This algorithm does garantee that system won't hang. Since only younger threads retry all locks, older threads don't retry and will always advance in progress (there will be no infinite retries). Throwing exception to user leads to bad user experience. If we can perform this operation without exception, why should we throw it? > Introduce deadlock detection for cache entry reentrant locks > > > Key: IGNITE-12451 > URL: https://issues.apache.org/jira/browse/IGNITE-12451 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.6 >Reporter: Ivan Rakov >Assignee: Mirza Aliev >Priority: Major > Fix For: 2.10 > > Time Spent: 10m > Remaining Estimate: 0h > > Aside from IGNITE-12365, we still have possible threat of cache-entry-level > deadlock in case of careless usage of JCache mass operations (putAll, > removeAll): > 1. If two different user threads will perform putAll on the same two keys in > reverse order (primary node for which is the same), there's a chance that > sys-stripe threads will be deadlocked. > 2. Even without direct contract violation from user side, HashMap can be > passed as argument for putAll. Even if user threads have called mass > operations with two keys in the same order, HashMap iteration order is not > strictly defined, which may cause the same deadlock. > Local deadlock detection should mitigate this issue. We can create a wrapper > for ReentrantLock with logic that performs cycle detection in wait-for graph > in case we are waiting for lock acquisition for too long. Exception will be > thrown from one of the threads in such case, failing user operation, but > letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
[ https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200154#comment-17200154 ] Mirza Aliev commented on IGNITE-12451: -- I see that your solution is based on a _tryLock_ idea, we also came to that because real-time locks graph analyzing brought a significant performance drop. I see that you introduced _hasLockCollisions_ method, but IMHO it gives a lot of false-positive results, this method does not search for cycles in locks acquisition order. Also, it does not guarantee that system won't hang because there is a chance that the retry-lock-acquisition loop also will lead to deadlocks. So, I believe that the right way to handle deadlock and let the system continue to work is the following: we need to throw an exception when _tryLock_ timeouts and a user will see _CachePartialUpdateCheckedException_. __ This solution is almost developed in my PR and will be ready in a few days. [~alex_pl] what do you think? > Introduce deadlock detection for cache entry reentrant locks > > > Key: IGNITE-12451 > URL: https://issues.apache.org/jira/browse/IGNITE-12451 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.6 >Reporter: Ivan Rakov >Assignee: Mirza Aliev >Priority: Major > Fix For: 2.10 > > Time Spent: 10m > Remaining Estimate: 0h > > Aside from IGNITE-12365, we still have possible threat of cache-entry-level > deadlock in case of careless usage of JCache mass operations (putAll, > removeAll): > 1. If two different user threads will perform putAll on the same two keys in > reverse order (primary node for which is the same), there's a chance that > sys-stripe threads will be deadlocked. > 2. Even without direct contract violation from user side, HashMap can be > passed as argument for putAll. Even if user threads have called mass > operations with two keys in the same order, HashMap iteration order is not > strictly defined, which may cause the same deadlock. > Local deadlock detection should mitigate this issue. We can create a wrapper > for ReentrantLock with logic that performs cycle detection in wait-for graph > in case we are waiting for lock acquisition for too long. Exception will be > thrown from one of the threads in such case, failing user operation, but > letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
[ https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200107#comment-17200107 ] Aleksey Plekhanov commented on IGNITE-12451: [~maliev], we've faced this issue and I want to fix it fast. Actually I have a simple working prototype that affects only putAll operation (single-entry operation not affected), without performance impact (I hope): https://github.com/apache/ignite/pull/8268 . > Introduce deadlock detection for cache entry reentrant locks > > > Key: IGNITE-12451 > URL: https://issues.apache.org/jira/browse/IGNITE-12451 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.6 >Reporter: Ivan Rakov >Assignee: Mirza Aliev >Priority: Major > Fix For: 2.10 > > Time Spent: 10m > Remaining Estimate: 0h > > Aside from IGNITE-12365, we still have possible threat of cache-entry-level > deadlock in case of careless usage of JCache mass operations (putAll, > removeAll): > 1. If two different user threads will perform putAll on the same two keys in > reverse order (primary node for which is the same), there's a chance that > sys-stripe threads will be deadlocked. > 2. Even without direct contract violation from user side, HashMap can be > passed as argument for putAll. Even if user threads have called mass > operations with two keys in the same order, HashMap iteration order is not > strictly defined, which may cause the same deadlock. > Local deadlock detection should mitigate this issue. We can create a wrapper > for ReentrantLock with logic that performs cycle detection in wait-for graph > in case we are waiting for lock acquisition for too long. Exception will be > thrown from one of the threads in such case, failing user operation, but > letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
[ https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200051#comment-17200051 ] Mirza Aliev commented on IGNITE-12451: -- [~alex_pl] yes, currently I'm working on the ticket, in a few days I'll prepare PR > Introduce deadlock detection for cache entry reentrant locks > > > Key: IGNITE-12451 > URL: https://issues.apache.org/jira/browse/IGNITE-12451 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.6 >Reporter: Ivan Rakov >Assignee: Mirza Aliev >Priority: Major > Fix For: 2.10 > > > Aside from IGNITE-12365, we still have possible threat of cache-entry-level > deadlock in case of careless usage of JCache mass operations (putAll, > removeAll): > 1. If two different user threads will perform putAll on the same two keys in > reverse order (primary node for which is the same), there's a chance that > sys-stripe threads will be deadlocked. > 2. Even without direct contract violation from user side, HashMap can be > passed as argument for putAll. Even if user threads have called mass > operations with two keys in the same order, HashMap iteration order is not > strictly defined, which may cause the same deadlock. > Local deadlock detection should mitigate this issue. We can create a wrapper > for ReentrantLock with logic that performs cycle detection in wait-for graph > in case we are waiting for lock acquisition for too long. Exception will be > thrown from one of the threads in such case, failing user operation, but > letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
[ https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200046#comment-17200046 ] Aleksey Plekhanov commented on IGNITE-12451: [~maliev], are you working on this ticket? Can I take it? > Introduce deadlock detection for cache entry reentrant locks > > > Key: IGNITE-12451 > URL: https://issues.apache.org/jira/browse/IGNITE-12451 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.6 >Reporter: Ivan Rakov >Assignee: Mirza Aliev >Priority: Major > Fix For: 2.10 > > > Aside from IGNITE-12365, we still have possible threat of cache-entry-level > deadlock in case of careless usage of JCache mass operations (putAll, > removeAll): > 1. If two different user threads will perform putAll on the same two keys in > reverse order (primary node for which is the same), there's a chance that > sys-stripe threads will be deadlocked. > 2. Even without direct contract violation from user side, HashMap can be > passed as argument for putAll. Even if user threads have called mass > operations with two keys in the same order, HashMap iteration order is not > strictly defined, which may cause the same deadlock. > Local deadlock detection should mitigate this issue. We can create a wrapper > for ReentrantLock with logic that performs cycle detection in wait-for graph > in case we are waiting for lock acquisition for too long. Exception will be > thrown from one of the threads in such case, failing user operation, but > letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
[ https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005370#comment-17005370 ] Ivan Pavlukhin commented on IGNITE-12451: - [~ivan.glukos], [~maliev], would be great to see some kind of reasoning about complexity of a chosen algorithm in a ticket description. > Introduce deadlock detection for cache entry reentrant locks > > > Key: IGNITE-12451 > URL: https://issues.apache.org/jira/browse/IGNITE-12451 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.6 >Reporter: Ivan Rakov >Priority: Major > Fix For: 2.9 > > > Aside from IGNITE-12365, we still have possible threat of cache-entry-level > deadlock in case of careless usage of JCache mass operations (putAll, > removeAll): > 1. If two different user threads will perform putAll on the same two keys in > reverse order (primary node for which is the same), there's a chance that > sys-stripe threads will be deadlocked. > 2. Even without direct contract violation from user side, HashMap can be > passed as argument for putAll. Even if user threads have called mass > operations with two keys in the same order, HashMap iteration order is not > strictly defined, which may cause the same deadlock. > Local deadlock detection should mitigate this issue. We can create a wrapper > for ReentrantLock with logic that performs cycle detection in wait-for graph > in case we are waiting for lock acquisition for too long. Exception will be > thrown from one of the threads in such case, failing user operation, but > letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
[ https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004267#comment-17004267 ] Ivan Rakov commented on IGNITE-12451: - [~Pavlukhin] ThreadMXBean#findDeadlockedThreads is too resource-heavy. Reusing existing facilities is fine, e.g. Guava has its https://guava.dev/releases/22.0/api/docs/com/google/common/util/concurrent/CycleDetectingLockFactory.html. Perhaps we can borrow this implementation ;) > Introduce deadlock detection for cache entry reentrant locks > > > Key: IGNITE-12451 > URL: https://issues.apache.org/jira/browse/IGNITE-12451 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.6 >Reporter: Ivan Rakov >Priority: Major > Fix For: 2.9 > > > Aside from IGNITE-12365, we still have possible threat of cache-entry-level > deadlock in case of careless usage of JCache mass operations (putAll, > removeAll): > 1. If two different user threads will perform putAll on the same two keys in > reverse order (primary node for which is the same), there's a chance that > sys-stripe threads will be deadlocked. > 2. Even without direct contract violation from user side, HashMap can be > passed as argument for putAll. Even if user threads have called mass > operations with two keys in the same order, HashMap iteration order is not > strictly defined, which may cause the same deadlock. > Local deadlock detection should mitigate this issue. We can create a wrapper > for ReentrantLock with logic that performs cycle detection in wait-for graph > in case we are waiting for lock acquisition for too long. Exception will be > thrown from one of the threads in such case, failing user operation, but > letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
[ https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997909#comment-16997909 ] Ivan Pavlukhin commented on IGNITE-12451: - [~maliev], I would like to outline my main concerns here. When I saw a ticket related to deadlock detection for ReentrantLock and friends, my first thought was why already existing facilities cannot be used. Actually, I think that having a note in a ticket description explaining why a custom implementation is needed would be enough (and very good as well). In general, a deadlock detection in any case will bring an additional overhead. And I have doubts that it can be inexpensive. By the way, we can consider other strategies for deadlocks mitigation. E.g. it might be a deadlock _prevention_, it is simpler and perhaps fits current problem. > Introduce deadlock detection for cache entry reentrant locks > > > Key: IGNITE-12451 > URL: https://issues.apache.org/jira/browse/IGNITE-12451 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.6 >Reporter: Ivan Rakov >Priority: Major > Fix For: 2.9 > > > Aside from IGNITE-12365, we still have possible threat of cache-entry-level > deadlock in case of careless usage of JCache mass operations (putAll, > removeAll): > 1. If two different user threads will perform putAll on the same two keys in > reverse order (primary node for which is the same), there's a chance that > sys-stripe threads will be deadlocked. > 2. Even without direct contract violation from user side, HashMap can be > passed as argument for putAll. Even if user threads have called mass > operations with two keys in the same order, HashMap iteration order is not > strictly defined, which may cause the same deadlock. > Local deadlock detection should mitigate this issue. We can create a wrapper > for ReentrantLock with logic that performs cycle detection in wait-for graph > in case we are waiting for lock acquisition for too long. Exception will be > thrown from one of the threads in such case, failing user operation, but > letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
[ https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997239#comment-16997239 ] Mirza Aliev commented on IGNITE-12451: -- [~Pavlukhin] I'm assuming that these note from javadoc of {{java.lang.management.ThreadMXBean#findDeadlockedThreads}} might be the reason to not use such approach. {noformat} This method is designed for troubleshooting use, but not for synchronization control. It might be an expensive operation.{noformat} What do you think [~Pavlukhin] [~ivan.glukos]? > Introduce deadlock detection for cache entry reentrant locks > > > Key: IGNITE-12451 > URL: https://issues.apache.org/jira/browse/IGNITE-12451 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.6 >Reporter: Ivan Rakov >Priority: Major > Fix For: 2.9 > > > Aside from IGNITE-12365, we still have possible threat of cache-entry-level > deadlock in case of careless usage of JCache mass operations (putAll, > removeAll): > 1. If two different user threads will perform putAll on the same two keys in > reverse order (primary node for which is the same), there's a chance that > sys-stripe threads will be deadlocked. > 2. Even without direct contract violation from user side, HashMap can be > passed as argument for putAll. Even if user threads have called mass > operations with two keys in the same order, HashMap iteration order is not > strictly defined, which may cause the same deadlock. > Local deadlock detection should mitigate this issue. We can create a wrapper > for ReentrantLock with logic that performs cycle detection in wait-for graph > in case we are waiting for lock acquisition for too long. Exception will be > thrown from one of the threads in such case, failing user operation, but > letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
[ https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996196#comment-16996196 ] Ivan Pavlukhin commented on IGNITE-12451: - [~ivan.glukos], can {{java.lang.management.ThreadMXBean#findDeadlockedThreads}} help here? If not, why? > Introduce deadlock detection for cache entry reentrant locks > > > Key: IGNITE-12451 > URL: https://issues.apache.org/jira/browse/IGNITE-12451 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.6 >Reporter: Ivan Rakov >Priority: Major > Fix For: 2.9 > > > Aside from IGNITE-12365, we still have possible threat of cache-entry-level > deadlock in case of careless usage of JCache mass operations (putAll, > removeAll): > 1. If two different user threads will perform putAll on the same two keys in > reverse order (primary node for which is the same), there's a chance that > sys-stripe threads will be deadlocked. > 2. Even without direct contract violation from user side, HashMap can be > passed as argument for putAll. Even if user threads have called mass > operations with two keys in the same order, HashMap iteration order is not > strictly defined, which may cause the same deadlock. > Local deadlock detection should mitigate this issue. We can create a wrapper > for ReentrantLock with logic that performs cycle detection in wait-for graph > in case we are waiting for lock acquisition for too long. Exception will be > thrown from one of the threads in such case, failing user operation, but > letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)