[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2020-10-19 Thread Mirza Aliev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216803#comment-17216803
 ] 

Mirza Aliev commented on IGNITE-12451:
--

[~alex_pl] changes looks good, I would merge them 

> Introduce deadlock detection for cache entry reentrant locks
> 
>
> Key: IGNITE-12451
> URL: https://issues.apache.org/jira/browse/IGNITE-12451
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Assignee: Mirza Aliev
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Aside from IGNITE-12365, we still have possible threat of cache-entry-level 
> deadlock in case of careless usage of JCache mass operations (putAll, 
> removeAll):
> 1. If two different user threads will perform putAll on the same two keys in 
> reverse order (primary node for which is the same), there's a chance that 
> sys-stripe threads will be deadlocked.
> 2. Even without direct contract violation from user side, HashMap can be 
> passed as argument for putAll. Even if user threads have called mass 
> operations with two keys in the same order, HashMap iteration order is not 
> strictly defined, which may cause the same deadlock. 
> Local deadlock detection should mitigate this issue. We can create a wrapper 
> for ReentrantLock with logic that performs cycle detection in wait-for graph 
> in case we are waiting for lock acquisition for too long. Exception will be 
> thrown from one of the threads in such case, failing user operation, but 
> letting the system make progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2020-10-19 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216710#comment-17216710
 ] 

Ignite TC Bot commented on IGNITE-12451:


{panel:title=Branch: [pull/8268/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/8268/head] Base: [master] : New Tests 
(6)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#8b}Cache 1{color} [[tests 
6|https://ci.ignite.apache.org/viewLog.html?buildId=5667005]]
* {color:#013220}IgniteBinaryCacheTestSuite: 
IgniteCacheAtomicConcurrentUnorderedUpdateAllTest.testConcurrentUpdateAll[cacheMode=LOCAL,
 writeThrough=false] - PASSED{color}
* {color:#013220}IgniteBinaryCacheTestSuite: 
IgniteCacheAtomicConcurrentUnorderedUpdateAllTest.testConcurrentUpdateAll[cacheMode=REPLICATED,
 writeThrough=true] - PASSED{color}
* {color:#013220}IgniteBinaryCacheTestSuite: 
IgniteCacheAtomicConcurrentUnorderedUpdateAllTest.testConcurrentUpdateAll[cacheMode=LOCAL,
 writeThrough=true] - PASSED{color}
* {color:#013220}IgniteBinaryCacheTestSuite: 
IgniteCacheAtomicConcurrentUnorderedUpdateAllTest.testConcurrentUpdateAll[cacheMode=PARTITIONED,
 writeThrough=false] - PASSED{color}
* {color:#013220}IgniteBinaryCacheTestSuite: 
IgniteCacheAtomicConcurrentUnorderedUpdateAllTest.testConcurrentUpdateAll[cacheMode=REPLICATED,
 writeThrough=false] - PASSED{color}
* {color:#013220}IgniteBinaryCacheTestSuite: 
IgniteCacheAtomicConcurrentUnorderedUpdateAllTest.testConcurrentUpdateAll[cacheMode=PARTITIONED,
 writeThrough=true] - PASSED{color}

{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=5667062buildTypeId=IgniteTests24Java8_RunAll]

> Introduce deadlock detection for cache entry reentrant locks
> 
>
> Key: IGNITE-12451
> URL: https://issues.apache.org/jira/browse/IGNITE-12451
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Assignee: Mirza Aliev
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Aside from IGNITE-12365, we still have possible threat of cache-entry-level 
> deadlock in case of careless usage of JCache mass operations (putAll, 
> removeAll):
> 1. If two different user threads will perform putAll on the same two keys in 
> reverse order (primary node for which is the same), there's a chance that 
> sys-stripe threads will be deadlocked.
> 2. Even without direct contract violation from user side, HashMap can be 
> passed as argument for putAll. Even if user threads have called mass 
> operations with two keys in the same order, HashMap iteration order is not 
> strictly defined, which may cause the same deadlock. 
> Local deadlock detection should mitigate this issue. We can create a wrapper 
> for ReentrantLock with logic that performs cycle detection in wait-for graph 
> in case we are waiting for lock acquisition for too long. Exception will be 
> thrown from one of the threads in such case, failing user operation, but 
> letting the system make progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2020-10-02 Thread Aleksey Plekhanov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17206092#comment-17206092
 ] 

Aleksey Plekhanov commented on IGNITE-12451:


[~maliev], we've benchmarked by yardstick on the following configuration: 
in-memory cluster, 6 servers, 3 clients (drivers), using IgnitePutAllBenchmark, 
cache with 1 backup, batch size 2 and 1:
{noformat}
CONFIGS="\
-cfg ${SCRIPT_DIR}/../config/ignite-remote-config.xml -cwd -cl -nn ${nodesNum} 
-b ${b} -w ${w} -d ${d} -t ${t} -sm ${sm} -wm ${wm} -bs 2 -dn 
IgnitePutAllBenchmark -sn IgniteNode -ds ${ver}-atomic-putAll-bs2_iter1,\
-cfg ${SCRIPT_DIR}/../config/ignite-remote-config.xml -cwd -cl -nn ${nodesNum} 
-b ${b} -w ${w} -d ${d} -t ${t} -sm ${sm} -wm ${wm} -bs 2 -dn 
IgnitePutAllBenchmark -sn IgniteNode -ds ${ver}-atomic-putAll-bs2_iter2,\
-cfg ${SCRIPT_DIR}/../config/ignite-remote-config.xml -cwd -cl -nn ${nodesNum} 
-b ${b} -w ${w} -d ${d} -t ${t} -sm ${sm} -wm ${wm} -bs 2 -dn 
IgnitePutAllBenchmark -sn IgniteNode -ds ${ver}-atomic-putAll-bs2_iter3,\
-cfg ${SCRIPT_DIR}/../config/ignite-remote-config.xml -cwd -cl -nn ${nodesNum} 
-b ${b} -w ${w} -d ${d} -t ${t} -sm ${sm} -wm ${wm} -bs 1 -dn 
IgnitePutAllBenchmark -sn IgniteNode -ds ${ver}-atomic-putAll-bs1_iter1,\
-cfg ${SCRIPT_DIR}/../config/ignite-remote-config.xml -cwd -cl -nn ${nodesNum} 
-b ${b} -w ${w} -d ${d} -t ${t} -sm ${sm} -wm ${wm} -bs 1 -dn 
IgnitePutAllBenchmark -sn IgniteNode -ds ${ver}-atomic-putAll-bs1_iter2,\
-cfg ${SCRIPT_DIR}/../config/ignite-remote-config.xml -cwd -cl -nn ${nodesNum} 
-b ${b} -w ${w} -d ${d} -t ${t} -sm ${sm} -wm ${wm} -bs 1 -dn 
IgnitePutAllBenchmark -sn IgniteNode -ds ${ver}-atomic-putAll-bs1_iter3\
"
{noformat}

Results:

||Iteration||Before, -bs2||Before, -bs1||After, -bs2||After, -bs1||
|1| 307433 | 135,91 | 307558 | 135,95 |
|2|308135|135,64|305367|127,89|
|3|303215|134,07|309356|135,96|
|avg|306261|135,21|307427|133,27|

Average comparison:
|| ||Before||After||Compare||
|IgnitePutAllBenchmark -bs2|306261|307427|0,38%|
|IgnitePutAllBenchmark  -bs1|135,2|133,2|-1,50%| 


> Introduce deadlock detection for cache entry reentrant locks
> 
>
> Key: IGNITE-12451
> URL: https://issues.apache.org/jira/browse/IGNITE-12451
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Assignee: Mirza Aliev
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Aside from IGNITE-12365, we still have possible threat of cache-entry-level 
> deadlock in case of careless usage of JCache mass operations (putAll, 
> removeAll):
> 1. If two different user threads will perform putAll on the same two keys in 
> reverse order (primary node for which is the same), there's a chance that 
> sys-stripe threads will be deadlocked.
> 2. Even without direct contract violation from user side, HashMap can be 
> passed as argument for putAll. Even if user threads have called mass 
> operations with two keys in the same order, HashMap iteration order is not 
> strictly defined, which may cause the same deadlock. 
> Local deadlock detection should mitigate this issue. We can create a wrapper 
> for ReentrantLock with logic that performs cycle detection in wait-for graph 
> in case we are waiting for lock acquisition for too long. Exception will be 
> thrown from one of the threads in such case, failing user operation, but 
> letting the system make progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2020-09-28 Thread Mirza Aliev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203137#comment-17203137
 ] 

Mirza Aliev commented on IGNITE-12451:
--

[~alex_pl] Thank you for the detailed explanation! I tend to agree that this 
solution is better. 

Also, I've discussed the solution with [~agoncharuk] and we agreed that this 
solution definitely should be benchmarked as far as atomic cache operation is 
lightweight, and even additional adding to HashMap might bring some degradation 
on huge batch sizes.

 

Also, note that _removeAll(Set m)_ operation is also affected 

> Introduce deadlock detection for cache entry reentrant locks
> 
>
> Key: IGNITE-12451
> URL: https://issues.apache.org/jira/browse/IGNITE-12451
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Assignee: Mirza Aliev
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Aside from IGNITE-12365, we still have possible threat of cache-entry-level 
> deadlock in case of careless usage of JCache mass operations (putAll, 
> removeAll):
> 1. If two different user threads will perform putAll on the same two keys in 
> reverse order (primary node for which is the same), there's a chance that 
> sys-stripe threads will be deadlocked.
> 2. Even without direct contract violation from user side, HashMap can be 
> passed as argument for putAll. Even if user threads have called mass 
> operations with two keys in the same order, HashMap iteration order is not 
> strictly defined, which may cause the same deadlock. 
> Local deadlock detection should mitigate this issue. We can create a wrapper 
> for ReentrantLock with logic that performs cycle detection in wait-for graph 
> in case we are waiting for lock acquisition for too long. Exception will be 
> thrown from one of the threads in such case, failing user operation, but 
> letting the system make progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2020-09-22 Thread Aleksey Plekhanov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200262#comment-17200262
 ] 

Aleksey Plekhanov commented on IGNITE-12451:


[~maliev], when an old thread can't lock entry, {{hasLockCollisions}} returns 
{{false}} for this thread (since there isn't another older thread exists which 
holds this lock) and an old thread tries to lock this entry again. 
When a new thread can't lock entry, {{hasLockCollisions}} returns {{true}} for 
this thread (since there is another older thread exists which holds this lock) 
and a new thread release all acquired locks and retry the whole operation. 
After locks have been released by a new thread - an old thread can acquire the 
lock it was waited for and proceed further. So, all deadlocks will be 
eventually resolved.  

> Introduce deadlock detection for cache entry reentrant locks
> 
>
> Key: IGNITE-12451
> URL: https://issues.apache.org/jira/browse/IGNITE-12451
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Assignee: Mirza Aliev
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Aside from IGNITE-12365, we still have possible threat of cache-entry-level 
> deadlock in case of careless usage of JCache mass operations (putAll, 
> removeAll):
> 1. If two different user threads will perform putAll on the same two keys in 
> reverse order (primary node for which is the same), there's a chance that 
> sys-stripe threads will be deadlocked.
> 2. Even without direct contract violation from user side, HashMap can be 
> passed as argument for putAll. Even if user threads have called mass 
> operations with two keys in the same order, HashMap iteration order is not 
> strictly defined, which may cause the same deadlock. 
> Local deadlock detection should mitigate this issue. We can create a wrapper 
> for ReentrantLock with logic that performs cycle detection in wait-for graph 
> in case we are waiting for lock acquisition for too long. Exception will be 
> thrown from one of the threads in such case, failing user operation, but 
> letting the system make progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2020-09-22 Thread Mirza Aliev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200206#comment-17200206
 ] 

Mirza Aliev commented on IGNITE-12451:
--

> Since only younger threads retry all locks

Could you please explain in more detail this phrase? As far as I can 
understand, only _older_ threads retry all locks, because the timeout has 
passed for old threads, while young threads still try to lock entry because 
their timeout has not happened. This means that there is a chance that the 
older thread will be deadlocked again with a new putAll thread that comes at 
the moment when the old thread started to take locks for its entries, and so 
on, this means that the older thread won't do any progress. 

> Introduce deadlock detection for cache entry reentrant locks
> 
>
> Key: IGNITE-12451
> URL: https://issues.apache.org/jira/browse/IGNITE-12451
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Assignee: Mirza Aliev
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Aside from IGNITE-12365, we still have possible threat of cache-entry-level 
> deadlock in case of careless usage of JCache mass operations (putAll, 
> removeAll):
> 1. If two different user threads will perform putAll on the same two keys in 
> reverse order (primary node for which is the same), there's a chance that 
> sys-stripe threads will be deadlocked.
> 2. Even without direct contract violation from user side, HashMap can be 
> passed as argument for putAll. Even if user threads have called mass 
> operations with two keys in the same order, HashMap iteration order is not 
> strictly defined, which may cause the same deadlock. 
> Local deadlock detection should mitigate this issue. We can create a wrapper 
> for ReentrantLock with logic that performs cycle detection in wait-for graph 
> in case we are waiting for lock acquisition for too long. Exception will be 
> thrown from one of the threads in such case, failing user operation, but 
> letting the system make progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2020-09-22 Thread Aleksey Plekhanov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200168#comment-17200168
 ] 

Aleksey Plekhanov commented on IGNITE-12451:


{{hasLockCollisions}} gives sometimes false-positive results, but I think it's 
not very often. If we already waited some time for lock most likely this thread 
already in deadlock and most likely thread that holds the lock we are trying to 
acquire blocking us. If not, we just retry and nothing bad happens, user 
operation is not affected by this retry.

Fair cycles searching is more resource consuming (affects performance and 
footprint). This solution doesn't add any footprint and has no performance 
impact, but I see no advantages of fair cycles searching here. 

This algorithm does garantee that system won't hang. Since only younger threads 
retry all locks, older threads don't retry and will always advance in progress 
(there will be no infinite retries).   

Throwing exception to user leads to bad user experience. If we can perform this 
operation without exception, why should we throw it? 


> Introduce deadlock detection for cache entry reentrant locks
> 
>
> Key: IGNITE-12451
> URL: https://issues.apache.org/jira/browse/IGNITE-12451
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Assignee: Mirza Aliev
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Aside from IGNITE-12365, we still have possible threat of cache-entry-level 
> deadlock in case of careless usage of JCache mass operations (putAll, 
> removeAll):
> 1. If two different user threads will perform putAll on the same two keys in 
> reverse order (primary node for which is the same), there's a chance that 
> sys-stripe threads will be deadlocked.
> 2. Even without direct contract violation from user side, HashMap can be 
> passed as argument for putAll. Even if user threads have called mass 
> operations with two keys in the same order, HashMap iteration order is not 
> strictly defined, which may cause the same deadlock. 
> Local deadlock detection should mitigate this issue. We can create a wrapper 
> for ReentrantLock with logic that performs cycle detection in wait-for graph 
> in case we are waiting for lock acquisition for too long. Exception will be 
> thrown from one of the threads in such case, failing user operation, but 
> letting the system make progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2020-09-22 Thread Mirza Aliev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200154#comment-17200154
 ] 

Mirza Aliev commented on IGNITE-12451:
--

I see that your solution is based on a _tryLock_ idea, we also came to that 
because real-time locks graph analyzing brought a significant performance drop. 

I see that you introduced _hasLockCollisions_ method, but IMHO it gives a lot 
of false-positive results, this method does not search for cycles in locks 
acquisition order. Also, it does not guarantee that system won't hang because 
there is a chance that the retry-lock-acquisition loop also will lead to 
deadlocks. 

So, I believe that the right way to handle deadlock and let the system continue 
to work is the following: we need to throw an exception when _tryLock_ timeouts 
and a user will see _CachePartialUpdateCheckedException_. __ This solution is 
almost developed in my PR and will be ready in a few days.

 

[~alex_pl] what do you think? 

 

 

 

> Introduce deadlock detection for cache entry reentrant locks
> 
>
> Key: IGNITE-12451
> URL: https://issues.apache.org/jira/browse/IGNITE-12451
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Assignee: Mirza Aliev
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Aside from IGNITE-12365, we still have possible threat of cache-entry-level 
> deadlock in case of careless usage of JCache mass operations (putAll, 
> removeAll):
> 1. If two different user threads will perform putAll on the same two keys in 
> reverse order (primary node for which is the same), there's a chance that 
> sys-stripe threads will be deadlocked.
> 2. Even without direct contract violation from user side, HashMap can be 
> passed as argument for putAll. Even if user threads have called mass 
> operations with two keys in the same order, HashMap iteration order is not 
> strictly defined, which may cause the same deadlock. 
> Local deadlock detection should mitigate this issue. We can create a wrapper 
> for ReentrantLock with logic that performs cycle detection in wait-for graph 
> in case we are waiting for lock acquisition for too long. Exception will be 
> thrown from one of the threads in such case, failing user operation, but 
> letting the system make progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2020-09-22 Thread Aleksey Plekhanov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200107#comment-17200107
 ] 

Aleksey Plekhanov commented on IGNITE-12451:


[~maliev], we've faced this issue and I want to fix it fast.
Actually I have a simple working prototype that affects only putAll operation 
(single-entry operation not affected), without performance impact (I hope): 
https://github.com/apache/ignite/pull/8268 . 

> Introduce deadlock detection for cache entry reentrant locks
> 
>
> Key: IGNITE-12451
> URL: https://issues.apache.org/jira/browse/IGNITE-12451
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Assignee: Mirza Aliev
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Aside from IGNITE-12365, we still have possible threat of cache-entry-level 
> deadlock in case of careless usage of JCache mass operations (putAll, 
> removeAll):
> 1. If two different user threads will perform putAll on the same two keys in 
> reverse order (primary node for which is the same), there's a chance that 
> sys-stripe threads will be deadlocked.
> 2. Even without direct contract violation from user side, HashMap can be 
> passed as argument for putAll. Even if user threads have called mass 
> operations with two keys in the same order, HashMap iteration order is not 
> strictly defined, which may cause the same deadlock. 
> Local deadlock detection should mitigate this issue. We can create a wrapper 
> for ReentrantLock with logic that performs cycle detection in wait-for graph 
> in case we are waiting for lock acquisition for too long. Exception will be 
> thrown from one of the threads in such case, failing user operation, but 
> letting the system make progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2020-09-22 Thread Mirza Aliev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200051#comment-17200051
 ] 

Mirza Aliev commented on IGNITE-12451:
--

[~alex_pl] yes, currently I'm working on the ticket, in a few days I'll prepare 
PR 

> Introduce deadlock detection for cache entry reentrant locks
> 
>
> Key: IGNITE-12451
> URL: https://issues.apache.org/jira/browse/IGNITE-12451
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Assignee: Mirza Aliev
>Priority: Major
> Fix For: 2.10
>
>
> Aside from IGNITE-12365, we still have possible threat of cache-entry-level 
> deadlock in case of careless usage of JCache mass operations (putAll, 
> removeAll):
> 1. If two different user threads will perform putAll on the same two keys in 
> reverse order (primary node for which is the same), there's a chance that 
> sys-stripe threads will be deadlocked.
> 2. Even without direct contract violation from user side, HashMap can be 
> passed as argument for putAll. Even if user threads have called mass 
> operations with two keys in the same order, HashMap iteration order is not 
> strictly defined, which may cause the same deadlock. 
> Local deadlock detection should mitigate this issue. We can create a wrapper 
> for ReentrantLock with logic that performs cycle detection in wait-for graph 
> in case we are waiting for lock acquisition for too long. Exception will be 
> thrown from one of the threads in such case, failing user operation, but 
> letting the system make progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2020-09-22 Thread Aleksey Plekhanov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200046#comment-17200046
 ] 

Aleksey Plekhanov commented on IGNITE-12451:


[~maliev], are you working on this ticket? Can I take it?

> Introduce deadlock detection for cache entry reentrant locks
> 
>
> Key: IGNITE-12451
> URL: https://issues.apache.org/jira/browse/IGNITE-12451
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Assignee: Mirza Aliev
>Priority: Major
> Fix For: 2.10
>
>
> Aside from IGNITE-12365, we still have possible threat of cache-entry-level 
> deadlock in case of careless usage of JCache mass operations (putAll, 
> removeAll):
> 1. If two different user threads will perform putAll on the same two keys in 
> reverse order (primary node for which is the same), there's a chance that 
> sys-stripe threads will be deadlocked.
> 2. Even without direct contract violation from user side, HashMap can be 
> passed as argument for putAll. Even if user threads have called mass 
> operations with two keys in the same order, HashMap iteration order is not 
> strictly defined, which may cause the same deadlock. 
> Local deadlock detection should mitigate this issue. We can create a wrapper 
> for ReentrantLock with logic that performs cycle detection in wait-for graph 
> in case we are waiting for lock acquisition for too long. Exception will be 
> thrown from one of the threads in such case, failing user operation, but 
> letting the system make progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2019-12-30 Thread Ivan Pavlukhin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005370#comment-17005370
 ] 

Ivan Pavlukhin commented on IGNITE-12451:
-

[~ivan.glukos], [~maliev], would be great to see some kind of reasoning about 
complexity of a chosen algorithm in a ticket description.

> Introduce deadlock detection for cache entry reentrant locks
> 
>
> Key: IGNITE-12451
> URL: https://issues.apache.org/jira/browse/IGNITE-12451
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Priority: Major
> Fix For: 2.9
>
>
> Aside from IGNITE-12365, we still have possible threat of cache-entry-level 
> deadlock in case of careless usage of JCache mass operations (putAll, 
> removeAll):
> 1. If two different user threads will perform putAll on the same two keys in 
> reverse order (primary node for which is the same), there's a chance that 
> sys-stripe threads will be deadlocked.
> 2. Even without direct contract violation from user side, HashMap can be 
> passed as argument for putAll. Even if user threads have called mass 
> operations with two keys in the same order, HashMap iteration order is not 
> strictly defined, which may cause the same deadlock. 
> Local deadlock detection should mitigate this issue. We can create a wrapper 
> for ReentrantLock with logic that performs cycle detection in wait-for graph 
> in case we are waiting for lock acquisition for too long. Exception will be 
> thrown from one of the threads in such case, failing user operation, but 
> letting the system make progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2019-12-27 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004267#comment-17004267
 ] 

Ivan Rakov commented on IGNITE-12451:
-

[~Pavlukhin] ThreadMXBean#findDeadlockedThreads is too resource-heavy.
Reusing existing facilities is fine, e.g. Guava has its 
https://guava.dev/releases/22.0/api/docs/com/google/common/util/concurrent/CycleDetectingLockFactory.html.
 Perhaps we can borrow this implementation ;)

> Introduce deadlock detection for cache entry reentrant locks
> 
>
> Key: IGNITE-12451
> URL: https://issues.apache.org/jira/browse/IGNITE-12451
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Priority: Major
> Fix For: 2.9
>
>
> Aside from IGNITE-12365, we still have possible threat of cache-entry-level 
> deadlock in case of careless usage of JCache mass operations (putAll, 
> removeAll):
> 1. If two different user threads will perform putAll on the same two keys in 
> reverse order (primary node for which is the same), there's a chance that 
> sys-stripe threads will be deadlocked.
> 2. Even without direct contract violation from user side, HashMap can be 
> passed as argument for putAll. Even if user threads have called mass 
> operations with two keys in the same order, HashMap iteration order is not 
> strictly defined, which may cause the same deadlock. 
> Local deadlock detection should mitigate this issue. We can create a wrapper 
> for ReentrantLock with logic that performs cycle detection in wait-for graph 
> in case we are waiting for lock acquisition for too long. Exception will be 
> thrown from one of the threads in such case, failing user operation, but 
> letting the system make progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2019-12-16 Thread Ivan Pavlukhin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997909#comment-16997909
 ] 

Ivan Pavlukhin commented on IGNITE-12451:
-

[~maliev], I would like to outline my main concerns here. When I saw a ticket 
related to deadlock detection for ReentrantLock and friends, my first thought 
was why already existing facilities cannot be used. Actually, I think that 
having a note in a ticket description explaining why a custom implementation is 
needed would be enough (and very good as well).

In general, a deadlock detection in any case will bring an additional overhead. 
And I have doubts that it can be inexpensive.

By the way, we can consider other strategies for deadlocks mitigation. E.g. it 
might be a deadlock _prevention_, it is simpler and perhaps fits current 
problem.

> Introduce deadlock detection for cache entry reentrant locks
> 
>
> Key: IGNITE-12451
> URL: https://issues.apache.org/jira/browse/IGNITE-12451
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Priority: Major
> Fix For: 2.9
>
>
> Aside from IGNITE-12365, we still have possible threat of cache-entry-level 
> deadlock in case of careless usage of JCache mass operations (putAll, 
> removeAll):
> 1. If two different user threads will perform putAll on the same two keys in 
> reverse order (primary node for which is the same), there's a chance that 
> sys-stripe threads will be deadlocked.
> 2. Even without direct contract violation from user side, HashMap can be 
> passed as argument for putAll. Even if user threads have called mass 
> operations with two keys in the same order, HashMap iteration order is not 
> strictly defined, which may cause the same deadlock. 
> Local deadlock detection should mitigate this issue. We can create a wrapper 
> for ReentrantLock with logic that performs cycle detection in wait-for graph 
> in case we are waiting for lock acquisition for too long. Exception will be 
> thrown from one of the threads in such case, failing user operation, but 
> letting the system make progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2019-12-16 Thread Mirza Aliev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997239#comment-16997239
 ] 

Mirza Aliev commented on IGNITE-12451:
--

[~Pavlukhin] I'm assuming that these note from javadoc of 
{{java.lang.management.ThreadMXBean#findDeadlockedThreads}} might be the reason 
to not use such approach.
{noformat}
This method is designed for troubleshooting use, but not for synchronization 
control. It might be an expensive operation.{noformat}
What do you think [~Pavlukhin] [~ivan.glukos]?

> Introduce deadlock detection for cache entry reentrant locks
> 
>
> Key: IGNITE-12451
> URL: https://issues.apache.org/jira/browse/IGNITE-12451
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Priority: Major
> Fix For: 2.9
>
>
> Aside from IGNITE-12365, we still have possible threat of cache-entry-level 
> deadlock in case of careless usage of JCache mass operations (putAll, 
> removeAll):
> 1. If two different user threads will perform putAll on the same two keys in 
> reverse order (primary node for which is the same), there's a chance that 
> sys-stripe threads will be deadlocked.
> 2. Even without direct contract violation from user side, HashMap can be 
> passed as argument for putAll. Even if user threads have called mass 
> operations with two keys in the same order, HashMap iteration order is not 
> strictly defined, which may cause the same deadlock. 
> Local deadlock detection should mitigate this issue. We can create a wrapper 
> for ReentrantLock with logic that performs cycle detection in wait-for graph 
> in case we are waiting for lock acquisition for too long. Exception will be 
> thrown from one of the threads in such case, failing user operation, but 
> letting the system make progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2019-12-13 Thread Ivan Pavlukhin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996196#comment-16996196
 ] 

Ivan Pavlukhin commented on IGNITE-12451:
-

[~ivan.glukos], can {{java.lang.management.ThreadMXBean#findDeadlockedThreads}} 
help here? If not, why?

> Introduce deadlock detection for cache entry reentrant locks
> 
>
> Key: IGNITE-12451
> URL: https://issues.apache.org/jira/browse/IGNITE-12451
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Priority: Major
> Fix For: 2.9
>
>
> Aside from IGNITE-12365, we still have possible threat of cache-entry-level 
> deadlock in case of careless usage of JCache mass operations (putAll, 
> removeAll):
> 1. If two different user threads will perform putAll on the same two keys in 
> reverse order (primary node for which is the same), there's a chance that 
> sys-stripe threads will be deadlocked.
> 2. Even without direct contract violation from user side, HashMap can be 
> passed as argument for putAll. Even if user threads have called mass 
> operations with two keys in the same order, HashMap iteration order is not 
> strictly defined, which may cause the same deadlock. 
> Local deadlock detection should mitigate this issue. We can create a wrapper 
> for ReentrantLock with logic that performs cycle detection in wait-for graph 
> in case we are waiting for lock acquisition for too long. Exception will be 
> thrown from one of the threads in such case, failing user operation, but 
> letting the system make progress.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)