Re: Deadlock analysis

Stephen Darlington Thu, 15 Sep 2022 03:34:53 -0700

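Ignite locks individual entries (keys), not whole caches, so no, that would not cause a deadlock: your two transactions never try to lock the same key in the same cache.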
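To make the distinction concrete, here is a minimal plain-Java sketch (no Ignite dependency; Java 16+ for records) that models each lock as a (cache name, key) pair. `LockIdentity`, `LockKey` and `sharedLocks` are hypothetical names used only for illustration, not Ignite API. Under this model the two transactions in the quoted example share no lock at all, while the same UUID in `TaskList` and `MediaSets` counts as two separate locks.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LockIdentity {
    // Hypothetical model: a lock is identified by cache name AND key,
    // so the same key value in two different caches is two separate locks.
    record LockKey(String cache, Object key) {}

    // Returns the locks both transactions want. An empty result means
    // the two transactions can never block each other.
    static Set<LockKey> sharedLocks(List<LockKey> tx1, List<LockKey> tx2) {
        Set<LockKey> shared = new HashSet<>(tx1);
        shared.retainAll(new HashSet<>(tx2));
        return shared;
    }

    public static void main(String[] args) {
        // The modified example: A:1, B:2 versus B:1, A:2.
        List<LockKey> first = List.of(new LockKey("cacheA", 1), new LockKey("cacheB", 2));
        List<LockKey> second = List.of(new LockKey("cacheB", 1), new LockKey("cacheA", 2));
        System.out.println(sharedLocks(first, second)); // prints []

        // The keys from the log: same UUID, different caches, so different locks.
        LockKey k1 = new LockKey("TaskList", "e9228c01-b17e-49a5-bc7f-14c1541d9916");
        LockKey k2 = new LockKey("MediaSets", "e9228c01-b17e-49a5-bc7f-14c1541d9916");
        System.out.println(k1.equals(k2)); // prints false
    }
}
```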

This could cause a deadlock:


First:

#1 tx.start();
#2 cacheA.put(1, 1);
#3 cacheB.put(2, 2);
#4 tx.commit();

Second:

#5 tx.start();
#6 cacheB.put(2, 2);
#7 cacheA.put(1, 1);
#8 tx.commit();

Both of these transactions can execute in parallel. The first transaction 
starts and locks record 1 in cache A. Meanwhile, the second starts and locks 
record 2 in cache B.

Now, the first transaction tries to lock record 2 in cache B, but it can’t 
because it’s locked by the second transaction.

The second transaction tries to lock record 1 in cache A, but it can’t because 
it’s locked by the first transaction. Neither can proceed: that is the deadlock.
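The interleaving above can be reproduced with plain `java.util.concurrent` locks standing in for Ignite's entry locks. This is a simulation under that assumption, not Ignite code, and `DeadlockDemo`, `lockFor`, `runTx` and `reproduce` are hypothetical names. Two barriers force the exact interleaving described (each transaction holds its first lock before asking for the second, and keeps holding it until both have tried), and `tryLock` with a timeout lets the demo report the deadlock instead of hanging.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockDemo {
    // Hypothetical stand-in for per-entry locks: one lock per "cache:key".
    static final Map<String, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

    static ReentrantLock lockFor(String cache, int key) {
        return LOCKS.computeIfAbsent(cache + ":" + key, k -> new ReentrantLock());
    }

    // Runs both transactions with the interleaving described above.
    // Returns true if each one failed to get its second lock.
    static boolean reproduce() throws InterruptedException {
        CyclicBarrier bothHoldFirstLock = new CyclicBarrier(2);
        CyclicBarrier bothTriedSecondLock = new CyclicBarrier(2);
        AtomicBoolean firstStuck = new AtomicBoolean();
        AtomicBoolean secondStuck = new AtomicBoolean();

        // First: A:1 then B:2. Second: B:2 then A:1.
        Thread first = new Thread(() -> runTx(lockFor("cacheA", 1), lockFor("cacheB", 2),
                bothHoldFirstLock, bothTriedSecondLock, firstStuck));
        Thread second = new Thread(() -> runTx(lockFor("cacheB", 2), lockFor("cacheA", 1),
                bothHoldFirstLock, bothTriedSecondLock, secondStuck));
        first.start();
        second.start();
        first.join();
        second.join();
        return firstStuck.get() && secondStuck.get();
    }

    static void runTx(ReentrantLock firstLock, ReentrantLock secondLock,
                      CyclicBarrier held, CyclicBarrier tried, AtomicBoolean stuck) {
        firstLock.lock();
        try {
            held.await(); // both transactions now hold their first lock
            // Without a timeout a blocking lock call would wait here indefinitely;
            // the timeout only lets the demo report the deadlock.
            boolean got = secondLock.tryLock(200, TimeUnit.MILLISECONDS);
            stuck.set(!got);
            tried.await(); // keep holding the first lock until both have tried
            if (got) {
                secondLock.unlock();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            firstLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlock: " + reproduce()); // prints "deadlock: true"
    }
}
```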

The solution depends on your use case. You can lock your records in a 
predictable order, or you can switch to optimistic locking (in which case 
one of your transactions will fail on commit).
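One way to get a predictable order is to sort every key a transaction will touch, by cache name first and then by key, and to acquire the locks in that order. The sketch below simulates this with plain Java locks (again not Ignite API; `OrderedLocking`, `inOrder`, `runBoth` and `repeat` are hypothetical names). The two transactions from the example list their keys in opposite order, but because both sort before locking, no cyclic wait can form and every run completes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class OrderedLocking {
    // Hypothetical model of an entry lock: identified by cache name and key.
    record LockKey(String cache, String key) {}

    // A single global order shared by every transaction: cache name, then key.
    static final Comparator<LockKey> ORDER =
            Comparator.comparing(LockKey::cache).thenComparing(LockKey::key);

    static final Map<LockKey, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

    // Locks all keys in the global order, runs the body, then unlocks in reverse.
    static void inOrder(List<LockKey> keys, Runnable body) throws InterruptedException {
        List<LockKey> sorted = new ArrayList<>(keys);
        sorted.sort(ORDER);
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (LockKey k : sorted) {
                ReentrantLock lock = LOCKS.computeIfAbsent(k, x -> new ReentrantLock());
                // Safety net for the demo only: with a consistent order no cycle can form.
                if (!lock.tryLock(5, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("timed out waiting for " + k);
                }
                held.add(lock);
            }
            body.run();
        } finally {
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }

    // Runs both transactions concurrently, `rounds` times each; returns the commit count.
    static int runBoth(int rounds) throws InterruptedException {
        LockKey a1 = new LockKey("cacheA", "1");
        LockKey b2 = new LockKey("cacheB", "2");
        AtomicInteger commits = new AtomicInteger();
        // Same keys, written in opposite order, as in the example.
        Thread first = new Thread(() -> repeat(rounds, List.of(a1, b2), commits));
        Thread second = new Thread(() -> repeat(rounds, List.of(b2, a1), commits));
        first.start();
        second.start();
        first.join();
        second.join();
        return commits.get();
    }

    static void repeat(int rounds, List<LockKey> keys, AtomicInteger commits) {
        for (int i = 0; i < rounds; i++) {
            try {
                inOrder(keys, commits::incrementAndGet);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runBoth(1000)); // prints 2000
    }
}
```

Applied to the log in the original question, this would mean that every code path updating both `TaskList` and `MediaSets` for the same key does so in the same order, for example always `TaskList` first. The order itself is arbitrary; what matters is that every transaction uses the same one.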

> On 15 Sep 2022, at 10:56, Thomas Kramer <[email protected]> wrote:
> 
> Modifying the previous example: would this still potentially result in a deadlock?
> 
> First:
> 
> #1 tx.start();
> #2 cacheA.put(1, 1);
> #3 cacheB.put(2, 2);
> #4 tx.commit();
> 
> Second:
> 
> #5 tx.start();
> #6 cacheB.put(1, 1);
> #7 cacheA.put(2, 2);
> #8 tx.commit();
> 
> Ignite locks cacheA on line #2 in the first thread. In parallel, the second 
> thread locks cacheB on line #6 and then has to wait on line #7 for the locked 
> cacheA. At the same time, the first thread must wait on line #3 because the 
> second thread has already locked cacheB in the meantime. So neither thread can 
> continue. Is that understanding correct?
> 
> Do the keys matter for the deadlock in this scenario, or is the whole cache 
> locked regardless of the key value?
> 
> Thanks!
> 
> 
> On 15.09.22 11:36, Stephen Darlington wrote:
>> The important part is that they’re both waiting for each other to complete. 
>> Whether it’s one cache or ten is not significant.
>> 
>>> On 14 Sep 2022, at 12:44, Thomas Kramer <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> OK, that makes sense. However, the deadlock reported in my logs below 
>>> involves two different caches. How does this work?
>>> 
>>> K1 [key=e9228c01-b17e-49a5-bc7f-14c1541d9916, cache=TaskList]
>>> K2 [key=e9228c01-b17e-49a5-bc7f-14c1541d9916, cache=MediaSets]
>>> 
>>> 
>>> On 14.09.22 11:59, Николай Ижиков wrote:
>>>> Basically, deadlock looks like the following:
>>>> 
>>>> First:
>>>> 
>>>> tx.start();
>>>> 
>>>> cache.put(1, 1);
>>>> cache.put(2, 2);
>>>> 
>>>> tx.commit();
>>>> 
>>>> Second:
>>>> 
>>>> tx.start();
>>>> 
>>>> cache.put(2, 2);
>>>> cache.put(1, 1);
>>>> 
>>>> tx.commit();
>>>> 
>>>> So if «first» locks key=1 and «second» locks key=2 concurrently, both 
>>>> processes hang trying to lock key=2 and key=1, respectively.
>>>> 
>>>>> On 14 Sep 2022, at 12:39, Thomas Kramer <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> does the group here have any suggestions on this? I'm trying to find the 
>>>>> root cause of the deadlock we get on the production servers from time to 
>>>>> time.
>>>>> 
>>>>> So I'm trying to better understand why this can happen, and I'm looking 
>>>>> for sample code that demonstrates such a scenario.
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>> 
>>>>> 
>>>>> On 05.09.22 22:48, Thomas Kramer wrote:
>>>>>> I'm experiencing a transaction deadlock and would like to understand how 
>>>>>> to find out the cause of it.
>>>>>> 
>>>>>> Here is a snippet from the log:
>>>>>> Deadlock detected:
>>>>>> 
>>>>>> K1: TX1 holds lock, TX2 waits lock.
>>>>>> K2: TX2 holds lock, TX1 waits lock.
>>>>>> 
>>>>>> Transactions:
>>>>>> 
>>>>>> TX1 [txId=GridCacheVersion [topVer=273263429, order=1661784224309, 
>>>>>> nodeOrder=4, dataCenterId=0], 
>>>>>> nodeId=8841e579-43b5-4c23-a690-1208bdd34d8c, threadId=30]
>>>>>> TX2 [txId=GridCacheVersion [topVer=273263429, order=1661784224257, 
>>>>>> nodeOrder=14, dataCenterId=0], 
>>>>>> nodeId=f08415e4-0ae7-45cd-aeca-2033267e92c3, threadId=3815]
>>>>>> 
>>>>>> Keys:
>>>>>> 
>>>>>> K1 [key=e9228c01-b17e-49a5-bc7f-14c1541d9916, cache=TaskList]
>>>>>> K2 [key=e9228c01-b17e-49a5-bc7f-14c1541d9916, cache=MediaSets]
>>>>>> 
>>>>>> I can see that the same key (e9228c01-b17e-49a5-bc7f-14c1541d9916) is 
>>>>>> used in transactions on two different nodes.
>>>>>> 
>>>>>> Ignite documentation says: "One major rule that you must follow when 
>>>>>> working with distributed transactions is that locks for the keys 
>>>>>> participating in a transaction must be acquired in the same order. 
>>>>>> Violating this rule can lead to a distributed deadlock."
>>>>>> 
>>>>>> If keys must be locked in the same order in every transaction, how 
>>>>>> can the same key cause a deadlock here? Is it because it's in two 
>>>>>> different caches? Maybe I don't fully understand how transaction 
>>>>>> locks work.
>>>>>> 
>>>>>> Is there a code sample that demonstrates a potential violation? And how 
>>>>>> can I find where in my source code this happens on both nodes?
>>>>>> 
>>>>>> Thanks,
>>>>>> Thomas.
>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>
