Sergey Korotkov created IGNITE-17457:
----------------------------------------

             Summary: Cluster locks after the transaction recovery procedure if the tx primary node fails
                 Key: IGNITE-17457
                 URL: https://issues.apache.org/jira/browse/IGNITE-17457
             Project: Ignite
          Issue Type: Bug
            Reporter: Sergey Korotkov


An Ignite cluster may get locked (all client operations block) after the tx recovery procedure that runs when the tx primary node fails.

A prepared transaction may remain uncommitted on the backup node after tx recovery. The partition exchange then cannot complete, so the cluster stays locked.
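To make the causal chain above concrete, here is a toy Java model, assuming (as the report states) that the exchange cannot proceed while a prepared transaction on the node is still open. `ExchangeWaitModel` and its methods are hypothetical and do not correspond to Ignite's exchange code; the sketch only shows why one transaction that is never finished keeps the wait from ever succeeding.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/**
 * Toy model (hypothetical, not Ignite code): an "exchange" that may only
 * proceed once every prepared transaction on the node has finished.
 */
public class ExchangeWaitModel {
    // Ids of transactions that are prepared but not yet committed or rolled back.
    private final Set<Long> pendingTxs = ConcurrentHashMap.newKeySet();

    public void prepare(long txId) {
        pendingTxs.add(txId);
    }

    public synchronized void finish(long txId) {
        pendingTxs.remove(txId);
        notifyAll(); // wake up a waiting exchange
    }

    /** Returns true if all pending transactions finished within the timeout. */
    public synchronized boolean awaitExchange(long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!pendingTxs.isEmpty()) {
            long left = deadline - System.nanoTime();
            if (left <= 0)
                return false; // some tx never finished: the exchange is stuck
            TimeUnit.NANOSECONDS.timedWait(this, left);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        ExchangeWaitModel node = new ExchangeWaitModel();
        node.prepare(1L);
        node.prepare(2L);
        node.finish(1L);
        // tx 2 is left prepared but never committed, as in the bug report
        System.out.println(node.awaitExchange(100)); // prints false
        node.finish(2L);
        System.out.println(node.awaitExchange(100)); // prints true
    }
}
```

The timeout exists only so the demo terminates; the point is that the wait can succeed only after the last prepared transaction is finished.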

The immediate cause is a race condition in the method:
{code:java}
org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter::markFinalizing(RECOVERY_FINISH){code}
It may be called concurrently for the same transaction from both the recovery procedure:
{code:java}
IgniteTxManager::commitIfPrepared{code}
and from the tx recovery request handler:
{code:java}
IgniteTxHandler::processCheckPreparedTxRequest{code}
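The report leaves the exact interleaving TBD, so the following is only an illustrative Java sketch, not Ignite's `IgniteTxAdapter` code and not the actual fix. Assuming the intent is that exactly one of the two concurrent paths takes responsibility for committing a recovered transaction, a single `AtomicReference.compareAndSet` from `NONE` makes that transition exclusive. `RecoveryFinalizeModel`, `markRecoveryFinalizing` and `commitIfMarked` are hypothetical; the thread names merely mirror `commitIfPrepared` and `processCheckPreparedTxRequest` for readability.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Toy model (hypothetical, not Ignite's IgniteTxAdapter): two code paths race
 * to finalize the same prepared transaction during recovery. A single
 * compareAndSet from NONE lets exactly one caller win and commit.
 */
public class RecoveryFinalizeModel {
    enum Finalization { NONE, RECOVERY_FINISH }

    private final AtomicReference<Finalization> state =
        new AtomicReference<>(Finalization.NONE);
    final AtomicInteger commits = new AtomicInteger();

    /** Atomic check-and-set: only the first caller gets true. */
    boolean markRecoveryFinalizing() {
        return state.compareAndSet(Finalization.NONE, Finalization.RECOVERY_FINISH);
    }

    void commitIfMarked() {
        if (markRecoveryFinalizing())
            commits.incrementAndGet(); // stand-in for the real commit
    }

    /** Runs both "recovery paths" concurrently; returns the number of commits. */
    static int race() throws InterruptedException {
        RecoveryFinalizeModel tx = new RecoveryFinalizeModel();
        CountDownLatch start = new CountDownLatch(1);
        Runnable path = () -> {
            try {
                start.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            tx.commitIfMarked();
        };
        Thread recovery = new Thread(path, "commitIfPrepared");
        Thread handler = new Thread(path, "processCheckPreparedTxRequest");
        recovery.start();
        handler.start();
        start.countDown(); // release both threads at once
        recovery.join();
        handler.join();
        return tx.commits.get();
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 1_000; i++)
            if (race() != 1)
                throw new AssertionError("expected exactly one commit");
        System.out.println("exactly one commit in every run");
    }
}
```

A check-then-act variant (read the state, then write it in a separate step) would let both threads observe `NONE` and act on the same transaction; the CAS closes that window by making the read and the write one atomic step.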
 

Details and reproducer {color:#ff0000}TBD{color}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
