subject:"\[jira\] \[Comment Edited\] \(IGNITE\-20995\) Add more integration tests for tx recovery on unstable topology"

[jira] [Comment Edited] (IGNITE-20995) Add more integration tests for tx recovery on unstable topology

2023-12-15 Thread Denis Chudov (Jira)

[
https://issues.apache.org/jira/browse/IGNITE-20995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796781#comment-17796781
]

Denis Chudov edited comment on IGNITE-20995 at 12/15/23 9:52 AM:
-

>From my point of view, we should have following scenarios:
# Lock conflict between two RW transactions, coordinator is lost for lock
holder, recovery starts, lock holder is aborted, lock waiter is not affected.
# Lock conflict between two RW transactions, coordinator is alive, recovery is
not started, lock waiter is not affected.
# Resolution of write intent belonging to tx without coordinator, recovery
happens successfully, write intent is switched.
# Resolution of write intent belonging to abandoned tx, recovery happens
successfully, write intent is switched.
# Resolution of write intent belonging to abandoned tx, commit partition has
restarted and lost its local volatile tx state map, recovery happens
successfully, write intent is switched.
# Resolution of write intent belonging to pending transaction, coordinator is
alive, recovery is not started.
# RO transaction tx0 resolves write intent belonging to the transaction tx1
and marks it as abandoned and starts the recovery; after that RW transaction
tx2 meets the lock belonging to tx1, sees that it's abandoned recently and
doesn't start the recovery. Recovery is triggered just once.
# Coordinator is lost, but it has sent the commit message to a commit
partition, in the same time the recovery initiating request is received from
some data node. Commit is successful, tx recovery was not able to change
transaction state, there are no assertions or other errors, write intents on
data node are switched.
# Coordinator is lost, but it has sent the commit message to a commit
partition, in the same time the recovery initiating request is received from
some data node. Recovery successfully aborts the transaction, the is correct
exception on the coordinator, it was not able to change transaction state to
commit, there are no assertions or other errors, write intents on data node are
switched.
# Parallel tx recoveries happen on two replicas of commit partition on
different nodes, both processes were started at a moment when the corresponding
replica was the primary one. This also shouldnt break anything.
# There are two parallel recoveries on commit partition initiated by tx state
requests and a commit process initiated by coordinator that is already dead.
Transaction is committed. Both recovery processes get correct commit timestamp
and end up with correct write intent resolution (one recovery process receives
commit timestamp from exception thrown by raft client, another should receive
it from the completed finish future of local FINISHED state).

was (Author: denis chudov):
>From my point of view, we should have following scenarios:
# Lock conflict between two RW transactions, coordinator is lost for lock
holder, recovery starts, lock holder is aborted, lock waiter is not affected.
# Lock conflict between two RW transactions, coordinator is alive, recovery is
not started, lock waiter is not affected.
# Resolution of write intent belonging to tx without coordinator, recovery
happens successfully, write intent is switched.
# Resolution of write intent belonging to abandoned tx, recovery happens
successfully, write intent is switched.
# Resolution of write intent belonging to abandoned tx, commit partition has
restarted and lost its local volatile tx state map, recovery happens
successfully, write intent is switched.
# Resolution of write intent belonging to pending transaction, coordinator is
alive, recovery is not started.
# RO transaction tx0 resolves write intent belonging to the transaction tx1
and marks it as abandoned and starts the recovery; after that RW transaction
tx2 meets the lock belonging to tx1, sees that it's abandoned recently and
doesn't start the recovery. Recovery is triggered just once.
# Coordinator is lost, but it has sent the commit message to a commit
partition, in the same time the recovery initiating request is received from
some data node. Commit is successful, tx recovery was not able to change
transaction state, there are no assertions or other errors, write intents on
data node are switched.
# Coordinator is lost, but it has sent the commit message to a commit
partition, in the same time the recovery initiating request is received from
some data node. Recovery successfully aborts the transaction, the is correct
exception on the coordinator, it was not able to change transaction state to
commit, there are no assertions or other errors, write intents on data node are
switched.
# Parallel tx recoveries happen on two replicas of commit partition, both
processes were started at a moment when the corresponding replica was the
primary one. This also shouldnt break anything.
# There

[jira] [Comment Edited] (IGNITE-20995) Add more integration tests for tx recovery on unstable topology

2023-12-15 Thread Denis Chudov (Jira)

[
https://issues.apache.org/jira/browse/IGNITE-20995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796781#comment-17796781
]

Denis Chudov edited comment on IGNITE-20995 at 12/15/23 9:51 AM:
-

>From my point of view, we should have following scenarios:
# Lock conflict between two RW transactions, coordinator is lost for lock
holder, recovery starts, lock holder is aborted, lock waiter is not affected.
# Lock conflict between two RW transactions, coordinator is alive, recovery is
not started, lock waiter is not affected.
# Resolution of write intent belonging to tx without coordinator, recovery
happens successfully, write intent is switched.
# Resolution of write intent belonging to abandoned tx, recovery happens
successfully, write intent is switched.
# Resolution of write intent belonging to abandoned tx, commit partition has
restarted and lost its local volatile tx state map, recovery happens
successfully, write intent is switched.
# Resolution of write intent belonging to pending transaction, coordinator is
alive, recovery is not started.
# RO transaction tx0 resolves write intent belonging to the transaction tx1
and marks it as abandoned and starts the recovery; after that RW transaction
tx2 meets the lock belonging to tx1, sees that it's abandoned recently and
doesn't start the recovery. Recovery is triggered just once.
# Coordinator is lost, but it has sent the commit message to a commit
partition, in the same time the recovery initiating request is received from
some data node. Commit is successful, tx recovery was not able to change
transaction state, there are no assertions or other errors, write intents on
data node are switched.
# Coordinator is lost, but it has sent the commit message to a commit
partition, in the same time the recovery initiating request is received from
some data node. Recovery successfully aborts the transaction, the is correct
exception on the coordinator, it was not able to change transaction state to
commit, there are no assertions or other errors, write intents on data node are
switched.
# Parallel tx recoveries happen on two replicas of commit partition, both
processes were started at a moment when the corresponding replica was the
primary one. This also shouldnt break anything.
# There are two parallel recoveries on commit partition initiated by tx state
requests and a commit process initiated by coordinator that is already dead.
Transaction is committed. Both recovery processes get correct commit timestamp
and end up with correct write intent resolution (one recovery process receives
commit timestamp from exception thrown by raft client, another should receive
it from the completed finish future of local FINISHED state).

[jira] [Comment Edited] (IGNITE-20995) Add more integration tests for tx recovery on unstable topology

2023-12-15 Thread Denis Chudov (Jira)

[
https://issues.apache.org/jira/browse/IGNITE-20995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796781#comment-17796781
]

Denis Chudov edited comment on IGNITE-20995 at 12/15/23 9:35 AM:
-

>From my point of view, we should have following scenarios:
# Lock conflict between two RW transactions, coordinator is lost for lock
holder, recovery starts, lock holder is aborted, lock waiter is not affected.
# Lock conflict between two RW transactions, coordinator is alive, recovery is
not started, lock waiter is not affected.
# Resolution of write intent belonging to tx without coordinator, recovery
happens successfully, write intent is switched.
# Resolution of write intent belonging to abandoned tx, recovery happens
successfully, write intent is switched.
# Resolution of write intent belonging to abandoned tx, commit partition has
restarted and lost its local volatile tx state map, recovery happens
successfully, write intent is switched.
# Resolution of write intent belonging to pending transaction, coordinator is
alive, recovery is not started.
# RO transaction tx0 resolves write intent belonging to the transaction tx1
and marks it as abandoned and starts the recovery; after that RW transaction
tx2 meets the lock belonging to tx1, sees that it's abandoned recently and
doesn't start the recovery. Recovery is triggered just once.
# Coordinator is lost, but it has sent the commit message to a commit
partition, in the same time the recovery initiating request is received from
some data node. Commit is successful, tx recovery was not able to change
transaction state, there are no assertions or other errors, write intents on
data node are switched.
# Coordinator is lost, but it has sent the commit message to a commit
partition, in the same time the recovery initiating request is received from
some data node. Recovery successfully aborts the transaction, the is correct
exception on the coordinator, it was not able to change transaction state to
commit, there are no assertions or other errors, write intents on data node are
switched.
# Parallel tx recoveries happen on two replicas of commit partition, both
processes were started at a moment when the corresponding replica was the
primary one. This also shouldnt break anything.
# There are two parallel recoveries on commit partition initiated by tx state
requests and a commit process initiated by coordinator that is already dead.
Transaction is committed. Both recovery processes get correct commit timestamp
and end up with correct write intent resolution.

[jira] [Comment Edited] (IGNITE-20995) Add more integration tests for tx recovery on unstable topology

2023-12-15 Thread Denis Chudov (Jira)



[ 
https://issues.apache.org/jira/browse/IGNITE-20995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796781#comment-17796781
 ] 

Denis Chudov edited comment on IGNITE-20995 at 12/15/23 9:33 AM:
-

>From my point of view, we should have following scenarios:
 # Lock conflict between two RW transactions, coordinator is lost for lock 
holder, recovery starts, lock holder is aborted, lock waiter is not affected.
 # Lock conflict between two RW transactions, coordinator is alive, recovery is 
not started, lock waiter is not affected.
 # Resolution of write intent belonging to tx without coordinator, recovery 
happens successfully, write intent is switched.
 # Resolution of write intent belonging to abandoned tx, recovery happens 
successfully, write intent is switched.
 # Resolution of write intent belonging to abandoned tx, commit partition has 
restarted and lost its local volatile tx state map, recovery happens 
successfully, write intent is switched.
 # Resolution of write intent belonging to pending transaction, coordinator is 
alive, recovery is not started.
 # RO transaction tx0 resolves write intent belonging to the transaction tx1 
and marks it as abandoned and starts the recovery; after that RW transaction 
tx2 meets the lock belonging to tx1, sees that it's abandoned recently and 
doesn't start the recovery. Recovery is triggered just once.
 # Coordinator is lost, but it has sent the commit message to a commit 
partition, in the same time the recovery initiating request is received from 
some data node. Commit is successful, tx recovery was not able to change 
transaction state, there are no assertions or other errors, write intents on 
data node are switched.
 # Coordinator is lost, but it has sent the commit message to a commit 
partition, in the same time the recovery initiating request is received from 
some data node. Recovery successfully aborts the transaction, the is correct 
exception on the coordinator, it was not able to change transaction state to 
commit, there are no assertions or other errors, write intents on data node are 
switched.
 # Parallel tx recoveries happen on two replicas of commit partition, both 
processes were started at a moment when the corresponding replica was the 
primary one. This also shouldnt break anything.
 # There are two parallel recoveries on commit partition and a commit process 
initiated by coordinator that is already dead. Both recovery processes get 
correct commit timestamp and resolve write intent correctly.


was (Author: denis chudov):
>From my point of view, we should have following scenarios:
 # Lock conflict between two RW transactions, coordinator is lost for lock 
holder, recovery starts, lock holder is aborted, lock waiter is not affected.
 # Lock conflict between two RW transactions, coordinator is alive, recovery is 
not started, lock waiter is not affected.
 # Resolution of write intent belonging to tx without coordinator, recovery 
happens successfully, write intent is switched.
 # Resolution of write intent belonging to abandoned tx, recovery happens 
successfully, write intent is switched.
 # Resolution of write intent belonging to abandoned tx, commit partition has 
restarted and lost its local volatile tx state map, recovery happens 
successfully, write intent is switched.
 # Resolution of write intent belonging to pending transaction, coordinator is 
alive, recovery is not started.
 # RO transaction tx0 resolves write intent belonging to the transaction tx1 
and marks it as abandoned and starts the recovery; after that RW transaction 
tx2 meets the lock belonging to tx1, sees that it's abandoned recently and 
doesn't start the recovery. Recovery is triggered just once.
 # Coordinator is lost, but it has sent the commit message to a commit 
partition, in the same time the recovery initiating request is received from 
some data node. Commit is successful, tx recovery was not able to change 
transaction state, there are no assertions or other errors, write intents on 
data node are switched.
 # Coordinator is lost, but it has sent the commit message to a commit 
partition, in the same time the recovery initiating request is received from 
some data node. Recovery successfully aborts the transaction, the is correct 
exception on the coordinator, it was not able to change transaction state to 
commit, there are no assertions or other errors, write intents on data node are 
switched.
 # Parallel tx recoveries happen on two replicas of commit partition, both 
processes were started at a moment when the corresponding replica was the 
primary one. This also shouldnt break anything.

> Add more integration tests for tx recovery on unstable topology
> ---
>
> Key: IGNITE-20995
> URL: https://issues.apache.org/jira/browse/IGNITE-20995
> Project:

[jira] [Comment Edited] (IGNITE-20995) Add more integration tests for tx recovery on unstable topology

2023-12-15 Thread Denis Chudov (Jira)

[
https://issues.apache.org/jira/browse/IGNITE-20995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796781#comment-17796781
]

Denis Chudov edited comment on IGNITE-20995 at 12/15/23 9:35 AM:
-

>From my point of view, we should have following scenarios:
# Lock conflict between two RW transactions, coordinator is lost for lock
holder, recovery starts, lock holder is aborted, lock waiter is not affected.
# Lock conflict between two RW transactions, coordinator is alive, recovery is
not started, lock waiter is not affected.
# Resolution of write intent belonging to tx without coordinator, recovery
happens successfully, write intent is switched.
# Resolution of write intent belonging to abandoned tx, recovery happens
successfully, write intent is switched.
# Resolution of write intent belonging to abandoned tx, commit partition has
restarted and lost its local volatile tx state map, recovery happens
successfully, write intent is switched.
# Resolution of write intent belonging to pending transaction, coordinator is
alive, recovery is not started.
# RO transaction tx0 resolves write intent belonging to the transaction tx1
and marks it as abandoned and starts the recovery; after that RW transaction
tx2 meets the lock belonging to tx1, sees that it's abandoned recently and
doesn't start the recovery. Recovery is triggered just once.
# Coordinator is lost, but it has sent the commit message to a commit
partition, in the same time the recovery initiating request is received from
some data node. Commit is successful, tx recovery was not able to change
transaction state, there are no assertions or other errors, write intents on
data node are switched.
# Coordinator is lost, but it has sent the commit message to a commit
partition, in the same time the recovery initiating request is received from
some data node. Recovery successfully aborts the transaction, the is correct
exception on the coordinator, it was not able to change transaction state to
commit, there are no assertions or other errors, write intents on data node are
switched.
# Parallel tx recoveries happen on two replicas of commit partition, both
processes were started at a moment when the corresponding replica was the
primary one. This also shouldnt break anything.
# There are two parallel recoveries on commit partition initiated by tx state
requests and a commit process initiated by coordinator that is already dead.
Both recovery processes get correct commit timestamp and end up with correct
write intent resolution.

[jira] [Comment Edited] (IGNITE-20995) Add more integration tests for tx recovery on unstable topology

2023-12-15 Thread Denis Chudov (Jira)

[
https://issues.apache.org/jira/browse/IGNITE-20995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796781#comment-17796781
]

Denis Chudov edited comment on IGNITE-20995 at 12/15/23 9:34 AM:
-

>From my point of view, we should have following scenarios:
# Lock conflict between two RW transactions, coordinator is lost for lock
holder, recovery starts, lock holder is aborted, lock waiter is not affected.
# Lock conflict between two RW transactions, coordinator is alive, recovery is
not started, lock waiter is not affected.
# Resolution of write intent belonging to tx without coordinator, recovery
happens successfully, write intent is switched.
# Resolution of write intent belonging to abandoned tx, recovery happens
successfully, write intent is switched.
# Resolution of write intent belonging to abandoned tx, commit partition has
restarted and lost its local volatile tx state map, recovery happens
successfully, write intent is switched.
# Resolution of write intent belonging to pending transaction, coordinator is
alive, recovery is not started.
# RO transaction tx0 resolves write intent belonging to the transaction tx1
and marks it as abandoned and starts the recovery; after that RW transaction
tx2 meets the lock belonging to tx1, sees that it's abandoned recently and
doesn't start the recovery. Recovery is triggered just once.
# Coordinator is lost, but it has sent the commit message to a commit
partition, in the same time the recovery initiating request is received from
some data node. Commit is successful, tx recovery was not able to change
transaction state, there are no assertions or other errors, write intents on
data node are switched.
# Coordinator is lost, but it has sent the commit message to a commit
partition, in the same time the recovery initiating request is received from
some data node. Recovery successfully aborts the transaction, the is correct
exception on the coordinator, it was not able to change transaction state to
commit, there are no assertions or other errors, write intents on data node are
switched.
# Parallel tx recoveries happen on two replicas of commit partition, both
processes were started at a moment when the corresponding replica was the
primary one. This also shouldnt break anything.
# There are two parallel recoveries on commit partition initiated by tx state
request and a commit process initiated by coordinator that is already dead.
Both recovery processes get correct commit timestamp and end up with correct
write intent resolution.

> Add more

[jira] [Comment Edited] (IGNITE-20995) Add more integration tests for tx recovery on unstable topology

[jira] [Comment Edited] (IGNITE-20995) Add more integration tests for tx recovery on unstable topology

[jira] [Comment Edited] (IGNITE-20995) Add more integration tests for tx recovery on unstable topology

[jira] [Comment Edited] (IGNITE-20995) Add more integration tests for tx recovery on unstable topology

[jira] [Comment Edited] (IGNITE-20995) Add more integration tests for tx recovery on unstable topology

[jira] [Comment Edited] (IGNITE-20995) Add more integration tests for tx recovery on unstable topology

6 matches

Site Navigation

Mail list logo

Footer information