[kudu-CR] KUDU-2612: acquire and release partition lock
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. KUDU-2612: acquire and release partition lock This patch plumbs partition locks into write ops for transactional and non-transactional operations. Upon attempting to write, we try to acquire the partition lock in the WriteOp prepare phase, and upon successfully applying the op, we transfer ownership of the lock to the Txn that the write was a part of (or just release the partition lock if the write was non-transactional). Txns release the lock when FINALIZE_COMMIT or ABORT_TXN is applied. We take the partition lock for non-transactional write ops as well to ensure we don’t duplicate keys (e.g. if a transaction inserts key=1 and then a non-transactional write inserts key=1 before the transaction commits). If the partition lock cannot be acquired, the write op is aborted or retried, based on the wait-die mechanics already in the lock manager. A flag is also introduced to disable this locking for tests that currently expect support for concurrent transactions. Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Reviewed-on: http://gerrit.cloudera.org:8080/17159 Tested-by: Andrew Wong Reviewed-by: Alexey Serbin --- M src/kudu/client/batcher.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_participant-itest.cc M src/kudu/integration-tests/txn_write_ops-itest.cc M src/kudu/tablet/lock_manager.h M src/kudu/tablet/ops/participant_op.cc M src/kudu/tablet/ops/write_op.cc M src/kudu/tablet/ops/write_op.h M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/txn_participant-test.cc M src/kudu/tablet/txn_participant.cc M src/kudu/tablet/txn_participant.h M src/kudu/transactions/participant_rpc.cc 16 files changed, 600 insertions(+), 16 deletions(-) Approvals: Andrew Wong: Verified Alexey Serbin: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 12 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: acquire and release partition lock
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 11 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Wed, 14 Apr 2021 01:32:40 + Gerrit-HasComments: No
[kudu-CR] KUDU-2612: acquire and release partition lock
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 11: Verified+1 Test failure is unrelated: MiniRangerTest.TestGrantPrivilege Seems to be an issue with starting up the mini cluster. -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 11 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Wed, 14 Apr 2021 00:19:59 + Gerrit-HasComments: No
[kudu-CR] KUDU-2612: acquire and release partition lock
Andrew Wong has removed a vote on this change. Change subject: KUDU-2612: acquire and release partition lock .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 11 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: acquire and release partition lock
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc File src/kudu/tablet/ops/write_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc@574 PS9, Line 574: CHECK(!s.ok()) << s.ToString(); : completio > I guess DCHECK() wouldn't trigger in case of s.ok() because there is LOG(DF Ah that's true. I'll change this to a CHECK to be a bit more conservative, and given it's easily verifiable today from this code snippet at least. -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 11 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 13 Apr 2021 23:28:10 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: acquire and release partition lock
Andrew Wong has uploaded a new patch set (#11) to the change originally created by Hao Hao. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. KUDU-2612: acquire and release partition lock This patch plumbs partition locks into write ops for transactional and non-transactional operations. Upon attempting to write, we try to acquire the partition lock in the WriteOp prepare phase, and upon successfully applying the op, we transfer ownership of the lock to the Txn that the write was a part of (or just release the partition lock if the write was non-transactional). Txns release the lock when FINALIZE_COMMIT or ABORT_TXN is applied. We take the partition lock for non-transactional write ops as well to ensure we don’t duplicate keys (e.g. if a transaction inserts key=1 and then a non-transactional write inserts key=1 before the transaction commits). If the partition lock cannot be acquired, the write op is aborted or retried, based on the wait-die mechanics already in the lock manager. A flag is also introduced to disable this locking for tests that currently expect support for concurrent transactions. Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 --- M src/kudu/client/batcher.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_participant-itest.cc M src/kudu/integration-tests/txn_write_ops-itest.cc M src/kudu/tablet/lock_manager.h M src/kudu/tablet/ops/participant_op.cc M src/kudu/tablet/ops/write_op.cc M src/kudu/tablet/ops/write_op.h M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/txn_participant-test.cc M src/kudu/tablet/txn_participant.cc M src/kudu/tablet/txn_participant.h M src/kudu/transactions/participant_rpc.cc 16 files changed, 600 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/11 -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 11 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: acquire and release partition lock
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 10: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc File src/kudu/tablet/ops/write_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc@574 PS9, Line 574: DCHECK(!s.ok()) << s.ToString(); : completio > Done as a DCHECK I guess DCHECK() wouldn't trigger in case of s.ok() because there is LOG(DFATAL) a couple of lines above, but if you are sure there aren't different code paths in release vs debug mode, it's totally fine with me. http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.h File src/kudu/tablet/txn_participant.h: http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.h@115 PS9, Line 115: It is thus expected that repeat callers are taking : // the same lock > Done Thank you! -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 10 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 13 Apr 2021 23:20:00 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: acquire and release partition lock
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 9: (8 comments) http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/integration-tests/txn_write_ops-itest.cc File src/kudu/integration-tests/txn_write_ops-itest.cc: http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/integration-tests/txn_write_ops-itest.cc@352 PS9, Line 352: error.IsTimedOut() > nit: could you add a comment explaining why both IsAborted() and IsTimedOut Done http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.h File src/kudu/tablet/ops/write_op.h: http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.h@179 PS9, Line 179: locks > nit: lock? Done http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc File src/kudu/tablet/ops/write_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc@217 PS9, Line 217: rows > nit: row Done http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc@574 PS9, Line 574: completion_callback()->set_error(s, code); : return s; > nit: does it make sense to add CHECK(!s.ok()) before setting the error code Done as a DCHECK http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant-test.cc@525 PS9, Line 525: Status s = Write(0); : ASSERT_TRUE(s.IsAborted()) << s.ToString(); : ASSERT_STR_CONTAINS(s.ToString(), "partition lock that is held by another"); : : s = Write(0, kTxnTwo); : ASSERT_TRUE(s.IsAborted()) << s.ToString(); : ASSERT_STR_CONTAINS(s.ToString(), "partition lock that is held by another"); > nit: is it possible to turn this into a functor and reuse it in the code of Done http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.h File src/kudu/tablet/txn_participant.h: http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.h@115 PS9, Line 115: It is thus expected that repeat callers are taking : // the same lock > nit: is it possible to check for this invariant in the implementation of th Done http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.h@117 PS9, Line 117: AcquirePartitionLock > nit: would AdoptPartitionLock() be a better name for this method? Done http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.cc File src/kudu/tablet/txn_participant.cc: http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.cc@78 PS9, Line 78: TabletServerErrorPB::Code code = tserver::TabletServerErrorPB::UNKNOWN_ERROR; > nit: maybe, move this under the 'if ()' clause below? Done -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 9 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 13 Apr 2021 23:03:28 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: acquire and release partition lock
Andrew Wong has uploaded a new patch set (#10) to the change originally created by Hao Hao. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. KUDU-2612: acquire and release partition lock This patch plumbs partition locks into write ops for transactional and non-transactional operations. Upon attempting to write, we try to acquire the partition lock in the WriteOp prepare phase, and upon successfully applying the op, we transfer ownership of the lock to the Txn that the write was a part of (or just release the partition lock if the write was non-transactional). Txns release the lock when FINALIZE_COMMIT or ABORT_TXN is applied. We take the partition lock for non-transactional write ops as well to ensure we don’t duplicate keys (e.g. if a transaction inserts key=1 and then a non-transactional write inserts key=1 before the transaction commits). If the partition lock cannot be acquired, the write op is aborted or retried, based on the wait-die mechanics already in the lock manager. A flag is also introduced to disable this locking for tests that currently expect support for concurrent transactions. Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 --- M src/kudu/client/batcher.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_participant-itest.cc M src/kudu/integration-tests/txn_write_ops-itest.cc M src/kudu/tablet/lock_manager.h M src/kudu/tablet/ops/participant_op.cc M src/kudu/tablet/ops/write_op.cc M src/kudu/tablet/ops/write_op.h M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/txn_participant-test.cc M src/kudu/tablet/txn_participant.cc M src/kudu/tablet/txn_participant.h M src/kudu/transactions/participant_rpc.cc 16 files changed, 600 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/10 -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 10 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: acquire and release partition lock
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 9: Code-Review+2 (9 comments) Overall looks good! Just a few nits. http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/integration-tests/txn_write_ops-itest.cc File src/kudu/integration-tests/txn_write_ops-itest.cc: http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/integration-tests/txn_write_ops-itest.cc@352 PS9, Line 352: error.IsTimedOut() nit: could you add a comment explaining why both IsAborted() and IsTimedOut() are possible result codes here? http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.h File src/kudu/tablet/ops/write_op.h: http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.h@179 PS9, Line 179: locks nit: lock? http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc File src/kudu/tablet/ops/write_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc@217 PS9, Line 217: rows nit: row http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc@574 PS9, Line 574: completion_callback()->set_error(s, code); : return s; nit: does it make sense to add CHECK(!s.ok()) before setting the error code and returning? http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/tablet.h File src/kudu/tablet/tablet.h: http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/tablet.h@172 PS5, Line 172: // Acquire a shared lock on the given transaction, to ensure the : // transaction's state doesn't change while the given write is in flight. : Status AcquireTxnLock(int64_t txn_id, WriteOpState* op_state); > IIRC from my discussion with Hao, doing this in the BEGIN_TXN lets us catch Thank you very much for going an extra mile! I think it's crucial to shrink the lock time to have better throughput for concurrent operations. http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant-test.cc@525 PS9, Line 525: Status s = Write(0); : ASSERT_TRUE(s.IsAborted()) << s.ToString(); : ASSERT_STR_CONTAINS(s.ToString(), "partition lock that is held by another"); : : s = Write(0, kTxnTwo); : ASSERT_TRUE(s.IsAborted()) << s.ToString(); : ASSERT_STR_CONTAINS(s.ToString(), "partition lock that is held by another"); nit: is it possible to turn this into a functor and reuse it in the code of this scenario? http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.h File src/kudu/tablet/txn_participant.h: http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.h@115 PS9, Line 115: It is thus expected that repeat callers are taking : // the same lock nit: is it possible to check for this invariant in the implementation of this method? http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.h@117 PS9, Line 117: AcquirePartitionLock nit: would AdoptPartitionLock() be a better name for this method? http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.cc File src/kudu/tablet/txn_participant.cc: http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.cc@78 PS9, Line 78: TabletServerErrorPB::Code code = tserver::TabletServerErrorPB::UNKNOWN_ERROR; nit: maybe, move this under the 'if ()' clause below? -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 9 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 13 Apr 2021 05:59:24 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: acquire and release partition lock
Andrew Wong has uploaded a new patch set (#9) to the change originally created by Hao Hao. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. KUDU-2612: acquire and release partition lock This patch plumbs partition locks into write ops for transactional and non-transactional operations. Upon attempting to write, we try to acquire the partition lock in the WriteOp prepare phase, and upon successfully applying the op, we transfer ownership of the lock to the Txn that the write was a part of (or just release the partition lock if the write was non-transactional). Txns release the lock when FINALIZE_COMMIT or ABORT_TXN is applied. We take the partition lock for non-transactional write ops as well to ensure we don’t duplicate keys (e.g. if a transaction inserts key=1 and then a non-transactional write inserts key=1 before the transaction commits). If the partition lock cannot be acquired, the write op is aborted or retried, based on the wait-die mechanics already in the lock manager. A flag is also introduced to disable this locking for tests that currently expect support for concurrent transactions. Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 --- M src/kudu/client/batcher.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_participant-itest.cc M src/kudu/integration-tests/txn_write_ops-itest.cc M src/kudu/tablet/ops/participant_op.cc M src/kudu/tablet/ops/write_op.cc M src/kudu/tablet/ops/write_op.h M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/txn_participant-test.cc M src/kudu/tablet/txn_participant.cc M src/kudu/tablet/txn_participant.h M src/kudu/transactions/participant_rpc.cc 15 files changed, 604 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/9 -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 9 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: acquire and release partition lock
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 8: (8 comments) http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc File src/kudu/integration-tests/txn_participant-itest.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@813 PS3, Line 813: auto resp_error = StatusFromPB(resp.error().status()); > This is marked 'Done' but I don't see that a scenario with calling Particip Done as TxnParticipantITest.TestTxnSystemClientRepeatCalls http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@930 PS3, Line 930: // TODO(awong): update this when impleme > nit: this is marked as 'Done', and I saw TODO is added in other places. Is Done http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/integration-tests/txn_participant-itest.cc File src/kudu/integration-tests/txn_participant-itest.cc: http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/integration-tests/txn_participant-itest.cc@282 PS5, Line 282: ASSERT_EQ(txns, replicas[i]->tablet()->txn_participant()->GetTxnsForTests()); : }); : } > nit: is this change essential? If not, maybe it's better to return back to Ah I added this because the list of transactions can be quite long, and a difference in count is a bit easier to decipher and think about than comparing the full lists that get printed out. I can revert it -- it was more useful for debugging and can be added back if we ever need to debug this further. http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.h File src/kudu/tablet/ops/participant_op.h: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.h@75 PS3, Line 75: ors the given 'op_id > Can this method be invoked as a part of larger operation which might have a I'm going to punt on this. I think there'd be a decent amount of plumbing to get the client's deadline, and it'd also be for leaders only. I added a TODO instead. http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/ops/participant_op.cc File src/kudu/tablet/ops/participant_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/ops/participant_op.cc@212 PS5, Line 212: "START. Timestamp: $0", clock::HybridClock::GetPhysicalValueMicros(state_->times > How to release the partition lock if the following code kicks in when the d If replication fails, the ScopedPartitionLock is dropped, and if that was the last reference to the partition lock, the lock is released. http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/tablet.h File src/kudu/tablet/tablet.h: http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/tablet.h@172 PS5, Line 172: // Acquire a shared lock on the given transaction, to ensure the : // transaction's state doesn't change while the given write is in flight. : Status AcquireTxnLock(int64_t txn_id, WriteOpState* op_state); > BTW, why do we need this method when we already have a similar one for Writ IIRC from my discussion with Hao, doing this in the BEGIN_TXN lets us catch races earlier. OTOH, thinking about it more, we ought to instead have the goal of taking the lock for as little time as possible, so I've gone and updated this so we only take the lock for writes. http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h File src/kudu/tablet/tablet.h: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h@165 PS3, Line 165: If 'must_acquire' is true, : // then wait until the lock is acquired > Write operations are executed in the context of an RPC which have a timeout The op will proceed past the timeout. It might be avoidable, but I'm not sure it's a dealbreaker. At least, outside of the dispatcher, we don't have checks along the write path for timeouts to exit early, and at least in the context of the OpDrivers, I don't think it's worth the fuss for a single op, given we have similarly timed acquires on the write path already. http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/txn_participant.cc File src/kudu/tablet/txn_participant.cc: http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/txn_participant.cc@41 PS5, Line 41: namespace kudu { > warning: using decl 'Substitute' is unused [misc-unused-using-decls] Done -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 8 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao
[kudu-CR] KUDU-2612: acquire and release partition lock
Andrew Wong has uploaded a new patch set (#8) to the change originally created by Hao Hao. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. KUDU-2612: acquire and release partition lock This patch plumbs partition locks into write ops for transactional and non-transactional operations. Upon attempting to write, we try to acquire the partition lock in the WriteOp prepare phase, and upon successfully applying the op, we transfer ownership of the lock to the Txn that the write was a part of (or just release the partition lock if the write was non-transactional). Txns release the lock when FINALIZE_COMMIT or ABORT_TXN is applied. We take the partition lock for non-transactional write ops as well to ensure we don’t duplicate keys (e.g. if a transaction inserts key=1 and then a non-transactional write inserts key=1 before the transaction commits). If the partition lock cannot be acquired, the write op is aborted or retried, based on the wait-die mechanics already in the lock manager. A flag is also introduced to disable this locking for tests that currently expect support for concurrent transactions. Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 --- M src/kudu/client/batcher.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_participant-itest.cc M src/kudu/integration-tests/txn_write_ops-itest.cc M src/kudu/tablet/ops/participant_op.cc M src/kudu/tablet/ops/participant_op.h M src/kudu/tablet/ops/write_op.cc M src/kudu/tablet/ops/write_op.h M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/txn_participant-test.cc M src/kudu/tablet/txn_participant.cc M src/kudu/tablet/txn_participant.h M src/kudu/transactions/participant_rpc.cc 16 files changed, 604 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/8 -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 8 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: acquire and release partition lock
Andrew Wong has uploaded a new patch set (#7) to the change originally created by Hao Hao. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. KUDU-2612: acquire and release partition lock This patch plumbs partition locks into write ops for transactional and non-transactional operations. Upon attempting to write, we try to acquire the partition lock in the WriteOp prepare phase, and upon successfully applying the op, we transfer ownership of the lock to the Txn that the write was a part of (or just release the partition lock if the write was non-transactional). Txns release the lock when FINALIZE_COMMIT or ABORT_TXN is applied. We take the partition lock for non-transactional write ops as well to ensure we don’t duplicate keys (e.g. if a transaction inserts key=1 and then a non-transactional write inserts key=1 before the transaction commits). If the partition lock cannot be acquired, the write op is aborted or retried, based on the wait-die mechanics already in the lock manager. A flag is also introduced to disable this locking for tests that currently expect support for concurrent transactions. Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 --- M src/kudu/client/batcher.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_participant-itest.cc M src/kudu/integration-tests/txn_write_ops-itest.cc M src/kudu/tablet/ops/participant_op.cc M src/kudu/tablet/ops/participant_op.h M src/kudu/tablet/ops/write_op.cc M src/kudu/tablet/ops/write_op.h M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/txn_participant-test.cc M src/kudu/tablet/txn_participant.cc M src/kudu/tablet/txn_participant.h M src/kudu/transactions/participant_rpc.cc 16 files changed, 593 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/7 -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 7 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: acquire and release partition lock
Andrew Wong has uploaded a new patch set (#6) to the change originally created by Hao Hao. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. KUDU-2612: acquire and release partition lock This patch plumbs partition locks into participant and write ops for transactional and non-transactional operations. For each transaction, we try to acquire the partition lock in the ParticipantOp prepare phase of BEGIN_TXN and release the lock when COMMIT_TXN or ABORT_TXN is applied. Moreover, we take the partition lock for non-transactional write ops as well to ensure we don’t have duplicate keys. If the partition lock cannot be acquired, the transaction (or write op) is aborted or retried. A flag is also introduced to disable this locking for tests that currently expect support for concurrent transactions. Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 --- M src/kudu/client/batcher.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_participant-itest.cc M src/kudu/integration-tests/txn_write_ops-itest.cc M src/kudu/tablet/ops/participant_op.cc M src/kudu/tablet/ops/participant_op.h M src/kudu/tablet/ops/write_op.cc M src/kudu/tablet/ops/write_op.h M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/txn_participant-test.cc M src/kudu/tablet/txn_participant.cc M src/kudu/tablet/txn_participant.h M src/kudu/transactions/participant_rpc.cc 16 files changed, 552 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/6 -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 6 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: acquire and release partition lock
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc File src/kudu/integration-tests/txn_participant-itest.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@813 PS3, Line 813: } > Done This is marked 'Done' but I don't see that a scenario with calling ParticipateInTransaction(..., ParticipantOpPB::BEGIN_TXN) for the same txn_id multiple times is added. Did I miss anything? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@930 PS3, Line 930: > Done nit: this is marked as 'Done', and I saw TODO is added in other places. Is it worth adding that comment here as well? http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/integration-tests/txn_participant-itest.cc File src/kudu/integration-tests/txn_participant-itest.cc: http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/integration-tests/txn_participant-itest.cc@282 PS5, Line 282: const auto& actual_txns = replicas[i]->tablet()->txn_participant()->GetTxnsForTests(); : ASSERT_EQ(txns.size(), actual_txns.size()); : ASSERT_EQ(txns, actual_txns); nit: is this change essential? If not, maybe it's better to return back to the version how it was in PS3? I guess with this version as in PS5, if the assertion ever triggers (even with ASSERT_EVENTUALLY), there isn't as much useful information anymore because the assertion on the size strikes first. http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/ops/participant_op.cc File src/kudu/tablet/ops/participant_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/ops/participant_op.cc@212 PS5, Line 212: RETURN_NOT_OK(replica->tablet()->AcquirePartitionLock(state_.get(), wait_mode)); How to release the partition lock if the following code kicks in when the driver run Status ParticipantOp::Prepare(): https://github.com/apache/kudu/blob/1eb2a2fbca329721ec7152714b7c95374754044b/src/kudu/tablet/ops/op_driver.cc#L298-L305 Basically, how to guarantee the lock is to be released in all possible outcomes in the OpDriver's code once the lock is taken? http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/tablet.h File src/kudu/tablet/tablet.h: http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/tablet.h@172 PS5, Line 172: // Similar to the above but for participant op. : Status AcquirePartitionLock(ParticipantOpState* op_state, : LockManager::LockWaitMode wait_mode); BTW, why do we need this method when we already have a similar one for WriteOpState? Would it be possible to acquire partition lock only when performing a write operation? If not, could you please add a blurb explaining why it's also necessary to acquire a lock while working with ParticipantOpState as well? -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 5 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 26 Mar 2021 03:34:08 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: acquire and release partition lock
Andrew Wong has uploaded a new patch set (#5) to the change originally created by Hao Hao. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. KUDU-2612: acquire and release partition lock This patch plumbs partition locks into participant and write ops for transactional and non-transactional operations. For each transaction, we try to acquire the partition lock in the ParticipantOp prepare phase of BEGIN_TXN and release the lock when COMMIT_TXN or ABORT_TXN is applied. Moreover, we take the partition lock for non-transactional write ops as well to ensure we don’t have duplicate keys. If the partition lock cannot be acquired, the transaction (or write op) is aborted or retried. A flag is also introduced to disable this locking for tests that currently expect support for concurrent transactions. Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 --- M src/kudu/client/batcher.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_participant-itest.cc M src/kudu/integration-tests/txn_write_ops-itest.cc M src/kudu/tablet/ops/participant_op.cc M src/kudu/tablet/ops/participant_op.h M src/kudu/tablet/ops/write_op.cc M src/kudu/tablet/ops/write_op.h M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/txn_participant-test.cc M src/kudu/tablet/txn_participant.cc M src/kudu/tablet/txn_participant.h M src/kudu/transactions/participant_rpc.cc 16 files changed, 471 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/5 -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 5 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: acquire and release partition lock
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 4: (11 comments) Addressed partial comments, will try to address the rest later. http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc@530 PS3, Line 530: //For example, Status::IllegalState() originated from : //TabletServerErrorPB::TXN_ILLEGAL_STATE response and : > +1 No, this is not about TXN_LOCKED_RETRY. It is about error handling of TxnOpDispatcher. IIUC, TxnOpDispatcher currently only wraps the status of the call but not the error code. So TXN_LOCKED_ABORT is retried even it should not be (and handled by L510) for transactional ones. But for transactional ones it is taking care by L510. For TXN_LOCKED_RETRY_OP we return it with ServiceUnavailable Status. so now it is handled in L473. But added the specific checking there. http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc File src/kudu/integration-tests/txn_participant-itest.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@813 PS3, Line 813: TEST_F(TxnParticipantITest, TestTxnSystemClientBeginTxnLockAbort) { > Do you mind adding a scenario with calling ParticipateInTransaction() for t Done http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@833 PS3, Line 833: s.IsAborted > Hmm, interesting: what actor is to abort the transaction? I'm curious abou Sure, will add a comment. Now TxnManager will just pass the error back to the client, which will cause the op to fail. And the client need to figure out what to do with the error (instead of TxnManager silently abort the transaction). Or do you have other suggestions? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@930 PS3, Line 930: FLAGS_enable_txn_partition_lock = false; > It would be great if you could add a note explaining that this scenario bec Done http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_write_ops-itest.cc File src/kudu/integration-tests/txn_write_ops-itest.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_write_ops-itest.cc@1046 PS3, Line 1046: > nit: for completeness sake, maybe commit `second_txn` and ensure we no long Done http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.cc File src/kudu/tablet/ops/participant_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.cc@123 PS3, Line 123: } > Probably, we want to distinguish in the trace whether this is was a no-op o Done http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc File src/kudu/tablet/ops/write_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc@67 PS3, Line 67: DEFINE_int32(tablet_inject_latency_on_apply_write_op_ms, 0, : "How much latency to inject when a write op is applied. " : "For testing only!"); : TAG_FLAG(tablet_inject_latency_on_apply_write_op_ms, un > Could we just reuse --enable_txn_partition_lock? When might we want to set Done http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc@569 PS3, Line 569: } > LOG(DFATAL) or LOG(FATAL)? Done http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h File src/kudu/tablet/tablet.h: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h@166 PS3, Line 166: return : // 'TXN_LOCKED_ABORT' or 'TXN_LOCKED_RETRY_OP' error > How is it to return those if the return type of this method is Status? Ah, sorry for not being precise. Will update the comment. http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/txn_participant.h File src/kudu/tablet/txn_participant.h: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/txn_participant.h@220 PS3, Line 220: partition_lock_ = std::move(partition_lock); > nit: maybe DCHECK or CHECK that we own the lock? Done http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/transactions/participant_rpc.cc File src/kudu/transactions/participant_rpc.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/transactions/participant_rpc.cc@180 PS3, Line 180: case TabletServerErrorPB::TXN_ILLEGAL_STATE > Can we also explicitly handle TXN_LOCKED_RETRY_OP? Done -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id:
[kudu-CR] KUDU-2612: acquire and release partition lock
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17159 to look at the new patch set (#4). Change subject: KUDU-2612: acquire and release partition lock .. KUDU-2612: acquire and release partition lock This patch introduces the logic for actual acquiring and releasing the partition lock for transactions and non-transactional operations. For each transaction, we try to acquire the partition lock in the ParticipantOp prepare phase of BEGIN_TXN and release the lock when COMMIT_TXN or ABORT_TXN is applied. Moreover, we take the parition lock for non-transactional operations as well to ensure we don’t have duplicate keys. If the partition lock cannot be acquired, the transaction (or write op) will be aborted or retried. Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 --- M src/kudu/client/batcher.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_participant-itest.cc M src/kudu/integration-tests/txn_write_ops-itest.cc M src/kudu/tablet/ops/participant_op.cc M src/kudu/tablet/ops/participant_op.h M src/kudu/tablet/ops/write_op.cc M src/kudu/tablet/ops/write_op.h M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/txn_participant-test.cc M src/kudu/tablet/txn_participant.h M src/kudu/transactions/participant_rpc.cc 14 files changed, 465 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/4 -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 4 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: acquire and release partition lock
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 3: (11 comments) http://gerrit.cloudera.org:8080/#/c/17159/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17159/3//COMMIT_MSG@7 PS3, Line 7: acquire and release partition lock Do you plan to add an end-to-end test scenario into client.cc or alike to verify how this works if looking from the client side? http://gerrit.cloudera.org:8080/#/c/17159/3//COMMIT_MSG@13 PS3, Line 13: parition partition http://gerrit.cloudera.org:8080/#/c/17159/3//COMMIT_MSG@16 PS3, Line 16: will be aborted or retried Do you mind extending this a bit to explain what actor is to abort or retry corresponding transaction/write op? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc@530 PS3, Line 530: //Status::Abort() originated from TabletServerErrorPB::TXN_LOCKED_ABORT : //response are needlessly retried. : > Isn't this handled above? Or do you mean TXN_LOCKED_RETRY? We'd probably be +1 If that's about TXN_LOCKED_RETRY, I agree that it's better to handle TXN_LOCKED_RETRY as a specific error code than generic IsIllegalState() http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc File src/kudu/integration-tests/txn_participant-itest.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@813 PS3, Line 813: TEST_F(TxnParticipantITest, TestTxnSystemClientBeginTxnLockAbort) { Do you mind adding a scenario with calling ParticipateInTransaction() for the same txn_id multiple times? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@833 PS3, Line 833: s.IsAborted Hmm, interesting: what actor is to abort the transaction? I'm curious about the bigger picture: I was under impression that it's the TxnManager who takes care of that, right? Could you add a comment clarifying that at this point kSecondTxn isn't aborted yet (well, it's not even started, actually, right)? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@930 PS3, Line 930: FLAGS_enable_txn_partition_lock = false; It would be great if you could add a note explaining that this scenario became a fully synthetic one with the introduction of coarse-grain locking. It might be a TODO reminding to remove the override for the flag once fine-grained locking appears. http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.h File src/kudu/tablet/ops/participant_op.h: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.h@75 PS3, Line 75: AcquirePartitionLock Can this method be invoked as a part of larger operation which might have a deadline? If so, should this method take an optional 'deadline' parameter as well? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.cc File src/kudu/tablet/ops/participant_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.cc@123 PS3, Line 123: TRACE("Released partition lock"); Probably, we want to distinguish in the trace whether this is was a no-op or actual release of the lock, no? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h File src/kudu/tablet/tablet.h: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h@165 PS3, Line 165: If 'must_acquire' is true, : // then wait until the lock is acquired Write operations are executed in the context of an RPC which have a timeout. What happens if a lock is still being acquired but the operation has timed out? Could this end up in accumulating too many ops in the queue? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h@166 PS3, Line 166: return : // 'TXN_LOCKED_ABORT' or 'TXN_LOCKED_RETRY_OP' error How is it to return those if the return type of this method is Status? -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 3 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 11 Mar 2021 22:11:50 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: acquire and release partition lock
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc@530 PS3, Line 530: //Status::Abort() originated from TabletServerErrorPB::TXN_LOCKED_ABORT : //response are needlessly retried. : Isn't this handled above? Or do you mean TXN_LOCKED_RETRY? We'd probably be better off if we handled TXN_LOCKED_RETRY_OP explicitly and treated it as SERVICE_UNAVAILABLE or somesuch. http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_write_ops-itest.cc File src/kudu/integration-tests/txn_write_ops-itest.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_write_ops-itest.cc@1046 PS3, Line 1046: } nit: for completeness sake, maybe commit `second_txn` and ensure we no longer have dispatchers? Same below? http://gerrit.cloudera.org:8080/#/c/17159/2/src/kudu/tablet/ops/participant_op.h File src/kudu/tablet/ops/participant_op.h: http://gerrit.cloudera.org:8080/#/c/17159/2/src/kudu/tablet/ops/participant_op.h@75 PS2, Line 75: lock_manag nit: the lock wasn't acquired by this op, it was acquired by txn_. http://gerrit.cloudera.org:8080/#/c/17159/2/src/kudu/tablet/ops/participant_op.cc File src/kudu/tablet/ops/participant_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/2/src/kudu/tablet/ops/participant_op.cc@108 PS2, Line 108: else { nit: use LOG(FATAL) or LOG(DFATAL). http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc File src/kudu/tablet/ops/write_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc@67 PS3, Line 67: DEFINE_bool(enable_write_op_partition_lock, true, : "Whether or not to enable partition lock for write op"); : TAG_FLAG(enable_write_op_partition_lock, advanced); : TAG_FLAG(enable_write_op_partition_lock, experimental); Could we just reuse --enable_txn_partition_lock? When might we want to set them to different values? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc@569 PS3, Line 569: DCHECK(false) << "unexpected error code " << code; LOG(DFATAL) or LOG(FATAL)? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/txn_participant.h File src/kudu/tablet/txn_participant.h: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/txn_participant.h@220 PS3, Line 220: partition_lock_ = std::move(partition_lock); nit: maybe DCHECK or CHECK that we own the lock? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/transactions/participant_rpc.cc File src/kudu/transactions/participant_rpc.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/transactions/participant_rpc.cc@180 PS3, Line 180: case TabletServerErrorPB::TXN_LOCKED_ABORT: Can we also explicitly handle TXN_LOCKED_RETRY_OP? -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 3 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 11 Mar 2021 20:58:26 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: acquire and release partition lock
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/lock_manager.cc File src/kudu/tablet/lock_manager.cc: http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/lock_manager.cc@401 PS1, Line 401: } > For consistency, maybe add a check that other != this ? Will update it in the other CR. -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 3 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 11 Mar 2021 18:57:42 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: acquire and release partition lock
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/17159/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17159/1//COMMIT_MSG@13 PS1, Line 13: COMMI > We'll also need to make sure the TxnOpDispatcher handles the error cases we Done http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc File src/kudu/tablet/ops/participant_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@95 PS1, Line 95: l acquired = partition_lock_.IsAcquired(); > nit: based on the current code, this cannot be any other code, right? If so Done http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@199 PS1, Line 199: "ParticipantOp::Prepare > nit: can use 'replica' Done http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@200 PS1, Line 200: TRACE("PREPARE: Starting."); > Shouldn't there be a 'break' before this line? Or are we taking the partiti Ah, right, 'break' is needed. Thanks for catching it! http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@300 PS1, Line 300: return Status::InvalidArgument("unknown op type"); > As for the sequence of releasing txn and its lock and partition lock: shoul Yes, it makes sense. Updated it. http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@303 PS1, Line 303: return Status::OK(); > nit: could store the type of the operation instead since the operation as i Will do. http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/tablet.cc@223 PS1, Line 223: using std::endl; > warning: using decl 'TabletServerErrorPB' is unused [misc-unused-using-decl Done -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 3 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 11 Mar 2021 18:57:01 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: acquire and release partition lock
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17159 to look at the new patch set (#3). Change subject: KUDU-2612: acquire and release partition lock .. KUDU-2612: acquire and release partition lock This patch introduces the logic for actual acquiring and releasing the partition lock for transactions and non-transactional operations. For each transaction, we try to acquire the partition lock in the ParticipantOp prepare phase of BEGIN_TXN and release the lock when COMMIT_TXN or ABORT_TXN is applied. Moreover, we take the parition lock for non-transactional operations as well to ensure we don’t have duplicate keys. If the partition lock cannot be acquired, the transaction (or write op) will be aborted or retried. Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 --- M src/kudu/client/batcher.cc M src/kudu/integration-tests/txn_participant-itest.cc M src/kudu/integration-tests/txn_write_ops-itest.cc M src/kudu/tablet/ops/participant_op.cc M src/kudu/tablet/ops/participant_op.h M src/kudu/tablet/ops/write_op.cc M src/kudu/tablet/ops/write_op.h M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/txn_participant-test.cc M src/kudu/tablet/txn_participant.h M src/kudu/transactions/participant_rpc.cc 12 files changed, 376 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/3 -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 3 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)