Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )
Change subject: KUDU-2612: acquire and release partition lock ...................................................................... Patch Set 4: (11 comments) Addressed partial comments, will try to address the rest later. http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc@530 PS3, Line 530: // For example, Status::IllegalState() originated from : // TabletServerErrorPB::TXN_ILLEGAL_STATE response and : > +1 No, this is not about TXN_LOCKED_RETRY. It is about error handling of TxnOpDispatcher. IIUC, TxnOpDispatcher currently only wraps the status of the call but not the error code. So TXN_LOCKED_ABORT is retried even it should not be (and handled by L510) for transactional ones. But for transactional ones it is taking care by L510. For TXN_LOCKED_RETRY_OP we return it with ServiceUnavailable Status. so now it is handled in L473. But added the specific checking there. http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc File src/kudu/integration-tests/txn_participant-itest.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@813 PS3, Line 813: TEST_F(TxnParticipantITest, TestTxnSystemClientBeginTxnLockAbort) { > Do you mind adding a scenario with calling ParticipateInTransaction() for t Done http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@833 PS3, Line 833: s.IsAborted > Hmm, interesting: what actor is to abort the transaction? I'm curious abou Sure, will add a comment. Now TxnManager will just pass the error back to the client, which will cause the op to fail. And the client need to figure out what to do with the error (instead of TxnManager silently abort the transaction). Or do you have other suggestions? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@930 PS3, Line 930: FLAGS_enable_txn_partition_lock = false; > It would be great if you could add a note explaining that this scenario bec Done http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_write_ops-itest.cc File src/kudu/integration-tests/txn_write_ops-itest.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_write_ops-itest.cc@1046 PS3, Line 1046: > nit: for completeness sake, maybe commit `second_txn` and ensure we no long Done http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.cc File src/kudu/tablet/ops/participant_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.cc@123 PS3, Line 123: } > Probably, we want to distinguish in the trace whether this is was a no-op o Done http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc File src/kudu/tablet/ops/write_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc@67 PS3, Line 67: DEFINE_int32(tablet_inject_latency_on_apply_write_op_ms, 0, : "How much latency to inject when a write op is applied. " : "For testing only!"); : TAG_FLAG(tablet_inject_latency_on_apply_write_op_ms, un > Could we just reuse --enable_txn_partition_lock? When might we want to set Done http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc@569 PS3, Line 569: } > LOG(DFATAL) or LOG(FATAL)? Done http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h File src/kudu/tablet/tablet.h: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h@166 PS3, Line 166: return : // 'TXN_LOCKED_ABORT' or 'TXN_LOCKED_RETRY_OP' error > How is it to return those if the return type of this method is Status? Ah, sorry for not being precise. Will update the comment. http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/txn_participant.h File src/kudu/tablet/txn_participant.h: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/txn_participant.h@220 PS3, Line 220: partition_lock_ = std::move(partition_lock); > nit: maybe DCHECK or CHECK that we own the lock? Done http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/transactions/participant_rpc.cc File src/kudu/transactions/participant_rpc.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/transactions/participant_rpc.cc@180 PS3, Line 180: case TabletServerErrorPB::TXN_ILLEGAL_STATE > Can we also explicitly handle TXN_LOCKED_RETRY_OP? Done -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 4 Gerrit-Owner: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 12 Mar 2021 08:02:59 +0000 Gerrit-HasComments: Yes