Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17037 )
Change subject: KUDU-2612 tablet servers automatically register txn participants ...................................................................... Patch Set 9: (14 comments) http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: PS9: > I think it's better to move these new test scenarios into txn_write_ops-ite Done http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/integration-tests/txn_write_ops-itest.cc File src/kudu/integration-tests/txn_write_ops-itest.cc: PS9: > Could you also add a test in which we delete a replica that has pending ops Done http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/integration-tests/txn_write_ops-itest.cc@446 PS9, Line 446: GetSingleTxnOpDispatcher() { > nit: indent by 4 spaces Done http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/ops/participant_op.cc File src/kudu/tablet/ops/participant_op.cc: http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/ops/participant_op.cc@197 PS9, Line 197: RETURN_NOT_OK(tablet_replica_->UnregisterTxnOpDispatcher(txn_id())); > Apply() cannot fail, see L234 for the CHECK_OK(). I suspect we should be ab Exactly -- as we discussed offline, the proper place for this check is ParticipantOp::Prepare() Done http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.h File src/kudu/tablet/tablet_replica.h: http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.h@443 PS9, Line 443: let > nit: give Done http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@602 PS9, Line 602: this->RegisteredParticipantCb(txn_id, s); > Answering my own question, no I don't think it does. That said, we could pr Ack. http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@602 PS9, Line 602: this->RegisteredParticipantCb(txn_id, s); > This gets set here and in the ParticipantTxnBeginCallback. Does that mean i It's being called once, as far as I can see. Anyway, it was messy due to iterative approach (I tried several options); I cleaned this up a bit. Should be more straightforward now. http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@607 PS9, Line 607: SetRegistrationScheduled > SetPreliminaryTasksScheduled()? Done http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1179 PS9, Line 1179: TxnParticipantBegi > nit: BeginTxnParticipantOp? Same in the callback? Done http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1208 PS9, Line 1208: CancelPendingTxnOps > Is it possible for CancelPendingTxnOps to be called twice if BEGIN_TXN op f Nope, it's called only once as I can see. Even if it was called twice, the second time would be a no-op because the queue of pending ops was empty. Anyways, I cleaned it up bit -- it should be more straightforward now. I agree it was a bit messy in this revision. http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1260 PS9, Line 1260: preliminary_tasks_scheduled_ > If the main purpose of this is to ensure we don't have multiple registratio It would if it were guaranteed that the caller would (1) always try to schedule those tasks (2) always succeed in registering tasks if it tries to do so (1) isn't guaranteed if looking at that from API perspective. (2) isn't guaranteed because TabletReplica::SubmitTxnWrite() can fail with non-OK status. I agree that the current approach doesn't look particularly good but telying on the logic of existing call sites would make even worse. But I might be missing the essence of your question. Probably you meant just to call the scheduler functor here and remove that call-back from the upper level to set 'preliminary_tasks_scheduled_'? I indeed removed the extra level of indirection, and now the scheduler function is called right from here. http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1362 PS9, Line 1362: Cleanup > nit: maybe call this RespondFailures() or something to make it more obvious As for the holding the lock, I don't think it's needed here because this method isn't supposed to be called while inflight_status_ is to change concurrently. But it's a good call: it looks suspicious. To make it cleaner, I made this method static. Renamed into RespondWithStatus(), added DCHECK() for the status parameter at the call sites. http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tserver/ts_tablet_manager.cc@1166 PS9, Line 1166: : if (PREDICT_FALSE(FLAGS_txn_participant_begin_op_inject_latency_ms > 0)) { : // This is to simulate a situation when txn participant registration : // above takes too long. : SleepFor(MonoDelta::FromMilliseconds( : FLAGS_txn_participant_begin_op_inject_latency_ms)); : } > nit: can use MAYBE_INJECT_FIXED_LATENCY(), same above. Done http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tserver/ts_tablet_manager.cc@1180 PS9, Line 1180: d); > Would it make sense to pass 'cb' as an argument here to pass to Participant Good call! Yep, it makes sense. -- To view, visit http://gerrit.cloudera.org:8080/17037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6 Gerrit-Change-Number: 17037 Gerrit-PatchSet: 9 Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Hao Hao <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 27 Feb 2021 06:20:14 +0000 Gerrit-HasComments: Yes
