Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17037 )
Change subject: KUDU-2612 tablet servers automatically register txn participants ...................................................................... Patch Set 9: (10 comments) http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/integration-tests/txn_write_ops-itest.cc File src/kudu/integration-tests/txn_write_ops-itest.cc: PS9: Could you also add a test in which we delete a replica that has pending ops? There are some code paths that probably aren't exercised in the dispatcher around CheckRunning(). http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/integration-tests/txn_write_ops-itest.cc@446 PS9, Line 446: GetSingleTxnOpDispatcher() { nit: indent by 4 spaces http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.h File src/kudu/tablet/tablet_replica.h: http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.h@443 PS9, Line 443: let nit: give http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@602 PS9, Line 602: this->RegisteredParticipantCb(txn_id, s); This gets set here and in the ParticipantTxnBeginCallback. Does that mean it gets called twice? http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@607 PS9, Line 607: SetRegistrationScheduled SetPreliminaryTasksScheduled()? http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1179 PS9, Line 1179: TxnParticipantBegi nit: BeginTxnParticipantOp? Same in the callback? http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1260 PS9, Line 1260: preliminary_tasks_scheduled_ If the main purpose of this is to ensure we don't have multiple registrations going at once, would it make sense to set 'preliminary_tasks_scheduled_' to true here? Or maybe call it 'preliminary_tasks_being_scheduled_' or somesuch. Would that help avoid the race you mention above? http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1362 PS9, Line 1362: Cleanup nit: maybe call this RespondFailures() or something to make it more obvious that this should only be called on failure? Also, do we need to hold the lock when checking the ops_queue_ and inflight_status_? http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tserver/ts_tablet_manager.cc@1166 PS9, Line 1166: : if (PREDICT_FALSE(FLAGS_txn_participant_begin_op_inject_latency_ms > 0)) { : // This is to simulate a situation when txn participant registration : // above takes too long. : SleepFor(MonoDelta::FromMilliseconds( : FLAGS_txn_participant_begin_op_inject_latency_ms)); : } nit: can use MAYBE_INJECT_FIXED_LATENCY(), same above. http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tserver/ts_tablet_manager.cc@1180 PS9, Line 1180: d); Would it make sense to pass 'cb' as an argument here to pass to ParticipantTxnBeginCallback's constructor? AFAICT they're identical lambdas anyway. -- To view, visit http://gerrit.cloudera.org:8080/17037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6 Gerrit-Change-Number: 17037 Gerrit-PatchSet: 9 Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Hao Hao <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 23 Feb 2021 22:27:38 +0000 Gerrit-HasComments: Yes
