Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )
Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only ...................................................................... Patch Set 9: (7 comments) IWYU isn't happy yet: >>> Fixing #includes in >>> '/home/jenkins-slave/workspace/kudu-master/2/src/kudu/transactions/txn_status_manager.cc' @@ -32,6 +32,7 @@ #include "kudu/common/wire_protocol.h" #include "kudu/consensus/metadata.pb.h" #include "kudu/consensus/raft_consensus.h" +#include "kudu/gutil/casts.h" #include "kudu/gutil/macros.h" #include "kudu/gutil/map-util.h" #include "kudu/gutil/port.h" IWYU would have edited 1 files on your behalf. http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@304 PS9, Line 304: WaitUntilTxnCoordinatorTasksFinished(); How do we protect against the race when right after this 'txn_coordinator_task_counter_' becomes non-zero again and tablet_->Shutdown() is called when TxnStatusManager tries to access the tablet? Should we just nullify txn_coordinator_ in WaitUntilTxnCoordinatorTasksFinished() or set another variable that controls the return status of ShouldRunTxnCoordinatorStalenessTask() along with 'state_'? http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@471 PS9, Line 471: if (state_ == STOPPING || state_ == STOPPED) { : return false; : } : return true; This looks a bit strange: it returns 'true' even if 'state_' is RUNNING? http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@997 PS9, Line 997: state_ What about SHUTDOWN state? http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@1009 PS9, Line 1009: txn_coordinator_task_counter_ nit: is it expected that this counter goes negative? If not, maybe add CHECK()/DCHECK() to enforce the invariant here? http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/transactions/txn_status_manager.h File src/kudu/transactions/txn_status_manager.h: http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/transactions/txn_status_manager.h@101 PS9, Line 101: txn_coordinator nit: it would be great to add a comment about the expectations about the type of this parameter. http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1439 PS9, Line 1439: !r->CheckRunning().ok() Does it mean that the TxnStalenessTrackerTask() will cease to run at this tablet server once one of its txn status tablets is shutdown (e.g., moved by other server by the rebalancer tool)? I'm not sure that's the behavior we want. http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1457 PS9, Line 1457: continue Maybe, we should jump not just to the next element 'replicas' container, but onto next iteration of the outer 'while (true)' loop? -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 9 Gerrit-Owner: Hao Hao <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Hao Hao <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 16 Jan 2021 06:10:15 +0000 Gerrit-HasComments: Yes
