Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )
Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only ...................................................................... Patch Set 9: (7 comments) > Patch Set 9: > > (7 comments) > > IWYU isn't happy yet: > > >>> Fixing #includes in > >>> '/home/jenkins-slave/workspace/kudu-master/2/src/kudu/transactions/txn_status_manager.cc' > @@ -32,6 +32,7 @@ > #include "kudu/common/wire_protocol.h" > #include "kudu/consensus/metadata.pb.h" > #include "kudu/consensus/raft_consensus.h" > +#include "kudu/gutil/casts.h" > #include "kudu/gutil/macros.h" > #include "kudu/gutil/map-util.h" > #include "kudu/gutil/port.h" > IWYU would have edited 1 files on your behalf. Thanks a lot for looking into the failure! http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@304 PS9, Line 304: WaitUntilTxnCoordinatorTasksFinished(); > How do we protect against the race when right after this 'txn_coordinator_t Are you asking after L304 which we waited 'txn_coordinator_task_counter_' to become zero. and then it again becomes non-zero and tablet is shutdown state. what happens if TxnStatusManager tries to access the tablet? In that case, the task will not be enabled/ran when the tablet is shutting down. So that ShouldRunTxnCoordinatorStalenessTask() will return false and no new tasks will run. http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@471 PS9, Line 471: if (state_ == STOPPING || state_ == STOPPED) { : return false; : } : return true; > This looks a bit strange: it returns 'true' even if 'state_' is RUNNING? Ah, good catch. Sorry it is the opposite meaning as what the method name implies here. Adjust the caller as well. http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@997 PS9, Line 997: state_ > What about SHUTDOWN state? SHUTDOWN is only set after the tablet is fully stopped. Which the raft consensus has been stopped and the call back should no longer be triggered. http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@1009 PS9, Line 1009: txn_coordinator_task_counter_ > nit: is it expected that this counter goes negative? If not, maybe add CHE Done http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/transactions/txn_status_manager.h File src/kudu/transactions/txn_status_manager.h: http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/transactions/txn_status_manager.h@101 PS9, Line 101: txn_coordinator > nit: it would be great to add a comment about the expectations about the ty Done http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1439 PS9, Line 1439: !r->CheckRunning().ok() > Does it mean that the TxnStalenessTrackerTask() will cease to run at this t Hmm, yeah, I guess we wouldn't want to stop the TxnStalenessTrackerTask() if some txn status tablets is shutdown. We may want to resume on checking the other tablets. http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1457 PS9, Line 1457: continue > Maybe, we should jump not just to the next element 'replicas' container, bu I am not sure why we want to do that? Why don't we check for the rest tablets? -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 9 Gerrit-Owner: Hao Hao <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Hao Hao <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 19 Jan 2021 01:15:04 +0000 Gerrit-HasComments: Yes
