Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )
Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only ...................................................................... Patch Set 6: (19 comments) http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc File src/kudu/integration-tests/txn_status_table-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@744 PS6, Line 744: ; > nit: drop extra semicolon Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@751 PS6, Line 751: threads.emplace_back([&] { > What does the multithreading add here? Seems like it complicates things sin The point is to ensure concurrent txn reads/writes have no failures with frequent leader elections. Add the comment to be clear. http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@758 PS6, Line 758: if (s.ok()) { > What failures are we expecting? Add a comment http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@748 PS6, Line 748: std::atomic<int64_t> next_thread_id = 0; : vector<int> txn_id_index = { 1, 6, 11, 16, 21}; : for (int t = 0; t < 5; t++) { : threads.emplace_back([&] { : int thread_id = next_thread_id++; : CHECK_LT(thread_id, txn_id_index.size()); : int start_txn_id = txn_id_index[thread_id]; : for (int i = start_txn_id; i < start_txn_id + 5; i++) { : TxnStatusEntryPB txn_status; : Status s = txn_sys_client_->BeginTransaction(i, kUser); : if (s.ok()) { : // Make sure a re-election happens before the following read. : SleepFor(MonoDelta::FromMilliseconds(3 * FLAGS_raft_heartbeat_interval_ms)); : s = txn_sys_client_->GetTransactionStatus(i, kUser, &txn_status); : CHECK_OK(s); : CHECK(txn_status.has_user()); : CHECK_STREQ(kUser, txn_status.user().c_str()); : CHECK(txn_status.has_state()); : CHECK_EQ(TxnStatePB::OPEN, txn_status.state()); : } : } : }); : } > nit: would be easier to read as: Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@403 PS6, Line 403: << > nit: maybe add a message here too indicating what's going on Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@472 PS6, Line 472: LOG(WARNING) << Substitute("The tablet is already shutting down or shutdown. State: $0", : TabletStatePB_Name(state_)); > nit: might be cleaner to let the callers of these otherwise simple function Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@979 PS6, Line 979: if (!txn_coordinator_) { : return false; : } > nit: maybe only call this if txn_coordinator_ is set and change this to a D I updated the name to reflect the functionality which is ShouldRunTxnCoordinatorStalenessTask(). So keep it as it is here. http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@997 PS6, Line 997: state_ == STOPPING || state_ == STOPPED) { > nit: I'm curious, why not reuse the != RUNNING condition from above? A comm Added a comment. http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h File src/kudu/transactions/txn_status_manager.h: http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@109 PS6, Line 109: but some operations : // may still be legal. > I might be missing something but is this fact used anywhere? What operation Sorry this is not used, removed it. http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@106 PS6, Line 106: const Status& replica_status() const { return replica_status_; } : : // Leadership status of the transaction status manager. If not OK, the : // transaction status manager is not the leader, but some operations : // may still be legal. : const Status& leader_status() const { : return leader_status_; : } > nit: Are these ever used? Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@118 PS6, Line 118: if (!replica_status_.ok()) { : return replica_status_; : } : return leader_status_; > nit: a bit simpler as Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@83 PS6, Line 83: If this time is exceeded, the " : "node crashes." > If the consequence is so severe, maybe we should set this to be something l After thinking more, I decide to remove the crash logic and just return, since node crash seems to be too severe for timeout on OpTracker::WaitForAllToFinish(). Now, if timeout, it will trigger the client to get a ServiceUnavailable error and they can retry until eventually succeed (or fail). Let me know if you think otherwise. http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@217 PS6, Line 217: leader_ready_term != initial_term_ || : !leader_shared_lock_.owns_lock()) > nit: Curious, doesn't this mean that we've changed leaders? The error messa Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@235 PS6, Line 235: NOT_THE_LEADER > Should this only be set if leader_status_ is set? Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@320 PS6, Line 320: RETURN_NOT_OK > nit: just return this? Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@369 PS6, Line 369: } > nit: maybe log something about beginning to wait? Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@414 PS6, Line 414: failed > nit: "interrupted" would be less alarming, given this is somewhat normal Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@430 PS6, Line 430: Substitute("Tablet") > nit: remove Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tserver/ts_tablet_manager.cc@1428 PS6, Line 1428: RunTx > nit: maybe name these ShouldRun... Done -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 6 Gerrit-Owner: Hao Hao <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Hao Hao <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 14 Jan 2021 07:31:25 +0000 Gerrit-HasComments: Yes
