Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by 
the leader only
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/transactions/txn_status_manager.cc@a482
PS2, Line 482:
Can we at least check that we have acquired the leader lock?


http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/transactions/txn_status_manager.cc@724
PS2, Line 724: auto* consensus = status_tablet_.tablet_replica_->consensus();
             :         DCHECK(consensus);
nit: can combine as:

 auto* consensus = DCHECK_NOTNULL(...consensus());


http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/tserver/ts_tablet_manager.cc@1420
PS2, Line 1420: {
              :       shared_lock<RWMutex> l(lock_);
              :       for (const auto& elem : tablet_map_) {
              :         auto r = elem.second;
              :         // Find txn status tablet replicas.
              :         if (r->txn_coordinator()) {
              :           replicas.emplace_back(std::move(r));
              :         }
              :       }
              :     }
              :     for (auto& r : replicas) {
              :       if (shutdown_latch_.count() == 0) {
              :         return;
              :       }
              :       auto* coordinator = DCHECK_NOTNULL(r->txn_coordinator());
It may be worth updating this to look more like how we schedule 
maintenance ops. E.g. rather than iterating through every replica and 
checking whether it has a coordinator, when we create a replica that hosts a 
coordinator, "register" that coordinator with the TSTabletManager once it's 
fully started and ready.

Conceptually, the problems at hand seem similar: we want to run tasks 
using a shared threadpool, the tasks are tied to our replicas, and 
a replica shutting down shouldn't collide with a running task. Take a look 
at util/maintenance_manager.cc to see if a similar approach might make sense 
here, e.g. when shutting down a replica, if an abort task is in progress, we 
wait for it to complete and then proceed with the shutdown. That might also 
help avoid races between this task and the initial call to TabletReplica::Start().



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 2
Gerrit-Owner: Hao Hao <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 24 Dec 2020 00:08:26 +0000
Gerrit-HasComments: Yes
