Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14177 )

Change subject: KUDU-2780: create thread for auto-rebalancing
......................................................................


Patch Set 11:

(18 comments)

Haven't made it all the way through; mostly nits, but there are a couple real 
questions worth considering (namely, what happens if tservers are down?)

http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc
File src/kudu/master/auto_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@146
PS11, Line 146: 300
Is this used by the rebalancer thread?


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@249
PS11, Line 249:         lock_guard<simple_spinlock> l(lock_for_test_);
              :         moves_scheduled_this_round_for_test_ = 0;
nit: might be worth just using an atomic<int> or something


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@252
PS11, Line 252:       
SleepFor(MonoDelta::FromSeconds(FLAGS_auto_rebalancing_interval_seconds));
Don't need this anymore, right? We'll sleep up top? Same below.


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@298
PS11, Line 298:   // One location: use greedy rebalancing algorithm to find 
moves.
              :   if (ts_id_by_location.size() == 1) {
              :     rebalance::TwoDimensionalGreedyAlgo algo;
              :     RETURN_NOT_OK(GetMovesUsingRebalancingAlgo(raw_info, &algo, 
CrossLocations::NO, replica_moves));
              :     *has_location_violations = false;
              :   } else {
              :
nit: could you write this avoiding the extra scope layering? It's easier on the 
eyes that way, e.g

if (ts_id_by_location.size() == 1) {
  // do stuff
  return Status::OK();
}
// Logic to consider location placement policy...


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@317
PS11, Line 317:     // If no placement policy violations are found, do load 
rebalancing.
              :     if (replica_moves->empty()) {
nit: could you write this avoiding the extra scope layering? It's easier on the 
eyes that way, e.g

if (!replica_moves->empty()) {
  return Status::OK();
}
// No placement policy violations found, do other stuff...


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@358
PS11, Line 358: ion==Cro
nit: spacing


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@375
PS11, Line 375: on==CrossLoca
nit: spacing


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@384
PS11, Line 384:     std::random_device rd;
              :     std::mt19937 random_generator(rd());
How about we have a single random_generator associated with the task and reuse 
it?


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@405
PS11, Line 405: ANY_REPLICA
Why is this distinction over VOTER_REPLICA important?


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@498
PS11, Line 498:   
ts_manager_->GetDescriptorsAvailableForPlacement(&descriptors);
What happens when not all of the tablet servers have registered with the master 
yet? Or more generally, when there exist tablets with replicas on tablet 
servers that haven't registered yet? Will this eventually return an error? Or 
will it try to move things that it maybe shouldn't?

Could probably use a test.


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@517
PS11, Line 517:     CatalogManager::ScopedLeaderSharedLock l(catalog_manager_);
Check the status here.


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@550
PS11, Line 550: l
nit: could you use a different variable name so this doesn't get conflated with 
the TabletMetadataLock?


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@572
PS11, Line 572:             ConsensusConfigType::COMMITTED,
              :             cstatepb.current_term(),
              :             cstatepb.committed_config().opid_index(),
              :             cstatepb.leader_uuid(),
              :             voters,
              :             non_voters);
nit: too many spaces


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@739
PS11, Line 739:   sort(locations.begin(), locations.end());
Why is this sorting important?


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@748
PS11, Line 748:
nit: extra space


http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/rebalance/rebalancer.h
File src/kudu/rebalance/rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/rebalance/rebalancer.h@160
PS11, Line 160:   // Filter move operations in 'replica_moves': remove all 
operations that would
              :   // involve moving replicas of tablets which are in 
'scheduled_moves'. The
              :   // 'replica_moves' cannot be null.
              :   static void FilterMoves(const MovesInProgress& 
scheduled_moves,
              :                           std::vector<ReplicaMove>* 
replica_moves);
Did this need to be made public? It isn't used by auto_rebalancer or 
rebalancer_tool AFAICT.


http://gerrit.cloudera.org:8080/#/c/14177/9/src/kudu/tools/rebalancer_tool.cc
File src/kudu/tools/rebalancer_tool.cc:

http://gerrit.cloudera.org:8080/#/c/14177/9/src/kudu/tools/rebalancer_tool.cc@894
PS9, Line 894:  * 5
Not your fault, but any idea what this is?


http://gerrit.cloudera.org:8080/#/c/14177/9/src/kudu/tools/rebalancer_tool.cc@924
PS9, Line 924: // This will return Status::NotFound if no replica can be moved.
             :     // In that case, we just continue through the loop.
nit: maybe wrap with ignore_result() then



--
To view, visit http://gerrit.cloudera.org:8080/14177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifca25d1063c07047cf2123e6792b3c7395be20e4
Gerrit-Change-Number: 14177
Gerrit-PatchSet: 11
Gerrit-Owner: Hannah Nguyen <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hannah Nguyen <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 04 Oct 2019 19:36:36 +0000
Gerrit-HasComments: Yes

Reply via email to