Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14177 )

Change subject: KUDU-2780: create thread for auto-rebalancing
......................................................................


Patch Set 17:

(22 comments)

Focused mostly on the non-test code.

http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.h
File src/kudu/master/auto_rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.h@117
PS17, Line 117:   Status GetMoves(
nit: can you also note how this can fail?


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.h@132
PS17, Line 132: Marks replicas that
nit: marks them with what?


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.h@137
PS17, Line 137:   Status ExecuteMoves(
nit: can you also note how this can fail?


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.h@139
PS17, Line 139: const bool& has_location_violations
nit: booleans are trivially copyable so we don't need to pass them by const ref.


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.h@188
PS17, Line 188: wake_up_cv_
It wasn't clear to me that this is only used to signal that the rebalancer 
thread is shutting down. Maybe rename it 'shutdown_cv_'? And update the comment 
to reflect that?


http://gerrit.cloudera.org:8080/#/c/14177/14/src/kudu/master/auto_rebalancer.cc
File src/kudu/master/auto_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14177/14/src/kudu/master/auto_rebalancer.cc@660
PS14, Line 660:         replicas_at_location.
> Actually just removed this param; can just check size of replica_moves per
This principle holds true everywhere. Partial or incomplete mutation makes 
things difficult to reason about.


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc
File src/kudu/master/auto_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc@199
PS17, Line 199:     // Check if shutdown was signaled.
              :     {
              :       lock_guard<Mutex> l(lock_);
              :       if (closing_) return;
              :     }
              : 
nit: maybe this should be

if (wake_up_cv_.WaitFor(MonoDelta::FromSeconds(FLAGS_auto_rebal..))) {
  return;
}

then we'd be able to get rid of that sleep at the bottom.


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc@207
PS17, Line 207:       CatalogManager::ScopedLeaderSharedLock l(catalog_manager_);
              :       if (!l.first_failed_status().ok()) {
              :         moves_scheduled_this_round_for_test_ = 0;
              :         continue;
              :       }
Should we at least sleep here? Otherwise this is a tight loop for followers.


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc@261
PS17, Line 261:       continue;
If we're moving on without successfully executing and calling 
CheckReplicaMovesCompleted(), doesn't that mean 'replica_moves' might not be 
empty upon each iteration? Should we be initializing a new 'replica_moves' on 
each iteration?

To make matters worse, it seems like both 'replica_moves' and 
'has_location_violations' are updated even if this fails, meaning we need to 
make sure we're handling the failure scenario with care (or just don't update 
them if this fails, as I note in-line with the test methods).

It's probably worth adding a test that rebalances a cluster with unstable Raft 
configurations (e.g. using latency injection flags and low Raft timeouts) to 
test the behavior when this fails. As it is, I'm pretty sure we'd hit a DCHECK 
at the top of the loop.
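To illustrate (toy sketch, names hypothetical): declaring the container inside the loop body guarantees each iteration starts from an empty state, even when an earlier iteration bailed out with 'continue' partway through.

```cpp
#include <string>
#include <vector>

std::vector<std::string> RunRounds(bool fail_early) {
  std::vector<std::string> all_executed;
  for (int i = 0; i < 3; ++i) {
    std::vector<std::string> replica_moves;  // fresh on every iteration
    replica_moves.push_back("move-" + std::to_string(i));
    // Bailing out here can't leak stale moves into the next iteration.
    if (fail_early) continue;
    all_executed.insert(all_executed.end(),
                        replica_moves.begin(), replica_moves.end());
  }
  return all_executed;
}
```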


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc@267
PS17, Line 267: wake_up_cv_.WaitFor(
              :           
MonoDelta::FromSeconds(FLAGS_auto_rebalancing_wait_for_replica_moves_seconds));
              :         if (closing_) {
              :           return;
              :         }
This can be:

if (wake_up_cv_.WaitFor(MonoDelta::FromSeconds(FLAGS_auto_reba..))) {
  return;
}


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc@320
PS17, Line 320:   // If no placement policy violations are found, do load rebalancing.
              :   if (!replica_moves->empty()) {
              :     return Status::OK();
              :   }
nit: move this into the policy fixer block?


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc@326
PS17, Line 326:   *has_location_violations = false;
nit: update a local variable and swap at the end?
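A toy sketch of the "build locally, publish on success" idiom I have in mind (names hypothetical, bool return standing in for Status): the out-parameter is only written once the whole computation has succeeded, so callers never observe a half-updated result on the failure path.

```cpp
#include <utility>
#include <vector>

bool ComputeMoves(bool fail, std::vector<int>* out) {
  std::vector<int> result;   // all intermediate mutation happens locally
  result.push_back(1);
  if (fail) return false;    // '*out' is untouched on failure
  result.push_back(2);
  *out = std::move(result);  // single publish step on success
  return true;
}
```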


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc@332
PS17, Line 332: /*cross_location=*/
nit: don't need this now that the enum is self-documenting


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc@343
PS17, Line 343: replica_moves
nit: update a local variable and swap at the end?

Also add a couple spaces


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc@388
PS17, Line 388: replica_moves
nit: update a local variable and swap at the end?


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc@450
PS17, Line 450:  Mark the specified destination tserver,
nit: what mark are we making?


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc@501
PS17, Line 501:   TSDescriptorVector descriptors;
              :   
ts_manager_->GetDescriptorsAvailableForPlacement(&descriptors);
              :   if (descriptors.size() != ts_manager_->GetLiveCount()) {
              :     return Status::IllegalState(Substitute("not all tservers 
available for tablet placement"));
              :   }
The TSManager only knows about tablet servers that have heartbeated recently, 
whereas the catalog manager knows about all tablet servers that contain tablets.

I think we want to wait until we know that all of the tablet servers that 
contain tablets are online. Otherwise, I'm pretty sure this condition is almost 
always true (unless a tablet server has stopped heartbeating for some reason). 
If we're supposed to have 10 tablet servers, we restart the master, and 3 
tablet servers heartbeat to the master, this condition will be true.


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc@526
PS17, Line 526:     RETURN_NOT_OK(catalog_manager_->GetAllTables(&table_infos));
Do we need to RETURN_NOT_OK(leaderlock.first_failed_status())?

Do we have any tests that exercise rebalancing when there are leadership 
changes on the master?


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc@545
PS17, Line 545:       TabletMetadataLock l(tablet.get(), LockMode::READ);
It'd be much less error-prone and much easier to read if this were called 
`tablet_l` and the lock at L532 were `table_l` or something to highlight that 
they're different locks.


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc@570
PS17, Line 570:       // Consensus state information is the same for all 
replicas of this tablet.
Is this true? Or is it a benign approximation? My guess is that it's an 
approximation because it's easy to get what the master believes to be the 
consensus state of the tablet, and that it's benign because rebalancing doesn't 
actually consider the consensus state when deciding whether to make moves. Are 
either of these true?


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc@700
PS17, Line 700: hange 'moves_complete' to false
No longer applies?


http://gerrit.cloudera.org:8080/#/c/14177/17/src/kudu/master/auto_rebalancer.cc@775
PS17, Line 775: *completion_status = Status::Incomplete(Substitute(
              :         "tablet $0, TS $1 -> TS $2 move failed, target replica 
disappeared",
              :         tablet_uuid, from_ts_uuid, to_ts_uuid));
This isn't used if we return Status::OK(). Why not just return the error?
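Roughly what I mean, with a minimal Status-like stand-in (hypothetical, not Kudu's actual Status class): returning the error directly means the caller can't accidentally ignore it by only checking the function's return value.

```cpp
#include <string>
#include <utility>

// Minimal stand-in for a Status-like type.
struct Status {
  static Status OK() { return {true, ""}; }
  static Status Incomplete(std::string m) { return {false, std::move(m)}; }
  bool ok;
  std::string msg;
};

// Instead of writing Incomplete into an out-param and returning OK(),
// return the error itself.
Status CheckMove(bool target_disappeared) {
  if (target_disappeared) {
    return Status::Incomplete("target replica disappeared");
  }
  return Status::OK();
}
```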



--
To view, visit http://gerrit.cloudera.org:8080/14177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifca25d1063c07047cf2123e6792b3c7395be20e4
Gerrit-Change-Number: 14177
Gerrit-PatchSet: 17
Gerrit-Owner: Hannah Nguyen <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hannah Nguyen <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 21 Feb 2020 20:40:31 +0000
Gerrit-HasComments: Yes
