Hannah Nguyen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14177 )

Change subject: KUDU-2780: create thread for auto-rebalancing
......................................................................


Patch Set 13:

(40 comments)

http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc
File src/kudu/master/auto_rebalancer-test.cc:

http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@64
PS13, Line 64:   void AssignLocations(int num_locations, int num_tservers) {
> nit: could you add a comment explaining that this doesn't assign locations
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@68
PS13, Line 68:     TSDescriptorVector descs;
             :     cluster_->mini_master(master_idx)->master()->ts_manager()->
             :       GetAllDescriptors(&descs);
> Probably want to put this into the ASSERT_EVENTUALLY too? Right now it's ju
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@76
PS13, Line 76:     // Assign num_locations unique locations to the first 
num_locations tservers.
             :     for (int i = 0; i < num_locations; ++i) {
             :       descs[i]->AssignLocationForTesting(Substitute("L$0", i));
             :     }
> What if num_locations turns to be greater than num_tservers?  Most likely t
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@93
PS13, Line 93: moves_scheduled_this_round_for_test(leader_idx), 0
> nit: in ASSERT_EQ(), the expected value comes first -- that way it's easier
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@106
PS13, Line 106: number_of_loop_iterations_for_test
> nit: use PascalCase for these, since they're not simple member accessors.
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@111
PS13, Line 111:     return -1;
> nit: maybe instead of doing this, if we know we'll always have a cluster, d
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@150
PS13, Line 150: <kN
> nit: spacing
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@151
PS13, Line 151: i==lead
> nit: spacing
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@153
PS13, Line 153: NE
> Could be even more explicit and say ASSERT_LT, right? Especially if -1 is a
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@189
PS13, Line 189: number_of_loop_iterations_for_test
> Maybe do a sanity check before shutting down the leader checking that this
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@196
PS13, Line 196: skewed
> nit: no longer balanced
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@208
PS13, Line 208: SleepFor(MonoDelta::FromSeconds(1));
> Here and below: is there a more reliable way of knowing the auto-rebalancer
Created function CheckAutoRebalancerStarted() with the above functionality, 
thank you!


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@258
PS13, Line 258: locations.size(), kNumTservers
> nit here and elsewhere: the expected value comes first in {ASSERT,EXPECT}_E
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@260
PS13, Line 260:   TestWorkload workload(cluster_.get());
              :   workload.set_num_tablets(kNumTablets);
              :   workload.set_num_replicas(1);
              :   workload.Setup();
> This pattern repeats in a few places.  Maybe, separate this into a method/f
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@268
PS13, Line 268: If placement policy can never be satisfied, the cluster is 
balanced
> This statement is too bold. :)
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@300
PS13, Line 300: } // namespace master
> After looking into the implementation of AutoRebalancerTask::CheckReplicaMo
I added a DCHECK() in RunLoop() of auto_rebalancer.cc to verify that the 
auto-rebalancer won't schedule a new batch of replica moves until the previous 
batch's moves have all completed.

Will add a test with the above scenario to verify that the number of tablet 
copying sessions per tablet server doesn't exceed the specified 
FLAGS_auto_rebalancing_max_moves_per_server value, thank you!


http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc
File src/kudu/master/auto_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@220
PS12, Line 220:         moves_scheduled_this_round_for_test_ = 0;
> Does this mean that replica_moves might not be empty? And if it's not empty
Clarified this in comments, but replica_moves should be empty at the beginning 
of each iteration of the loop.
Added a DCHECK to verify this, and moved the assignment of 
moves_scheduled_this_round_for_test_ to AFTER replica_moves are scheduled.


http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@236
PS12, Line 236: and BuildTa
> nit: mind annotating this with a comment? E.g.
Done


http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@283
PS12, Line 283:     }
              :
              :   } // end while
              : }
> nit: Could use WARN_NOT_OK
Done


http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@448
PS12, Line 448:  up th
> nit: mind naming this 'replace' or something, and the below change 'add_non
Done


http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@471
PS12, Line 471:       vector<Sockaddr> resolved;
> Doesn't this mean that it'll be promoted to a voter immediately? Even witho
Per rationale in comments here:

https://github.com/apache/kudu/blob/e72208436a625391739217394c67d783e992367a/src/kudu/tools/tool_replica_util.cc#L575-L596


http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@680
PS12, Line 680:
> nit: spacing
Done


http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@696
PS12, Line 696:   }
              :
              :   return Status::OK();
> Doesn't this mean that we're always erasing the move from replica_moves?
:( I accidentally removed the "else" that was supposed to precede this 
statement.


http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@732
PS12, Line 732: req.add_tablet_ids(tablet_uuid);
              :   RETURN_NOT_OK(proxy->GetConsensusState(req, &resp, &rpc));
              :   if (resp.has_error()) {
              :     return StatusFromPB(resp.error().status());
              :   }
              :   if (resp.tablets_size() == 0) {
              :     return Status::NotFound("tablet not found:", tablet_uuid);
              :   }
              :   DCHECK_EQ(1, resp.tablets_size());
              :   cstate = resp.tablets(0).cstate();
              :
              :   bool to_ts_uuid_in_config = false;
              :   bool to_ts_uuid_is_a_voter = false;
              :   for (const auto& peer : cstate.committed_config().peers()) {
              :     if (peer.permanent_uuid() == to_ts_uuid) {
              :       to_ts_uuid_in_config = true;
              :       if (peer.member_type() == RaftPeerPB::VOTER) {
              :         to_ts_uuid_is_a_voter = true
> Did you consider getting this information from the CatalogManager instead?
Would it be okay to keep as-is and as close to CheckCompleteMove() as possible, 
for simplicity? And to be modified in a later iteration of the auto-rebalancer, 
since it'll involve grabbing locks in both CheckMoveCompleted() here and maybe 
in ExecuteMoves(), if moving the LeaderStepDown request there.
Unless this way is concerningly horrible for performance?


http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@782
PS12, Line 782:       //   to voter.
              :       if (orig_leader_uuid == from_ts_uuid && orig_leader_uuid 
== cstate.leader_uuid()) {
              :         LeaderStepDownRequestPB req;
              :         LeaderStepDownResponsePB resp;
              :         RpcController rpc;
              :         req.set_dest_uuid(orig_leader_uuid);
              :         req.set_tablet_id(tablet_uuid);
              :         req.set_mode(LeaderStepDownMode::GRACEFUL);
              :         
rpc.set_timeout(MonoDelta::FromSeconds(FLAGS_auto_rebalancing_timeout_seconds));
              :         RETURN_NOT_OK(proxy->LeaderStepDown(req, &resp, &rpc));
              :         if (resp.has_error()) {
              :           return StatusFromPB(resp.error().status());
              :         }
              :       }
              :
              :       from_ts_uuid_in_config = true;
              :       break;
              :     }
              :   }
              :
              :   if (!from_ts_uuid_in_config &&
              :       (
> Did you consider doing this in ExecuteMoves()? I found it a bit surprising
Responded above


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer.cc
File src/kudu/master/auto_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer.cc@131
PS13, Line 131: moves
> movement?
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer.cc@194
PS13, Line 194: will contain at most one move
> Maybe, add a DCHECK() for this pre-condition inside the loop below to spot
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer.cc@276
PS13, Line 276:     s = CheckReplicaMovesCompleted(&replica_moves, 
&all_moves_complete);
              :     while (!all_moves_complete) {
              :       if (!s.ok()) {
              :         LOG(WARNING) << Substitute("could not perform replica 
move: $0", s.ToString());
              :       }
              :       
SleepFor(MonoDelta::FromSeconds(FLAGS_auto_rebalancing_wait_for_replica_moves_seconds));
              :       s = CheckReplicaMovesCompleted(&replica_moves, 
&all_moves_complete);
              :     }
> nit: there is 'do {} while ();' construct which might be a better fit here
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer.cc@688
PS13, Line 688: Otherwise
> This comment doesn't reflect what's going on: as Andrew pointed out, the co
Fixed code to match the intended description in the comment!


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/catalog_manager.h@777
PS13, Line 777:   master::TSManager* ts_manager() const;
> nit: not used?
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/catalog_manager.cc@794
PS13, Line 794:     auto_rebalancer_.reset(new AutoRebalancerTask(this, 
master_->ts_manager()));
              :     RETURN_NOT_OK_PREPEND(auto_rebalancer_->Init(),
              :                           "failed to initialize 
auto-rebalancing task");
> nit: it's safer to reset the auto_rebalancer_ member with properly initiali
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/catalog_manager.cc@1779
PS13, Line 1779: stalte
> nit: revert this?
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/catalog_manager.cc@5271
PS13, Line 5271: int CatalogManager::auto_rebalancer_iterations() const {
               :     return 
auto_rebalancer_->number_of_loop_iterations_for_test_;
               : }
               :
               : int CatalogManager::auto_rebalancer_moves() const {
               :     return 
auto_rebalancer_->moves_scheduled_this_round_for_test_;
               : }
> nit: Can't we get these via auto_rebalancer()->number_of_loop_iterations_fo
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/ts_descriptor.h
File src/kudu/master/ts_descriptor.h:

http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/ts_descriptor.h@188
PS13, Line 188: location_ = loc
> Since the parameter is passed by value, it seems this should have been
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/rebalance/rebalancer.h
File src/kudu/rebalance/rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/rebalance/rebalancer.h@248
PS13, Line 248: // A struct that
> nit: can probably omit these prefixes.
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/rebalance/rebalancer.h@258
PS13, Line 258:
              : // A function that finds destination tservers for tablets that 
are
              : // identified by the rebalancing greedy algorithm as tablets 
that
              : // should be moved.
> nit: could you update this to be a bit more descriptive of what it does, ra
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/rebalance/rebalancer.cc
File src/kudu/rebalance/rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/rebalance/rebalancer.cc@499
PS13, Line 499:       if (tablets_in_move->find(tablet_id) == 
tablets_in_move->end()) {
> nit: use ContainsKey() for better readability?
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/rebalance/rebalancer.cc@494
PS13, Line 494: // Shuffle the set of the tablet identifiers: that's to achieve 
even spread
              :     // of moves across tables with the same skew.
              :     std::shuffle(tablet_ids->begin(), tablet_ids->end(), 
random_generator);
              :     string move_tablet_id;
              :     for (const auto& tablet_id : *tablet_ids) {
              :       if (tablets_in_move->find(tablet_id) == 
tablets_in_move->end()) {
              :         // For now, choose the very first tablet that does not 
have replicas
              :         // in move. Later on, additional logic might be added 
to find
              :         // the best candidate.
              :         move_tablet_id = tablet_id;
              :         break;
              :       }
              :     }
              :     if (move_tablet_id.empty()) {
              :       return Status::NotFound(Substitute(
              :           "table $0: could not find any suitable replica to 
move "
              :           "from server $1 to server $2", move.table_id, 
move.from, move.to));
              :     }
              :     Rebalancer::ReplicaMove move_info;
              :     move_info.tablet_uuid = move_tablet_id;
              :     move_info.ts_uuid_from = move.from;
              :     const auto& extra_info = FindOrDie(extra_info_by_tablet_id, 
move_tablet_id);
              :     if (extra_info.replication_factor < extra_info.num_voters) {
              :       // The number of voter replicas is greater than the 
target replication
              :       // factor. It might happen the replica distribution would 
be better
              :       // if just removing the source replica. Anyway, once a 
replica is removed,
              :       // the system will automatically add a new one, if 
needed, where the new
              :       // replica will be placed to have balanced replica 
distribution.
              :       move_info.ts_uuid_to = "";
              :     } else {
              :       move_info.ts_uuid_to = move.to;
              :     }
              :     replica_moves->emplace_back(std::move(move_info));
              :     // Mark the tablet as 'has a replica in move'.
              :     tablets_in_move->emplace(move_tablet_id);
              :
              :     return Status::OK();
> nit: shift to the left a couple spaces
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/tools/rebalancer_tool.cc
File src/kudu/tools/rebalancer_tool.cc:

http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/tools/rebalancer_tool.cc@1021
PS13, Line 1021: &tablet_ids
> Since these are repopulated every loop iteration, how about passing them by
Done


http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/tools/rebalancer_tool.cc@1020
PS13, Line 1020:     SelectReplicasToMove(move, extra_info_by_tablet_id, 
random_generator_,
               :                          &tablet_ids, &tablets_in_move, 
replica_moves);
> WARN_NOT_OK to match the current behavior?
Done



--
To view, visit http://gerrit.cloudera.org:8080/14177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifca25d1063c07047cf2123e6792b3c7395be20e4
Gerrit-Change-Number: 14177
Gerrit-PatchSet: 13
Gerrit-Owner: Hannah Nguyen <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hannah Nguyen <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 07 Feb 2020 20:44:36 +0000
Gerrit-HasComments: Yes

Reply via email to