Hannah Nguyen has posted comments on this change. ( http://gerrit.cloudera.org:8080/14177 )
Change subject: KUDU-2780: create thread for auto-rebalancing ...................................................................... Patch Set 13: (40 comments) http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc File src/kudu/master/auto_rebalancer-test.cc: http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@64 PS13, Line 64: void AssignLocations(int num_locations, int num_tservers) { > nit: could you add a comment explaining that this doesn't assign locations Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@68 PS13, Line 68: TSDescriptorVector descs; : cluster_->mini_master(master_idx)->master()->ts_manager()-> : GetAllDescriptors(&descs); > Probably want to put this into the ASSERT_EVENTUALLY too? Right now it's ju Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@76 PS13, Line 76: // Assign num_locations unique locations to the first num_locations tservers. : for (int i = 0; i < num_locations; ++i) { : descs[i]->AssignLocationForTesting(Substitute("L$0", i)); : } > What if num_locations turns to be greater than num_tservers? Most likely t Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@93 PS13, Line 93: moves_scheduled_this_round_for_test(leader_idx), 0 > nit: in ASSERT_EQ(), the expected value comes first -- that way it's easier Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@106 PS13, Line 106: number_of_loop_iterations_for_test > nit: use PascalCase for these, since they're not simple member accessors. Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@111 PS13, Line 111: return -1; > nit: maybe instead of doing this, if we know we'll always have a cluster, d Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@150 PS13, Line 150: <kN > nit: spacing Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@151 PS13, Line 151: i==lead > nit: spacing Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@153 PS13, Line 153: NE > Could be even more explicit and say ASSERT_LT, right? Especially if -1 is a Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@189 PS13, Line 189: number_of_loop_iterations_for_test > Maybe do a sanity check before shutting down the leader checking that this Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@196 PS13, Line 196: skewed > nit: no longer balanced Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@208 PS13, Line 208: SleepFor(MonoDelta::FromSeconds(1)); > Here and below: is there a more reliable way of knowing the auto-rebalancer Created function CheckAutoRebalancerStarted() with the above functionality, thank you! http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@258 PS13, Line 258: locations.size(), kNumTservers > nit here and elsewhere: the expected value comes first in {ASSERT,EXPECT}_E Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@260 PS13, Line 260: TestWorkload workload(cluster_.get()); : workload.set_num_tablets(kNumTablets); : workload.set_num_replicas(1); : workload.Setup(); > This pattern repeats in a few places. Maybe, separate this into a method/f Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@268 PS13, Line 268: If placement policy can never be satisfied, the cluster is balanced > This statement is too bold. :) Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer-test.cc@300 PS13, Line 300: } // namespace master > After looking into the implementation of AutoRebalancerTask::CheckReplicaMo I added a DCHECK() in RunLoop() of auto_rebalancer.cc to verify that the auto-rebalancer won't schedule a new batch of replica moves until the previous batch's moves have all completed. Will add a test with the above scenario to verify that the number of tablet copying sessions per tablet server doesn't exceed the specified FLAGS_auto_rebalancing_max_moves_per_server value, thank you! http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc File src/kudu/master/auto_rebalancer.cc: http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@220 PS12, Line 220: moves_scheduled_this_round_for_test_ = 0; > Does this mean that replica_moves might not be empty? And if it's not empty Clarified this in comments, but replica_moves should be empty at the beginning of each iteration of the loop. Added a DCHECK to verify this, and moved the assignment of moves_scheduled_this_round_for_test_ to AFTER replica_moves are scheduled. http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@236 PS12, Line 236: and BuildTa > nit: mind annotating this with a comment? E.g. Done http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@283 PS12, Line 283: } : : } // end while : } > nit: Could use WARN_NOT_OK Done http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@448 PS12, Line 448: up th > nit: mind naming this 'replace' or something, and the below change 'add_non Done http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@471 PS12, Line 471: vector<Sockaddr> resolved; > Doesn't this mean that it'll be promoted to a voter immediately? Even witho Per rationale in comments here: https://github.com/apache/kudu/blob/e72208436a625391739217394c67d783e992367a/src/kudu/tools/tool_replica_util.cc#L575-L596 http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@680 PS12, Line 680: > nit: spacing Done http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@696 PS12, Line 696: } : : return Status::OK(); > Doesn't this mean that we're always erasing the move from replica_moves? :( I accidentally removed the "else" that was supposed to precede this statement. http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@732 PS12, Line 732: req.add_tablet_ids(tablet_uuid); : RETURN_NOT_OK(proxy->GetConsensusState(req, &resp, &rpc)); : if (resp.has_error()) { : return StatusFromPB(resp.error().status()); : } : if (resp.tablets_size() == 0) { : return Status::NotFound("tablet not found:", tablet_uuid); : } : DCHECK_EQ(1, resp.tablets_size()); : cstate = resp.tablets(0).cstate(); : : bool to_ts_uuid_in_config = false; : bool to_ts_uuid_is_a_voter = false; : for (const auto& peer : cstate.committed_config().peers()) { : if (peer.permanent_uuid() == to_ts_uuid) { : to_ts_uuid_in_config = true; : if (peer.member_type() == RaftPeerPB::VOTER) { : to_ts_uuid_is_a_voter = true > Did you consider getting this information from the CatalogManager instead? Would it be okay to keep as-is and as close to CheckCompleteMove() as possible, for simplicity? And to be modified in a later iteration of the auto-rebalancer, since it'll involve grabbing locks in both CheckMoveCompleted() here and maybe in ExecuteMoves(), if moving the LeaderStepDown request there. Unless this way is concerningly horrible for performance? http://gerrit.cloudera.org:8080/#/c/14177/12/src/kudu/master/auto_rebalancer.cc@782 PS12, Line 782: // to voter. : if (orig_leader_uuid == from_ts_uuid && orig_leader_uuid == cstate.leader_uuid()) { : LeaderStepDownRequestPB req; : LeaderStepDownResponsePB resp; : RpcController rpc; : req.set_dest_uuid(orig_leader_uuid); : req.set_tablet_id(tablet_uuid); : req.set_mode(LeaderStepDownMode::GRACEFUL); : rpc.set_timeout(MonoDelta::FromSeconds(FLAGS_auto_rebalancing_timeout_seconds)); : RETURN_NOT_OK(proxy->LeaderStepDown(req, &resp, &rpc)); : if (resp.has_error()) { : return StatusFromPB(resp.error().status()); : } : } : : from_ts_uuid_in_config = true; : break; : } : } : : if (!from_ts_uuid_in_config && : ( > Did you consider doing this in ExecuteMoves()? I found it a bit surprising Responded above http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer.cc File src/kudu/master/auto_rebalancer.cc: http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer.cc@131 PS13, Line 131: moves > movement? Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer.cc@194 PS13, Line 194: will contain at most one move > Maybe, add a DCHECK() for this pre-condition inside the loop below to spot Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer.cc@276 PS13, Line 276: s = CheckReplicaMovesCompleted(&replica_moves, &all_moves_complete); : while (!all_moves_complete) { : if (!s.ok()) { : LOG(WARNING) << Substitute("could not perform replica move: $0", s.ToString()); : } : SleepFor(MonoDelta::FromSeconds(FLAGS_auto_rebalancing_wait_for_replica_moves_seconds)); : s = CheckReplicaMovesCompleted(&replica_moves, &all_moves_complete); : } > nit: there is 'do {} while ();' construct which might be a better fit here Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/auto_rebalancer.cc@688 PS13, Line 688: Otherwise > This comment doesn't reflect what's going on: as Andrew pointed out, the co Fixed code to match the intended description in the comment! http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/catalog_manager.h@777 PS13, Line 777: master::TSManager* ts_manager() const; > nit: not used? Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/catalog_manager.cc@794 PS13, Line 794: auto_rebalancer_.reset(new AutoRebalancerTask(this, master_->ts_manager())); : RETURN_NOT_OK_PREPEND(auto_rebalancer_->Init(), : "failed to initialize auto-rebalancing task"); > nit: it's safer to reset the auto_rebalancer_ member with properly initiali Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/catalog_manager.cc@1779 PS13, Line 1779: stalte > nit: revert this? Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/catalog_manager.cc@5271 PS13, Line 5271: int CatalogManager::auto_rebalancer_iterations() const { : return auto_rebalancer_->number_of_loop_iterations_for_test_; : } : : int CatalogManager::auto_rebalancer_moves() const { : return auto_rebalancer_->moves_scheduled_this_round_for_test_; : } > nit: Can't we get these via auto_rebalancer()->number_of_loop_iterations_fo Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/ts_descriptor.h File src/kudu/master/ts_descriptor.h: http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/master/ts_descriptor.h@188 PS13, Line 188: location_ = loc > Since the parameter is passed by value, it seems this should have been Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/rebalance/rebalancer.h File src/kudu/rebalance/rebalancer.h: http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/rebalance/rebalancer.h@248 PS13, Line 248: // A struct that > nit: can probably omit these prefixes. Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/rebalance/rebalancer.h@258 PS13, Line 258: : // A function that finds destination tservers for tablets that are : // identified by the rebalancing greedy algorithm as tablets that : // should be moved. > nit: could you update this to be a bit more descriptive of what it does, ra Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/rebalance/rebalancer.cc File src/kudu/rebalance/rebalancer.cc: http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/rebalance/rebalancer.cc@499 PS13, Line 499: if (tablets_in_move->find(tablet_id) == tablets_in_move->end()) { > nit: use ContainsKey() for better readability? Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/rebalance/rebalancer.cc@494 PS13, Line 494: // Shuffle the set of the tablet identifiers: that's to achieve even spread : // of moves across tables with the same skew. : std::shuffle(tablet_ids->begin(), tablet_ids->end(), random_generator); : string move_tablet_id; : for (const auto& tablet_id : *tablet_ids) { : if (tablets_in_move->find(tablet_id) == tablets_in_move->end()) { : // For now, choose the very first tablet that does not have replicas : // in move. Later on, additional logic might be added to find : // the best candidate. : move_tablet_id = tablet_id; : break; : } : } : if (move_tablet_id.empty()) { : return Status::NotFound(Substitute( : "table $0: could not find any suitable replica to move " : "from server $1 to server $2", move.table_id, move.from, move.to)); : } : Rebalancer::ReplicaMove move_info; : move_info.tablet_uuid = move_tablet_id; : move_info.ts_uuid_from = move.from; : const auto& extra_info = FindOrDie(extra_info_by_tablet_id, move_tablet_id); : if (extra_info.replication_factor < extra_info.num_voters) { : // The number of voter replicas is greater than the target replication : // factor. It might happen the replica distribution would be better : // if just removing the source replica. Anyway, once a replica is removed, : // the system will automatically add a new one, if needed, where the new : // replica will be placed to have balanced replica distribution. : move_info.ts_uuid_to = ""; : } else { : move_info.ts_uuid_to = move.to; : } : replica_moves->emplace_back(std::move(move_info)); : // Mark the tablet as 'has a replica in move'. : tablets_in_move->emplace(move_tablet_id); : : return Status::OK(); > nit: shift to the left a couple spaces Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/tools/rebalancer_tool.cc File src/kudu/tools/rebalancer_tool.cc: http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/tools/rebalancer_tool.cc@1021 PS13, Line 1021: &tablet_ids > Since these are repopulated every loop iteration, how about passing them by Done http://gerrit.cloudera.org:8080/#/c/14177/13/src/kudu/tools/rebalancer_tool.cc@1020 PS13, Line 1020: SelectReplicasToMove(move, extra_info_by_tablet_id, random_generator_, : &tablet_ids, &tablets_in_move, replica_moves); > WARN_NOT_OK to match the current behavior? Done -- To view, visit http://gerrit.cloudera.org:8080/14177 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifca25d1063c07047cf2123e6792b3c7395be20e4 Gerrit-Change-Number: 14177 Gerrit-PatchSet: 13 Gerrit-Owner: Hannah Nguyen <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Hannah Nguyen <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 07 Feb 2020 20:44:36 +0000 Gerrit-HasComments: Yes
