Hannah Nguyen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14177 )

Change subject: WIP KUDU-2780: create thread for auto-rebalancing
......................................................................


Patch Set 3:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc
File src/kudu/master/auto_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc@238
PS3, Line 238:       
SleepFor(MonoDelta::FromSeconds(FLAGS_auto_rebalancing_interval_seconds));
> I agree sleeping makes sense, otherwise (in case GetMoves() or ExecuteMoves
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc@271
PS3, Line 271: Status AutoRebalancerTask::GetMoves(const ClusterRawInfo& 
raw_info, const ClusterInfo& ci, vector<Rebalancer::ReplicaMove>* 
replica_moves, const TabletsPlacementInfo& placement_info, bool* policy_fixing) 
const {
> nit: wrap this. Same elsewhere.
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc@279
PS3, Line 279:
             :   if (ts_id_by_location.size() == 1) {
             :     rebalance::TwoDimensionalGreedyAlgo algo;
             :     RETURN_NOT_OK(GetMovesUsingRebalancingAlgo(raw_info, 
replica_moves, &algo));
             :     *policy_fixing = false;
             :   } else {
             : 
             :     if (rebalancer_->ShouldRunPolicyFixer()) {
             :       vector<PlacementPolicyViolationInfo> ppvi;
             :       
RETURN_NOT_OK(DetectPlacementPolicyViolations(placement_info, &ppvi));
             :       // Filter out all reported violations which are already 
taken care of.
             :       RETURN_NOT_OK(FindMovesToReimposePlacementPolicy(
             :           placement_info, ci.locality, ppvi, replica_moves));
             :       *policy_fixing = true;
             :     }
             :
             :     if (replica_moves->empty()) {
             :       if (rebalancer_->ShouldRunCrossLocationRebalancing()) {
             :         rebalance::LocationBalancingAlgo algo(1.0);
             :         RETURN_NOT_OK(GetMovesUsingRebalancingAlgo(
             :             raw_info, replica_moves, &algo, 
/*cross_location=*/true));
             :       }
             :
             :       if (rebalancer_->ShouldRunIntraLocationRebalancing() && 
replica_moves->size() < max_moves) {
             :         rebalance::TwoDimensionalGreedyAlgo algo;
             :         for (const auto& elem : ts_id_by_location) {
             :           const auto& location = elem.first;
             :           ClusterRawInfo location_raw_info;
             :           BuildClusterRawInfo(location, &location_raw_info);
             :           RETURN_NOT_OK(GetMovesUsingRebalancingAlgo(
             :             location_raw_info, replica_moves, &algo));
             :         }
             :       }
             :       *policy_fixing = false;
             :     }
             :   }
             :
             :   return Status::OK();
             : }
> nit: This could use some comments.
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc@377
PS3, Line 377:     std::shuffle(tablet_ids.begin(), tablet_ids.end(), 
random_device());
             :     string move_tablet_id;
             :     for (const auto& tablet_id : tablet_ids) {
             :       if (tablets_in_move.find(tablet_id) == 
tablets_in_move.end()) {
             :         // For now, choose the very first tablet that does not 
have replicas
             :         // in move. Later on, additional logic might be added to 
find
             :         // the best candidate.
             :         move_tablet_id = tablet_id;
             :         break;
             :       }
             :     }
             :     if (move_tablet_id.empty()) {
             :       return Status::NotFound(Substitute(
             :           "table $0: could not find any suitable replica to move 
"
             :           "from server $1 to server $2", move.table_id, 
move.from, move.to));
             :     }
             :     Rebalancer::ReplicaMove move_info;
             :     move_info.tablet_uuid = move_tablet_id;
             :     move_info.ts_uuid_from = move.from;
             :     const auto& extra_info = FindOrDie(extra_info_by_tablet_id, 
move_tablet_id);
             :     if (extra_info.replication_factor < extra_info.num_voters) {
             :       // The number of voter replicas is greater than the target 
replication
             :       // factor. It might happen the replica distribution would 
be better
             :       // if just removing the source replica. Anyway, once a 
replica is removed,
             :       // the system will automatically add a new one, if needed, 
where t
> Seems like a good chunk of this is taken from rebalancer_tool.cc. Can we re
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc@439
PS3, Line 439: Status AutoRebalancerTask::ExecuteMoves(const 
vector<Rebalancer::ReplicaMove>& replica_moves, const bool& policy_fixing) {
> Please add comments
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc@479
PS3, Line 479:   } else { // ScheduleReplicaMove()
> How difficult would it be to refactor and share across implementations?
This code is based on some of that function (maybe less than half?) with some 
modifications.
I don't think it's worth refactoring, especially since it would have to be 
pulled out of the tools namespace.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc@495
PS3, Line 495:       shared_ptr<Messenger> messenger;
> Could we build this once for the Task and reuse it?
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc@514
PS3, Line 514:         TSManager* ts_manager = GetTSManager();
             :         if (!ts_manager->LookupTSByUUID(dst_ts_uuid, &desc)) {
             :           return Status::NotFound("Could not find destination 
tserver");
             :         }
> Could probably be done before updating anything.
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc@538
PS3, Line 538:   const boost::optional<string>& location,
             :   ClusterRawInfo* raw_info) const {
> nit: indent a couple more here
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc@615
PS3, Line 615:   if (!location) {
             :     // Information on the whole cluster.
             :     raw_info->tserver_summaries = std::move(tserver_summaries);
             :     raw_info->tablet_summaries = std::move(tablet_summaries);
             :     raw_info->table_summaries = std::move(table_summaries);
             :   }
             :
             :   else {
             :     // Information on the specified location only: filter out 
non-relevant info.
             :     const auto& location_str = *location;
             :
             :     unordered_set<string> ts_ids_at_location;
             :     for (const auto& summary : tserver_summaries) {
             :       if (summary.ts_location == location_str) {
             :         raw_info->tserver_summaries.push_back(summary);
             :         InsertOrDie(&ts_ids_at_location, summary.uuid);
             :       }
             :     }
             :
             :     unordered_set<string> table_ids_at_location;
             :     for (const auto& summary : tablet_summaries) {
             :       const auto& replicas = summary.replicas;
             :       decltype(summary.replicas) replicas_at_location;
             :       replicas_at_location.reserve(replicas.size());
             :       for (const auto& replica : replicas) {
             :         if (ContainsKey(ts_ids_at_location, replica.ts_uuid)) {
             :           replicas_at_location.push_back(replica);
             :         }
             :       }
             :       if (!replicas_at_location.empty()) {
             :         table_ids_at_location.insert(summary.table_id);
             :       }
             :       raw_info->tablet_summaries.push_back(summary);
             :       raw_info->tablet_summaries.back().replicas = 
std::move(replicas_at_location);
             :     }
             :
             :     for (const auto& summary : table_summaries) {
             :       if (ContainsKey(table_ids_at_location, summary.id)) {
             :         raw_info->table_summaries.push_back(summary);
             :       }
             :     }
             :   }
> Can this filtering be done in the above loop itself? Why or why not?
table id's can be inserted into table_ids_at_location in the above for loop, so 
the filtering can't be done until all the table_summaries have been iterated 
through at least once.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc@665
PS3, Line 665: auto
> Could these be const refs too? Same below?
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc@667
PS3, Line 667: 1
> Is it ok for this to be hardcoded? Should there be some threshold for it? A
Hm this has been the definition of a balanced cluster that the rebalancer tool 
has been using.
see rebalance::IsBalanced() in rebalance/rebalance_algo-test.cc and the 
comments above rebalance::TwoDimensionalGreedyAlgo in 
rebalance/rebalance_algo.h.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc@673
PS3, Line 673:   if (max_table_skew-min_table_skew > 1) return true;
> warning: redundant boolean literal in conditional return statement [readabi
Keeping as is for readability


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc@661
PS3, Line 661: bool IsClusterOrTablesSkewed(const ClusterInfo& ci) {
             :
             :   // Cluster skew?
             :   const auto& servers_load_info = 
ci.balance.servers_by_total_replica_count;
             :   const auto min_replica_count = 
servers_load_info.begin()->first;
             :   const auto max_replica_count = 
servers_load_info.rbegin()->first;
             :   if (max_replica_count-min_replica_count > 1) return true;
             :
             :   // Table skew?
             :   const auto& table_skew_info = ci.balance.table_info_by_skew;
             :   const auto min_table_skew = table_skew_info.begin()->first;
             :   const auto max_table_skew = table_skew_info.rbegin()->first;
             :   if (max_table_skew-min_table_skew > 1) return true;
             :
             :   return false;
             : }
> nit: Put this in an anonymous namespace.
Done



--
To view, visit http://gerrit.cloudera.org:8080/14177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifca25d1063c07047cf2123e6792b3c7395be20e4
Gerrit-Change-Number: 14177
Gerrit-PatchSet: 3
Gerrit-Owner: Hannah Nguyen <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hannah Nguyen <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 19 Sep 2019 18:50:43 +0000
Gerrit-HasComments: Yes

Reply via email to