Song Jiacheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20310 )

Change subject: KUDU-3497 optimize leader rebalancer algorithm
......................................................................


Patch Set 17:

(2 comments)

Thank you for the review.
I have fixed the bug.
Could you please check whether the logic is right?

http://gerrit.cloudera.org:8080/#/c/20310/10/src/kudu/master/auto_leader_rebalancer.cc
File src/kudu/master/auto_leader_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/20310/10/src/kudu/master/auto_leader_rebalancer.cc@248
PS10, Line 248:     string leader_uuid = from_info.first;
              :     int32_t need_transfer_count = from_info.second;
              :     int32_t pick_count = 0;
              :     vector<string>& uuid_leaders = leader_tablet_ids_by_ts_uuid[leader_uuid];
              :     std::shuffle(uuid_leaders.begin(), uuid_leaders.end(), random_generator_);
              :     // This loop would generate 'uuid_leaders.size()' leader transferring tasks at most.
              :     // Every task would p
> Isn't the number of remaining tablets under-estimated for a big cluster (i.
At this point we have already calculated the number of leaders X that the 
tserver should have, and this line subtracts exactly X from the remaining 
tablets.
The possible values of X are:
  1. If remaining_tablets % remaining_tservers == 0, X is 
remaining_tablets / remaining_tservers, which is actually target_leader_count.
  2. If remaining_tablets % remaining_tservers != 0 and the tserver's current 
leader count is greater than the double value remaining_tablets / 
remaining_tservers, X is floor(remaining_tablets / remaining_tservers), 
which is actually target_leader_count - 1.
  3. Same as 2, but the tserver's current leader count is lower than the 
double value remaining_tablets / remaining_tservers: X is 
floor(remaining_tablets / remaining_tservers), which is actually 
target_leader_count.
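
To make the arithmetic in the cases above concrete, here is a minimal sketch 
of the two integer values that bracket remaining_tablets / remaining_tservers. 
The names LeaderBounds and ComputeLeaderBounds are hypothetical and are not 
taken from the patch; the sketch only shows the floor and ceiling values and 
does not reproduce the per-tserver selection logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type, not part of the patch: the two integer leader
// counts closest to the average remaining_tablets / remaining_tservers.
struct LeaderBounds {
  int32_t floor_count;  // floor(remaining_tablets / remaining_tservers)
  int32_t ceil_count;   // ceil(remaining_tablets / remaining_tservers)
};

// Assumption: the caller guarantees remaining_tservers > 0.
inline LeaderBounds ComputeLeaderBounds(int32_t remaining_tablets,
                                        int32_t remaining_tservers) {
  assert(remaining_tservers > 0);
  assert(remaining_tablets >= 0);
  // Integer division of non-negative values truncates, i.e. takes the floor.
  const int32_t floor_count = remaining_tablets / remaining_tservers;
  // When the division is exact, floor and ceiling coincide (case 1);
  // otherwise they differ by exactly one (cases 2 and 3).
  const bool exact = (remaining_tablets % remaining_tservers == 0);
  return {floor_count, exact ? floor_count : floor_count + 1};
}
```

For example, 12 remaining tablets over 4 remaining tservers gives 3 for both 
values, while 13 over 4 gives a floor of 3 and a ceiling of 4.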


http://gerrit.cloudera.org:8080/#/c/20310/16/src/kudu/master/auto_leader_rebalancer.cc
File src/kudu/master/auto_leader_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/20310/16/src/kudu/master/auto_leader_rebalancer.cc@241
PS16, Line 241:
> Could 'remaining_tservers' end up being zero (i.e.  remaining_tservers == 0
I have filtered out the tservers with no replicas in the new patch, so this 
will not happen.
Done
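
As a rough illustration of the filtering described above (not the code in the 
patch), the sketch below drops tservers that host no replicas before any 
per-tserver average is computed. The function name and the map layout are 
assumptions made for this example.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper, not taken from the patch: keep only the tservers
// that host at least one replica, keyed by tserver UUID.
inline std::map<std::string, int32_t> FilterTserversWithReplicas(
    const std::map<std::string, int32_t>& replica_count_by_ts_uuid) {
  std::map<std::string, int32_t> result;
  for (const auto& entry : replica_count_by_ts_uuid) {
    // Skip tservers without replicas so they are never counted as
    // candidates when dividing tablets among tservers.
    if (entry.second > 0) {
      result.insert(entry);
    }
  }
  return result;
}
```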



--
To view, visit http://gerrit.cloudera.org:8080/20310
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f1fe796fd98da2d8764da793b7e254319e6348a
Gerrit-Change-Number: 20310
Gerrit-PatchSet: 17
Gerrit-Owner: Song Jiacheng <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <[email protected]>
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Yuqi Du <[email protected]>
Gerrit-Comment-Date: Tue, 14 Nov 2023 05:23:11 +0000
Gerrit-HasComments: Yes
