Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/20310 )
Change subject: KUDU-3497 optimize leader rebalancer algorithm
......................................................................

Patch Set 17:

(2 comments)

Thank you for the review. I have fixed the bug. Could you please check whether the logic is right?

http://gerrit.cloudera.org:8080/#/c/20310/10/src/kudu/master/auto_leader_rebalancer.cc
File src/kudu/master/auto_leader_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/20310/10/src/kudu/master/auto_leader_rebalancer.cc@248
PS10, Line 248: string leader_uuid = from_info.first;
: int32_t need_transfer_count = from_info.second;
: int32_t pick_count = 0;
: vector<string>& uuid_leaders = leader_tablet_ids_by_ts_uuid[leader_uuid];
: std::shuffle(uuid_leaders.begin(), uuid_leaders.end(), random_generator_);
: // This loop would generate 'uuid_leaders.size()' leader transferring tasks at most.
: // Every task would p
> Isn't the number of remaining tablets under-estimated for a big cluster (i.

At this point we have already calculated X, the number of leaders this tserver should end up with, and this line subtracts exactly X from the remaining tablets. The possible values of X are:

1. If remaining_tablets % remaining_tservers == 0, X is remaining_tablets / remaining_tservers, which is target_leader_count.
2. If remaining_tablets % remaining_tservers != 0 and the tserver's current leader count is greater than the double value remaining_tablets / remaining_tservers, X is floor(remaining_tablets / remaining_tservers), which is target_leader_count - 1.
3. Same as 2, but the tserver's current leader count is lower than the double value remaining_tablets / remaining_tservers: X is floor(remaining_tablets / remaining_tservers), which is target_leader_count.
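To make the three cases above easier to check, here is a minimal standalone sketch of the arithmetic. It is not the code in the patch: LeaderQuota and its parameter names are hypothetical, and it assumes that target_leader_count is the ceiling of remaining_tablets / remaining_tservers and that remaining_tservers is greater than zero. Each branch returns the value given in the "which is ..." part of the corresponding case.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not part of auto_leader_rebalancer.cc.
// Returns X, the number of leaders a tserver should end up with.
// Assumptions: remaining_tservers > 0, and target_leader_count is
// ceil(remaining_tablets / remaining_tservers).
int32_t LeaderQuota(int32_t current_leaders,
                    int32_t remaining_tablets,
                    int32_t remaining_tservers) {
  const int32_t floor_count = remaining_tablets / remaining_tservers;
  const int32_t remainder = remaining_tablets % remaining_tservers;

  // Case 1: the tablets divide evenly, so X equals target_leader_count.
  if (remainder == 0) {
    return floor_count;
  }

  // With a non-zero remainder, the ceiling is one more than the floor.
  const int32_t target_leader_count = floor_count + 1;
  const double average =
      static_cast<double>(remaining_tablets) / remaining_tservers;

  // Case 2: the tserver holds more leaders than the average,
  // so X is target_leader_count - 1.
  if (current_leaders > average) {
    return target_leader_count - 1;
  }

  // Case 3: the tserver holds fewer leaders than the average,
  // so X is target_leader_count.
  return target_leader_count;
}
```

For example, with 10 remaining tablets and 4 remaining tservers the average is 2.5, so under these assumptions a tserver holding 5 leaders gets X = 2 and a tserver holding 1 leader gets X = 3.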
http://gerrit.cloudera.org:8080/#/c/20310/16/src/kudu/master/auto_leader_rebalancer.cc
File src/kudu/master/auto_leader_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/20310/16/src/kudu/master/auto_leader_rebalancer.cc@241
PS16, Line 241:
> Could 'remaining_tservers' end up being zero (i.e. remaining_tservers == 0

The new patch filters out tservers that have no replicas, so this cannot happen.

Done

--
To view, visit http://gerrit.cloudera.org:8080/20310
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f1fe796fd98da2d8764da793b7e254319e6348a
Gerrit-Change-Number: 20310
Gerrit-PatchSet: 17
Gerrit-Owner: Song Jiacheng <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <[email protected]>
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Yuqi Du <[email protected]>
Gerrit-Comment-Date: Tue, 14 Nov 2023 05:23:11 +0000
Gerrit-HasComments: Yes
