Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20310 )

Change subject: KUDU-3497 optimize leader rebalancer algorithm
......................................................................


Patch Set 10:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/20310/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20310/10//COMMIT_MSG@10
PS10, Line 10: and
             : replication factor
You mean RF=1, i.e. non-replication tables here?  It would be great to clarify.


http://gerrit.cloudera.org:8080/#/c/20310/10//COMMIT_MSG@12
PS10, Line 12: consist
consisting


http://gerrit.cloudera.org:8080/#/c/20310/10//COMMIT_MSG@12
PS10, Line 12: 3 replication
             : factor
nit: replication factor of 3

or

RF=3


http://gerrit.cloudera.org:8080/#/c/20310/10/src/kudu/master/auto_leader_rebalancer.cc
File src/kudu/master/auto_leader_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/20310/10/src/kudu/master/auto_leader_rebalancer.cc@20
PS10, Line 20: #include <cmath>
             : #include <cstddef>
nit: move these two lines between the <algorithm> and the <cstdint> included 
below


http://gerrit.cloudera.org:8080/#/c/20310/10/src/kudu/master/auto_leader_rebalancer.cc@233
PS10, Line 233: of a
per


http://gerrit.cloudera.org:8080/#/c/20310/10/src/kudu/master/auto_leader_rebalancer.cc@233
PS10, Line 233: leader
leader replicas


http://gerrit.cloudera.org:8080/#/c/20310/10/src/kudu/master/auto_leader_rebalancer.cc@236
PS10, Line 236:     int32_t expect_leader_count;
              :     if (remaining_tablets % remaining_tservers == 0) {
              :       expect_leader_count = remaining_tablets / 
remaining_tservers;
              :     } else {
              :       expect_leader_count = std::ceil(remaining_tablets /
              :                                       
static_cast<double_t>(remaining_tservers));
              :     }
Could this be replaced with

  const int target_leader_count = (remaining_tablets + remaining_tservers - 1) 
/ remaining_tservers;

?


http://gerrit.cloudera.org:8080/#/c/20310/10/src/kudu/master/auto_leader_rebalancer.cc@243
PS10, Line 243: expect_leader_count
nit: consider renaming to 'target_leader_count'


http://gerrit.cloudera.org:8080/#/c/20310/10/src/kudu/master/auto_leader_rebalancer.cc@248
PS10, Line 248:     if (remaining_tablets % remaining_tservers == 0) {
              :       remaining_tablets -= expect_leader_count;
              :     } else {
              :       remaining_tablets -= (should_transfer_count >= 0 ? 
expect_leader_count :
              :                              (expect_leader_count - 1));
              :     }
              :     remaining_tservers--;
I'm not sure this 'thinning' approach makes a lot of sense semantically.  As I 
understand, the number of tablet servers and the number of tablet replicas both 
stay constant during the whole process, only the density of _leader_ replicas 
needs to change by moving the leader replicas around.

I don't have a counter-example ready, but my intuition tells me that there 
might be some unexpected results if trying this 'thinning' here instead of just 
computing the deltas between the desired and the actual count of leader 
replicas at a particular tablet server.

What if we just keep the 'remaining_tablets' and 'remaining_tservers' constant, 
and continue to iterate though this cycle?  Wouldn't we get proper 'delta' 
count for every tablet server?

>From my observation, the original way of computing 'should_transfer_count' 
>simply wasn't semantically sound, but with proper computation of 
>'expect_leader_count' (or  'target_leader_count') as in PS10, the balancing of 
>leader replicas should become much better.



--
To view, visit http://gerrit.cloudera.org:8080/20310
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f1fe796fd98da2d8764da793b7e254319e6348a
Gerrit-Change-Number: 20310
Gerrit-PatchSet: 10
Gerrit-Owner: Song Jiacheng <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <[email protected]>
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Yuqi Du <[email protected]>
Gerrit-Comment-Date: Mon, 16 Oct 2023 18:51:07 +0000
Gerrit-HasComments: Yes

Reply via email to