Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23755 )

Change subject: auto_rebalancer: enforce max_moves_per_server per tserver
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/23755/1/src/kudu/master/auto_rebalancer.cc
File src/kudu/master/auto_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/23755/1/src/kudu/master/auto_rebalancer.cc@20
PS1, Line 20: #include <atomic>
nit: move this into the block of the STL includes below, keeping the block 
alphabetically sorted


http://gerrit.cloudera.org:8080/#/c/23755/1/src/kudu/master/auto_rebalancer.cc@361
PS1, Line 361:     // in-flight moves. We check against moves_per_tserver_ (the 
actual ongoing moves)
             :     // rather than limiting within this batch, since the global 
max_moves limit
             :     // already constrains the batch size.
Is it possible to avoid copying, and instead of modifying the contents of the 
map below by using 'operator[]' use FindWithDefault() from map-util.h?


http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc
File src/kudu/master/auto_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@a411
PS3, Line 411:
             :
I see this TODO is removed in PS3.  Has this been addressed as of PS3 of this 
patch?


http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@267
PS3, Line 267: }
Do we want to add a few DCHECK() assertions on the state of the newly 
introduced moves_per_tserver_ container after exiting of this 'do ... while()' 
loop?

I'd think of checking that all the values in the moves_per_tserver_ container 
are zero once all the moves are completed/failed, at least.


http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@336
PS3, Line 336: FLAGS_auto_rebalancing_max_moves_per_server
Since accessing FLAGS_auto_rebalancing_max_moves_per_server has extra costs, 
consider introducing a constant for this method's scope and using it here and 
elsewhere in this method instead of accessing 
FLAGS_auto_rebalancing_max_moves_per_server many times.

It will also make this code more consistent and protect against unexpected 
surprises when the 'auto_rebalancing_max_moves_per_server' flag is modified 
in-flight in the middle of this code running (e.g., using the 'kudu tserver 
set_flag' CLI tool).


http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@337
PS3, Line 337:   max_moves -= replica_moves->size();
             :   if (max_moves <= 0) {
             :     return Status::OK();
             :   }
Please make sure 'num_tservers' and 'max_moves' end up being signed integers.  
Otherwise, the 'if()' condition might not be met as expected.

I guess it might be a good idea to first compare max_moves against 
replica_moves->size() before subtraction, and returning Status::OK() if 
max_moves < replica_moves->size().  An alternative approach is changing 'auto' 
to ssize_t or 'int64_t' to be more explicit here.


http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@367
PS3, Line 367:     if (src_ongoing >= 
FLAGS_auto_rebalancing_max_moves_per_server ||
             :         dst_ongoing >= 
FLAGS_auto_rebalancing_max_moves_per_server) {
I'd think of comparing (src_ongoing + dst_ongoing) against the limit set by the 
flag given its description and how it's being treated at lines 335-340 above.  
Otherwise, how can we limit the rebalancing activity to have only one active 
operation per tablet server, when the server is either source or destination?

What do you think?


http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@723
PS3, Line 723:     if (moves_per_tserver_[src_ts_uuid] > 0) {
Why would we have zeros there if the move that has been scheduled should have 
incremented the counter in the beginning?  Should there be also be 
DCHECK_GT(moves_per_tserver_[src_ts_uuid], 0) in addition to this if() 
condition?  If not, please add a comment to clarify why.



--
To view, visit http://gerrit.cloudera.org:8080/23755
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1b73045719dc69cc9e3353d318b7381f144a20d
Gerrit-Change-Number: 23755
Gerrit-PatchSet: 3
Gerrit-Owner: Gabriella Lotz <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Gabriella Lotz <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 06 Jan 2026 19:36:22 +0000
Gerrit-HasComments: Yes

Reply via email to