Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/23755 )
Change subject: auto_rebalancer: enforce max_moves_per_server per tserver ...................................................................... Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/23755/1/src/kudu/master/auto_rebalancer.cc File src/kudu/master/auto_rebalancer.cc: http://gerrit.cloudera.org:8080/#/c/23755/1/src/kudu/master/auto_rebalancer.cc@20 PS1, Line 20: #include <atomic> nit: move this into the block of the STL includes below, keeping the block alphabetically sorted http://gerrit.cloudera.org:8080/#/c/23755/1/src/kudu/master/auto_rebalancer.cc@361 PS1, Line 361: // in-flight moves. We check against moves_per_tserver_ (the actual ongoing moves) : // rather than limiting within this batch, since the global max_moves limit : // already constrains the batch size. Is it possible to avoid copying, and instead of modifying the contents of the map below by using 'operator[]' use FindWithDefault() from map-util.h? http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc File src/kudu/master/auto_rebalancer.cc: http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@a411 PS3, Line 411: : I see this TODO is removed in PS3. Has this been addressed as of PS3 of this patch? http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@267 PS3, Line 267: } Do we want to add a few DCHECK() assertions on the state of the newly introduced moves_per_tserver_ container after exiting of this 'do ... while()' loop? I'd think of checking that all the values in the moves_per_tserver_ container are zero once all the moves are completed/failed, at least. http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@336 PS3, Line 336: FLAGS_auto_rebalancing_max_moves_per_server Since accessing FLAGS_auto_rebalancing_max_moves_per_server has extra costs, consider introducing a constant for this method's scope and using it here and elsewhere in this method instead of accessing FLAGS_auto_rebalancing_max_moves_per_server many times. It will also make this code more consistent and protect against unexpected surprises when the 'auto_rebalancing_max_moves_per_server' flag is modified in-flight in the middle of this code running (e.g., using the 'kudu tserver set_flag' CLI tool). http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@337 PS3, Line 337: max_moves -= replica_moves->size(); : if (max_moves <= 0) { : return Status::OK(); : } Please make sure 'num_tservers' and 'max_moves' end up being signed integers. Otherwise, the 'if()' condition might not be met as expected. I guess it might be a good idea to first compare max_moves against replica_moves->size() before subtraction, and returning Status::OK() if max_moves < replica_moves->size(). An alternative approach is changing 'auto' to ssize_t or 'int64_t' to be more explicit here. http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@367 PS3, Line 367: if (src_ongoing >= FLAGS_auto_rebalancing_max_moves_per_server || : dst_ongoing >= FLAGS_auto_rebalancing_max_moves_per_server) { I'd think of comparing (src_ongoing + dst_ongoing) against the limit set by the flag given its description and how it's being treated at lines 335-340 above. Otherwise, how can we limit the rebalancing activity to have only one active operation per tablet server, when the server is either source or destination? What do you think? http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@723 PS3, Line 723: if (moves_per_tserver_[src_ts_uuid] > 0) { Why would we have zeros there if the move that has been scheduled should have incremented the counter in the beginning? Should there be also be DCHECK_GT(moves_per_tserver_[src_ts_uuid], 0) in addition to this if() condition? If not, please add a comment to clarify why. -- To view, visit http://gerrit.cloudera.org:8080/23755 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1b73045719dc69cc9e3353d318b7381f144a20d Gerrit-Change-Number: 23755 Gerrit-PatchSet: 3 Gerrit-Owner: Gabriella Lotz <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Gabriella Lotz <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 06 Jan 2026 19:36:22 +0000 Gerrit-HasComments: Yes
