Gabriella Lotz has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23755 )

Change subject: auto_rebalancer: enforce max_moves_per_server per tserver
......................................................................


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc
File src/kudu/master/auto_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@a411
PS3, Line 411:
             :
> I see this TODO is removed in PS3.  Has this been addressed as of PS3 of th
No, this TODO has not been addressed, it was my error deleting it. I've added 
it back.


http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@267
PS3, Line 267:     } while (!replica_moves.empty());
> Do we want to add a few DCHECK() assertions on the state of the newly intro
Done


http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@336
PS3, Line 336:
> Since accessing FLAGS_auto_rebalancing_max_moves_per_server has extra costs
Done


http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@337
PS3, Line 337: Status AutoRebalancerTask::GetMovesUsingRebalancingAlgo(
             :   const ClusterRawInfo& raw_info,
             :   rebalance::RebalancingAlgo* algo,
             :   C
> Please make sure 'num_tservers' and 'max_moves' end up being signed integer
Done


http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@367
PS3, Line 367:   RETURN_NOT_OK(algo->GetNextMoves(cluster_info, max_moves, 
&moves));
             :
> I'd think of comparing (src_ongoing + dst_ongoing) against the limit set by
Could you clarify what you mean by comparing (src_ongoing + dst_ongoing) 
against the limit? My understanding is that src_ongoing is the count for the 
source tserver and dst_ongoing is for the destination tserver. Adding them 
would give the combined count across both servers, which doesn't match the flag 
semantics of limiting each individual server.

Or are you suggesting we should track the total operations on a server (as both 
source and destination combined)? If so, I think that's already what 
moves_per_tserver_ does, it increments for both source and destination when a 
move is scheduled.


http://gerrit.cloudera.org:8080/#/c/23755/3/src/kudu/master/auto_rebalancer.cc@723
PS3, Line 723:       replica_moves->erase(replica_moves->begi
> Why would we have zeros there if the move that has been scheduled should ha
The counter can legitimately be zero if ExecuteMoves() failed before processing 
this move. ExecuteMoves() uses RETURN_NOT_OK which exits early on error, 
leaving later moves in replica_moves with zero counters. The if() check handles 
this case. I've added a comment to clarify.

An alternative would be refactoring ExecuteMoves() to remove failed moves from 
replica_moves, but that's a larger change I'd prefer to defer to keep this 
patch focused.



--
To view, visit http://gerrit.cloudera.org:8080/23755
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1b73045719dc69cc9e3353d318b7381f144a20d
Gerrit-Change-Number: 23755
Gerrit-PatchSet: 4
Gerrit-Owner: Gabriella Lotz <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Gabriella Lotz <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 09 Jan 2026 21:55:35 +0000
Gerrit-HasComments: Yes

Reply via email to