Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18679 )
Change subject: [master] support turning on/off auto rebalancer at runtime ...................................................................... Patch Set 26: (15 comments) http://gerrit.cloudera.org:8080/#/c/18679/26//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18679/26//COMMIT_MSG@16 PS26, Line 16: threadpool nit: thread pool http://gerrit.cloudera.org:8080/#/c/18679/26//COMMIT_MSG@16 PS26, Line 16: threadpool nit: thread pool http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/auto_rebalancer-test.cc File src/kudu/master/auto_rebalancer-test.cc: http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/auto_rebalancer-test.cc@285 PS26, Line 285: // Make sure turn on/off auto-rebalancing is valid. How about: Make sure the auto-rebalancing can be toggled on/off in runtime. http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/auto_rebalancer-test.cc@287 PS26, Line 287: const int kNumMasters = 1; : const int kNumTservers = 3; : const int kNumTablets = 8; nit: these might be constexpr Also, since each of these constants are used only once in the code below, maybe get rid of them at all? http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/auto_rebalancer-test.cc@303 PS26, Line 303: // Wait a schedule period. : SleepFor(MonoDelta::FromSeconds(1)); What's the scheduler's period at this point? It's better to use the actual value of the scheduling interval instead of hard-coded value since this might become flaky once the default value for the scheduler period changes. http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/auto_rebalancer-test.cc@307 PS26, Line 307: int num_iterations = NumLoopIterations(leader_idx); : ASSERT_EVENTUALLY([&] { : ASSERT_EQ(num_iterations, NumLoopIterations(leader_idx)); : }); Wouldn't the ASSERT_EQ() be almost always true since it's executed immediately after assigning the 'num_iterations' variable at line 307? Relying on the rare case when the scheduler runs right after the assignment and before ASSERT_EQ() evaluated is a bit tight -- it could happen that this assert fails only 1/1000 of all times it runs. http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/auto_rebalancer.h File src/kudu/master/auto_rebalancer.h: http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/auto_rebalancer.h@81 PS26, Line 81: Schedule `Rebalance()` Task asynchronously. As far as I can see, the task itself is being _scheduled_ synchronously, no? http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/auto_rebalancer.h@84 PS26, Line 84: the main loop This isn't a loop anymore, is it? http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/auto_rebalancer.h@85 PS26, Line 85: // When leader master and FLAGS_auto_rebalancing_enabled is : // true, invoke `DoReplicaRebalance()` to move replicas if : // replicas is not imbalanced. How about: Leader master invokes 'DoReplicaRebalance()' when FLAGS_auto_rebalancing_enabled is set to 'true'. http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/auto_rebalancer.h@88 PS26, Line 88: Rebalance Since this class is a task on its own and named AutoRebalancerTask already, maybe rename this method into 'Run()'? http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/auto_rebalancer.h@90 PS26, Line 90: Replica moves if replicas is not imbalanced. ? http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/auto_rebalancer.cc File src/kudu/master/auto_rebalancer.cc: http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/auto_rebalancer.cc@194 PS26, Line 194: return Status::OK(); This looks a bit strange to me: there was a request to run the rebalancing and the method reports "OK, it's been scheduled", not nothing has happened in reality. Should this return Status::IllegalState() in such a case instead of Status::OK()? http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/catalog_manager.cc@158 PS26, Line 158: 100 I don't think it makes sense to have the default scheduling interval this short given that --auto_rebalancing_interval_seconds has granularity of one second. Maybe, increase the default up to 1000 ms? http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/catalog_manager.cc@159 PS26, Line 159: every schedule_period_ms scheduler pool would check time and may run some tasks Every scheduler_period_ms the scheduler checks for tasks to run http://gerrit.cloudera.org:8080/#/c/18679/26/src/kudu/master/catalog_manager.cc@161 PS26, Line 161: max thread numbers of scheduler thread pool Maximum number of threads in the scheduler thread pool -- To view, visit http://gerrit.cloudera.org:8080/18679 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbcea6392b783843f89e019f1141dd8905366de5 Gerrit-Change-Number: 18679 Gerrit-PatchSet: 26 Gerrit-Owner: Yuqi Du <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yingchun Lai <[email protected]> Gerrit-Reviewer: Yuqi Du <[email protected]> Gerrit-Comment-Date: Fri, 05 Aug 2022 04:15:17 +0000 Gerrit-HasComments: Yes
