Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/18454 )
Change subject: [master] KUDU-3390 support auto rebalance tablet leaders across TServers ...................................................................... Patch Set 34: (27 comments) I am glad to receive lots of crs. Thank you very much. I have fixed them and you can review them again. http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer-test.cc File src/kudu/master/auto_leader_rebalancer-test.cc: http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer-test.cc@60 PS32, Line 60: (auto_rebalancing_interval_seconds); > Is this needed here? Remove it. And some DECLARE_* and METRIC_DECLARE* below also not used, they should be removed. http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer-test.cc@62 PS32, Line 62: (auto_rebalancing_wait_for > And the same for several other declarations of gflags in this section: are Done http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer-test.cc@70 PS32, Line 70: class LeaderRebalancerTest : public KuduTest { : public: : Status CreateAndStartCluster() { > Are these needed in this file? Done http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer-test.cc@139 PS32, Line 139: rs; > How do we know that the distribution of the leader replicas is balanced? I Yes, the describe is not correct, I will fix it. when create tables, the table's tablet replicas and leaders is decided by algorithms of assign tablets. http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer-test.cc@150 PS32, Line 150: > How do we know at this point that the cluster is leader-balanced? That's default status, leader not balanced. I fix the comments. http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer-test.cc@168 PS32, Line 168: SleepFor(MonoDelta::FromMillisecond > That's a really long sleep time: that's not a good idea to have something l ok http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer-test.cc@169 PS32, Line 169: > I'm curious why is the number of retries so high? Is it possible to make t I set a small value for each iteration to cover more branches, eg: auto_leader_balancer.cc:296 http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer-test.cc@175 PS32, Line 175: const int kNumTServers = 4; > Why is it necessary to have a pause here? Would be great to add a comment I looks the sleep is not necessary, but I try to remove it the test is failed. I think master's tablets meta about roles, distributed is not realtime, it need wait heartbeat reports. http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer.h File src/kudu/master/auto_leader_rebalancer.h: http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer.h@45 PS32, Line 45: // A CatalogManager background task which auto-rebalances tablets' leaders distribution. > nit: remove this period I have not understand this. http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer.h@46 PS32, Line 46: by transferring leadership > by transferring leadership between tablet replicas Done http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer.h@95 PS32, Line 95: // send rpc messages for ’LeaderStepDown‘ > nit: please document this member as well -- what is it for? Done http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer.cc File src/kudu/master/auto_leader_rebalancer.cc: http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer.cc@55 PS32, Line 55: using kudu::consensus::ConsensusServiceProxy; : using kudu::consensus::LeaderStepDownMode; : using kudu::consensus::LeaderStepDownRequestPB; : using kudu::consensus::LeaderStepDownResponsePB; : using kudu::consensus::RaftPeerPB; : using kudu::rpc::MessengerBuilder; : using kudu::rpc::RpcController; : using std::map; : using std::nullopt; : using std::string; : using std::vector; : using strings::Substitute; : : DEFINE_uint32(auto_leader_ > nit: remove empty lines and sort the rest of non-empty lines alphabetically Done http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer.cc@293 PS32, Line 293: > What's "synchronized rpc"? I mean the thread would wait the response synchronously. I fix the comments. http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer.cc@293 PS32, Line 293: = 0; > very? Done http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer.cc@299 PS32, Line 299: break; > I don't think this sort of information is worth it having in the INFO log: Done http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer.cc@304 PS32, Line 304: reques > nit: could this be 'const string&' or 'const auto&' to avoid copying? Done http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer.cc@330 PS32, Line 330: > Drop the namespace prefix since there is corresponding 'using' using string Done http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/auto_leader_rebalancer.cc@337 PS32, Line 337: > What's VLOG(0), and why not to use VLOG(1) for this? The log can show information about the leader rebalance runs simply and directly. The log print only 1 record every period (default 1 hour), I think it's a necessary for INFO currently, In the future, I think we can add some metrics and remove the log. http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/catalog_manager.cc@353 PS32, Line 353: automatic leader rebala > automatic leader rebalancing The style like line 346, whether it's necessary to change it ? I change it firstly. http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/catalog_manager.cc@1062 PS32, Line 1062: depend > depends Done http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/catalog_manager.cc@1063 PS32, Line 1063: auto-rebalancing > auto-rebalancing Done http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/catalog_manager.cc@1063 PS32, Line 1063: rebalanci > rebalancing Done http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/catalog_manager.cc@1064 PS32, Line 1064: kudu > cannot Done http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/catalog_manager.cc@1064 PS32, Line 1064: the effect of > What is 'load rebalance'? the effect of leader rebalancing, I fix the words. http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/catalog_manager.cc@1067 PS32, Line 1067: isabled, > ? I did not know "?", maybe I should rewrite the comments, I try to do this. http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/catalog_manager.cc@1067 PS32, Line 1067: ven if a > because Done http://gerrit.cloudera.org:8080/#/c/18454/32/src/kudu/master/catalog_manager.cc@1067 PS32, Line 1067: ries to > followers? Done -- To view, visit http://gerrit.cloudera.org:8080/18454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibfb60d8759a93b6a19238637c27df4f6b1cac918 Gerrit-Change-Number: 18454 Gerrit-PatchSet: 34 Gerrit-Owner: Yuqi Du <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yifan Zhang <[email protected]> Gerrit-Reviewer: Yingchun Lai <[email protected]> Gerrit-Reviewer: Yuqi Du <[email protected]> Gerrit-Comment-Date: Fri, 07 Oct 2022 08:38:15 +0000 Gerrit-HasComments: Yes
