Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18454 )

Change subject: [master] KUDU-3390 support auto rebalance tablet leaders across TServers
......................................................................


Patch Set 30:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/18454/30//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18454/30//COMMIT_MSG@20
PS30, Line 20: which is hidden variables
Maybe you need to explain what leads to an unbalanced load; I really don't know what this means.


http://gerrit.cloudera.org:8080/#/c/18454/30/src/kudu/master/auto_leader_rebalancer.cc
File src/kudu/master/auto_leader_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/18454/30/src/kudu/master/auto_leader_rebalancer.cc@287
PS30, Line 287: // if the num of tasks is vary large, synchronized rpc will be slow,
              : // causing the thread cost much time to do the job.
Besides, if the leadership of many tablets changes at once, clients will send more GetTabletLocations requests to the leader master, overloading it. Maybe we can introduce a flag similar to 'auto_rebalancing_max_moves_per_server'.


http://gerrit.cloudera.org:8080/#/c/18454/30/src/kudu/master/auto_leader_rebalancer.cc@350
PS30, Line 350: if (e->PresumedDead()) {
If there are some temporarily unavailable tservers in a cluster, is it a good idea to just ignore them in this leader rebalancing task? What if they recover within a short time?


http://gerrit.cloudera.org:8080/#/c/18454/30/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18454/30/src/kudu/master/catalog_manager.cc@352
PS30, Line 352: auto_leader_rebalancing_enabled
This flag should be tagged with 'runtime' too.


http://gerrit.cloudera.org:8080/#/c/18454/30/src/kudu/master/catalog_manager.cc@1061
PS30, Line 1061: // Leader rebalancer depend on a good replicas topology, that means we'd better enable
               : // auto_rebalancing, but when auto_rebalancing is disabled and leader rebalance is enabled,
               : // that is ok, we support it.
               :
I think it is worth pointing out the difference between rebalancing tablet leaders on a balanced cluster and on an unbalanced one. Have you also tried starting leader rebalancing tasks only on an extremely unbalanced cluster? Would that make things worse?



-- 
To view, visit http://gerrit.cloudera.org:8080/18454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibfb60d8759a93b6a19238637c27df4f6b1cac918
Gerrit-Change-Number: 18454
Gerrit-PatchSet: 30
Gerrit-Owner: Yuqi Du
Gerrit-Reviewer: Abhishek Chennaka
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang
Gerrit-Reviewer: Yingchun Lai
Gerrit-Reviewer: Yuqi Du
Gerrit-Comment-Date: Mon, 26 Sep 2022 13:45:48 +0000
Gerrit-HasComments: Yes
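
To illustrate the per-server cap suggested for auto_leader_rebalancer.cc line 287, here is a minimal C++ sketch. It is not code from the patch. It assumes the rebalancer first computes a list of proposed leader transfers and then submits only a bounded subset per round. `LeaderTransfer`, `CapTransfersPerServer`, and the `max_moves_per_server` parameter are hypothetical names; the parameter stands in for a flag analogous to 'auto_rebalancing_max_moves_per_server'. Transfers that would push either tablet server over the cap are deferred to a later round, which bounds both the synchronous RPC work done per round and the burst of leadership changes that drives GetTabletLocations traffic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical type (not from the patch): move leadership of `tablet_id`
// from the replica on tablet server `from_ts` to the replica on `to_ts`.
struct LeaderTransfer {
  std::string tablet_id;
  std::string from_ts;
  std::string to_ts;
};

// Hypothetical helper: keeps, in input order, only the transfers that do not
// push any tablet server (as source or destination) over
// `max_moves_per_server` transfers in this round. Skipped transfers are
// expected to be recomputed and retried in a later round.
std::vector<LeaderTransfer> CapTransfersPerServer(
    const std::vector<LeaderTransfer>& proposed, size_t max_moves_per_server) {
  std::unordered_map<std::string, size_t> moves_per_server;
  std::vector<LeaderTransfer> selected;
  for (const auto& t : proposed) {
    // operator[] default-initializes unseen servers to 0 moves.
    if (moves_per_server[t.from_ts] >= max_moves_per_server ||
        moves_per_server[t.to_ts] >= max_moves_per_server) {
      continue;  // Deferred: one of the two servers has reached its quota.
    }
    ++moves_per_server[t.from_ts];
    ++moves_per_server[t.to_ts];
    selected.push_back(t);
  }
  return selected;
}
```

Counting both the source and the destination server is one possible choice; counting only the server that gives up leadership would be another, looser bound.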
