Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/21073 )
Change subject: Remove the replace flag if the move fails in auto rebalancing. ...................................................................... Patch Set 14: (9 comments) Thanks for the review! http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer-test.cc File src/kudu/master/auto_rebalancer-test.cc: http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer-test.cc@823 PS12, Line 823: rebalanci > nit: rebalancing Done http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer-test.cc@855 PS12, Line 855: & > style nit for here and below: stick the ampersand to the type, not the vari Done http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer.h File src/kudu/master/auto_rebalancer.h: http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer.h@133 PS12, Line 133: // Finds replicas that are specified in 'replica_moves' and make requests : // to have them moved in order to rebalance the cluster. > Please update this in-line doc to reflect the semantics of the changes in t Done http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer.cc File src/kudu/master/auto_rebalancer.cc: http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer.cc@141 PS12, Line 141: > Consider tagging this new flag with the 'unsafe' tag. Done http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer.cc@142 PS12, Line 142: DEFINE_bool(au > nit: wrong indent Done http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer.cc@698 PS12, Line 698: } : // If the move was completed, remove it from 'replica_moves'. > The message is now quite confusing. Please separate two logs: logging abou Done http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer.cc@724 PS12, Line 724: > nit: injected artificial test failure Done http://gerrit.cloudera.org:8080/#/c/21073/14/src/kudu/master/auto_rebalancer.cc File src/kudu/master/auto_rebalancer.cc: http://gerrit.cloudera.org:8080/#/c/21073/14/src/kudu/master/auto_rebalancer.cc@669 PS14, Line 669: // Remove the replace flag of the replica. > Moving replica fail is a normal case. Run `kudu cluster rebalance` command There are two cases if the leader's replace flag is not removed: 1. Master add a replica to the raft config and the new replica always fails to become a voter because of some reasons, so it need to find another replica, and another, costing a lot resource. 2. Master add a replica to the raft config and the new replica finally become a voter, and the tablet always has a extra replica which is actually useless. The leader replica which fails to move is not always found in the next round of rebalancing. http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/rebalance/rebalancer.h File src/kudu/rebalance/rebalancer.h: http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/rebalance/rebalancer.h@158 PS12, Line 158: enum class RunStatus { : UNKNOWN, > Why to store this information at all? This smells not quite good: Yes, I considered about whether to store the information or get the consensus information when the move fails. And I select the former one to avoid more GetConsensus requests. And the number of RebalanceMove is usually small in our environment. I will try to do it in the other way. -- To view, visit http://gerrit.cloudera.org:8080/21073 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I99dafa654878b9d6d8162d84500913ae0655692b Gerrit-Change-Number: 21073 Gerrit-PatchSet: 14 Gerrit-Owner: Song Jiacheng <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Song Jiacheng <[email protected]> Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Wang Xixu <[email protected]> Gerrit-Comment-Date: Mon, 20 May 2024 08:31:53 +0000 Gerrit-HasComments: Yes
