Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/19608 )
Change subject: [master] Exclude tservers in MAINTENANCE_MODE when leader rebalancing ...................................................................... Patch Set 12: (10 comments) Thanks for your advices. All fixed. http://gerrit.cloudera.org:8080/#/c/19608/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19608/9//COMMIT_MSG@14 PS9, Line 14: For this reason, we should exclude : such tservers. > Right -- maintenance mode is just for masters to avoid placing replicas to Yes, this patch is just to avoid transferring leadership to replicas hosted at tablet servers in maintenance node. http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc File src/kudu/master/auto_leader_rebalancer-test.cc: http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc@214 PS12, Line 214: const int kNumTServers = 3; > Since this test scenario involves many iterations and pauses, consider addi DONE. I am not sure about what test cases can be set SKIP_IF_SLOW_NOT_ALLOWED, I have added it at this test. Are there some standards or advices? In this case, I think it should run at DEBUG mode(DEBUG mode would skip slow cases), though it runs many iterations and has some 'SleepFor', in general, it runs fast enough, but this is my personal feeing. http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc@215 PS12, Line 215: kNumTablets > It is possible to provide the essence of the this test scenario with less n Yes, you are right. In this case, it's not need a bigger number, a smaller one is ok. http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc@244 PS12, Line 244: not > is not Done http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc@244 PS12, Line 244: 20 tries 'leader rebalance' > 20 'leader rebalance' iterations Done http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc@245 PS12, Line 245: // it's enough to reach leader balanced, more tries is not necessary and less tries : // may not reach leader rebalanced. > Does the number somehow depends on the number of tables in the cluster? If Done http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc@249 PS12, Line 249: CHECK_OK > Why CHECK_OK() here instead of ASSERT_OK()? Why it's so important to crash Sorry, it should be ASSERT_OK http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc@255 PS12, Line 255: a rebalanced state of leaders > the state of balanced leadership distribution Done http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc@261 PS12, Line 261: be > have Done http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer.cc File src/kudu/master/auto_leader_rebalancer.cc: http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer.cc@405 PS9, Line 405: CatalogManager::ScopedLeaderSharedLock leader_lock(catalog_manager_); > Right -- having a provisional check to exclude tablet servers in maintenanc Yes, I have skimmed over the codes , currently it seems tservers in quiesce state may reject the transferring leader request. I'll make sure it, and if so, I think we can do nothing about this or just patching an unit test if needed. -- To view, visit http://gerrit.cloudera.org:8080/19608 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f85a675e69fd02a62e2625881dad2ca5e27acd9 Gerrit-Change-Number: 19608 Gerrit-PatchSet: 12 Gerrit-Owner: Yuqi Du <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: KeDeng <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Wang Xixu <[email protected]> Gerrit-Reviewer: Yifan Zhang <[email protected]> Gerrit-Reviewer: Yingchun Lai <[email protected]> Gerrit-Reviewer: Yuqi Du <[email protected]> Gerrit-Comment-Date: Thu, 06 Apr 2023 04:11:30 +0000 Gerrit-HasComments: Yes
