Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/19608 )

Change subject: [master] Exclude tservers in MAINTENANCE_MODE when leader rebalancing
......................................................................


Patch Set 10:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/19608/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19608/9//COMMIT_MSG@14
PS9, Line 14: For this reason, we should exclude
            : such tservers.
> Having this patch is great, but it seems there is a race condition in this
Yes. I want to use the tserver's 'quiesce state' to handle this in the next patch.
However, that state is a little different from 'Maintenance mode': maintenance mode is a state kept by the kudu-master, and the tserver does not know that it is in maintenance. So some method is needed to extend the state to the tserver.
I will study the tserver's quiesce state again and then propose a solution to this problem.


http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer-test.cc
File src/kudu/master/auto_leader_rebalancer-test.cc:

http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer-test.cc@234
PS9, Line 234: 
             : constexpr const int currentTserverIndex = 0;
             : tserver::MiniTabletServer* mini_tserver = cluster_->mini_tablet_server(currentTserverIndex);
             : // Sets the tserver state for a tserver to 'MAINTENANCE_MODE'.
             : ASSERT_OK(
             : master->ts_manager()->SetTServerState(mini_tserver->uuid(),
             : TServerStatePB::MAINTENANCE_MODE,
             : ChangeTServerStateRequestPB::ALLOW_MISSING_TSERVER,
             : master->catalog_mana
> +1
Done


http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer-test.cc@251
PS9, Line 251: // it's enough to reach
> Wrap this into ASSERT_OK()?
Done


http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer-test.cc@253
PS9, Line 253: constexpr const int32_t retries = 20;
> nit: please add a comment on the purpose of this pause
The purpose of the pause was:
// To make sure replica_rebalancer executes some runs and reaches balance.
However, this test case has only 3 tservers, so the replicas are already balanced and the SleepFor is not necessary. I removed it.


http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer-test.cc@254
PS9, Line 254: {
> Why 20? Why not 10 or 100?
It's an estimated value.
// Try to run 'leader rebalance' 20 times. If mini_tserver is not in MAINTENANCE_MODE,
// that is enough to reach leader balance: more tries are not necessary, and fewer tries
// may not reach leader balance.


http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer-test.cc@264
PS9, Line 264: tatus.IsIllegalState()) <
> Does it make sense to check for exact Status code? And error message patte
Done


http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer.h
File src/kudu/master/auto_leader_rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer.h@79
PS9, Line 79: std::unordered_set<st
> Is it crucial to have the set of UUIDs to be ordered? If not, consider usi
Done


http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer.cc
File src/kudu/master/auto_leader_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer.cc@405
PS9, Line 405: CatalogManager::ScopedLeaderSharedLock leader_lock(catalog_manager_);
> What if one more tablet server is put into the maintenance mode just betwee
Thanks for your advice. The problem does indeed exist, unless we hold a big lock around ts_manager's SetTServerState while the leader rebalancer runs, and a big lock is not appropriate.
So I strengthened the check condition: in the extreme case where a new tserver enters MAINTENANCE_MODE during a run, at most 1 tablet will be affected.

After this patch, I plan a patch for the tserver quiesce state, which is often used together with maintenance mode when decommissioning: if a tserver in the quiesce state receives a request to transfer leadership to it, it should reject the request.

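
To illustrate the "at most 1 tablet" bound from the last reply, here is a minimal sketch of one way such a strengthened check could look: the set of tservers in maintenance mode is re-read right before each leader transfer instead of once per rebalancing run. This is an assumption for illustration only, not the code in the patch. `LeaderTransferTask`, `RunTransfers`, and both callbacks are hypothetical names that stand in for the TSManager lookup and the leader transfer request.

```cpp
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for one planned leader transfer; not a Kudu type.
struct LeaderTransferTask {
  std::string tablet_id;
  std::string dest_ts_uuid;
};

// Issues the planned transfers, re-reading the maintenance-mode set right
// before each one. Both callbacks are hypothetical: 'fetch_maintenance_uuids'
// stands in for asking the master which tservers are in MAINTENANCE_MODE,
// and 'transfer' stands in for the leader transfer request.
// Returns the number of transfers actually issued.
int RunTransfers(
    const std::vector<LeaderTransferTask>& tasks,
    const std::function<std::unordered_set<std::string>()>& fetch_maintenance_uuids,
    const std::function<void(const LeaderTransferTask&)>& transfer) {
  int issued = 0;
  for (const auto& task : tasks) {
    // Fresh lookup per task: a state change can only be missed by the single
    // task that is between this check and its transfer call.
    const std::unordered_set<std::string> in_maintenance = fetch_maintenance_uuids();
    if (in_maintenance.count(task.dest_ts_uuid) > 0) {
      continue;  // Skip destinations that are now in maintenance mode.
    }
    transfer(task);
    ++issued;
  }
  return issued;
}
```

The window between the check and the transfer is still not closed without a lock, which is why rejecting the request on the tserver side (the planned quiesce-state patch) is the complementary fix.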
-- 
To view, visit http://gerrit.cloudera.org:8080/19608
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f85a675e69fd02a62e2625881dad2ca5e27acd9
Gerrit-Change-Number: 19608
Gerrit-PatchSet: 10
Gerrit-Owner: Yuqi Du <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: KeDeng <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <[email protected]>
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Yuqi Du <[email protected]>
Gerrit-Comment-Date: Thu, 30 Mar 2023 08:28:53 +0000
Gerrit-HasComments: Yes
