Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/19608 )

Change subject: [master] Exclude tservers in MAINTENANCE_MODE when leader rebalancing
......................................................................


Patch Set 9:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/19608/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19608/9//COMMIT_MSG@14
PS9, Line 14: For this reason, we should exclude
            : such tservers.
Having this patch is great, but it seems there is a race condition in this approach.

Shouldn't tablet replicas hosted on tablet servers in maintenance mode simply refuse to become leaders?
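To illustrate the suggestion, here is a minimal C++ sketch of a replica-side check. It assumes the tablet server knows whether it is in maintenance mode (in Kudu that state appears to be tracked by the master, so this assumption may not hold today); `LocalTServerState` and `ShouldAcceptLeadership()` are hypothetical names, not existing Kudu APIs. Because the decision is made on the destination at the time of the request, it does not depend on a list the rebalancer built earlier.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical replica-side view of the local tserver's maintenance state.
// ASSUMPTION: the tserver is informed of its own state (e.g. by the master).
class LocalTServerState {
 public:
  void set_in_maintenance(bool on) { in_maintenance_.store(on); }
  bool in_maintenance() const { return in_maintenance_.load(); }

 private:
  std::atomic<bool> in_maintenance_{false};
};

// Hypothetical guard a replica would evaluate when asked to take over
// leadership (e.g. on a leader-transfer request). It uses the state at
// request time rather than a snapshot taken earlier by the rebalancer.
bool ShouldAcceptLeadership(const LocalTServerState& state) {
  return !state.in_maintenance();
}
```

A real implementation would also have to avoid leaving a tablet without a leader, e.g. when every voter replica sits on a tserver in maintenance mode; the sketch ignores that case.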


http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer-test.cc
File src/kudu/master/auto_leader_rebalancer-test.cc:

http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer-test.cc@234
PS9, Line 234: string exe_file;
             :   CHECK_OK(Env::Default()->GetExecutablePath(&exe_file));
             :   const string kudu_cli_path = DirName(exe_file);
             :
             :   // Make 1 tserver enter MAINTENANCE_MODE.
             :   ASSERT_OK(Subprocess::Call(Substitute("$0/kudu tserver state enter_maintenance $1 $2",
             :                                         kudu_cli_path,
             :                                         master_addresses,
             :                                         mini_tserver->uuid())));
> Is it possible to call TSManager::SetTServerState directly?
+1


http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer-test.cc@251
PS9, Line 251: mini_tserver->Restart();
Wrap this in ASSERT_OK()?

Also, it would be great to add a comment explaining why this tablet server is restarted.


http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer-test.cc@253
PS9, Line 253:   SleepFor(MonoDelta::FromSeconds(10 * FLAGS_auto_rebalancing_interval_seconds));
nit: please add a comment explaining the purpose of this pause.


http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer-test.cc@254
PS9, Line 254: 20
Why 20?  Why not 10 or 100?


http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer-test.cc@264
PS9, Line 264: CheckLeaderBalance().ok()
Does it make sense to check for the exact Status code?  And for the error message pattern?
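For instance, rather than asserting only on `ok()`, the test could assert on the specific code and on a message fragment. With Kudu's test utilities that would typically be a Status predicate such as `IsIllegalState()` plus `ASSERT_STR_CONTAINS(s.ToString(), ...)`. The self-contained sketch below uses a hypothetical `SimpleStatus` stand-in instead, and the code and message it matches are placeholders, not what `CheckLeaderBalance()` actually returns.

```cpp
#include <cassert>
#include <string>

// Hypothetical, minimal stand-in for a status object; kudu::Status offers
// richer predicates (IsIllegalState(), IsNotFound(), ...) and ToString().
enum class Code { kOk, kIllegalState, kNotFound };

struct SimpleStatus {
  Code code = Code::kOk;
  std::string message;
  bool ok() const { return code == Code::kOk; }
};

// Checks both the status code and that the message contains an expected
// fragment. A bare "not ok" check would also pass for unrelated failures
// (e.g. a missing tablet), which could mask regressions.
bool MatchesExpectedFailure(const SimpleStatus& s, Code expected_code,
                            const std::string& expected_fragment) {
  return !s.ok() && s.code == expected_code &&
         s.message.find(expected_fragment) != std::string::npos;
}
```

Matching on both the code and the message keeps the test from silently passing when the check fails for a different reason.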


http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer.h
File src/kudu/master/auto_leader_rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer.h@79
PS9, Line 79: std::set<std::string>
Is it crucial for the set of UUIDs to be ordered?  If not, consider using std::unordered_set instead: it offers faster lookups when the set is large.
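A minimal sketch of the suggested change, assuming nothing relies on iterating the UUIDs in sorted order; `UuidSet` and `FilterDestinations()` are illustrative names, not the patch's code. `std::unordered_set` gives average constant-time lookups versus logarithmic time for `std::set`.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Illustrative set of tserver UUIDs to exclude as leader-transfer
// destinations. Iteration order is unspecified, which is acceptable only
// if no caller depends on sorted order.
using UuidSet = std::unordered_set<std::string>;

// Hypothetical helper: keeps candidate destinations (in their original
// order) whose UUID is not in the exclusion set.
std::vector<std::string> FilterDestinations(
    const std::vector<std::string>& candidates, const UuidSet& excluded) {
  std::vector<std::string> result;
  result.reserve(candidates.size());
  for (const auto& uuid : candidates) {
    if (excluded.count(uuid) == 0) {
      result.push_back(uuid);
    }
  }
  return result;
}
```

For a handful of tservers the difference is negligible either way; the gain only shows up when the exclusion set and the number of lookups grow.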


http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer.cc
File src/kudu/master/auto_leader_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer.cc@405
PS9, Line 405:     RunLeaderRebalanceForTable(table_info, tserver_uuids, exclude_dest_uuids);
What if one more tablet server is put into maintenance mode after the list is built above (lines 391-397) but before RunLeaderRebalanceForTable() is called?
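One way to narrow that window is to re-validate each destination right before issuing its leader transfer instead of relying solely on the snapshot taken earlier. The sketch below is hypothetical: `TServerStates` stands in for the master's view of tserver states (no particular TSManager API is assumed), and `transfer` stands in for whatever actually issues the transfer.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, thread-safe source of truth for tserver maintenance state.
class TServerStates {
 public:
  void SetMaintenance(const std::string& uuid, bool on) {
    std::lock_guard<std::mutex> l(mu_);
    if (on) {
      maintenance_.insert(uuid);
    } else {
      maintenance_.erase(uuid);
    }
  }
  bool InMaintenance(const std::string& uuid) const {
    std::lock_guard<std::mutex> l(mu_);
    return maintenance_.count(uuid) > 0;
  }

 private:
  mutable std::mutex mu_;
  std::unordered_set<std::string> maintenance_;
};

// Hypothetical rebalancing step: 'planned_dests' was computed from an
// earlier, possibly stale snapshot. Each destination is re-checked right
// before the transfer is issued; returns the number of transfers issued.
int TransferLeadersWithRecheck(
    const std::vector<std::string>& planned_dests, const TServerStates& states,
    const std::function<void(const std::string&)>& transfer) {
  int issued = 0;
  for (const auto& dest : planned_dests) {
    if (states.InMaintenance(dest)) {
      continue;  // State changed since the plan was built; skip this one.
    }
    transfer(dest);
    ++issued;
  }
  return issued;
}
```

This only shrinks the race: a tserver can still enter maintenance mode right after the check, which is why having replicas refuse leadership, as suggested in the commit-message comment, would be more robust.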



--
To view, visit http://gerrit.cloudera.org:8080/19608

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f85a675e69fd02a62e2625881dad2ca5e27acd9
Gerrit-Change-Number: 19608
Gerrit-PatchSet: 9
Gerrit-Owner: Yuqi Du
Gerrit-Reviewer: Abhishek Chennaka
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Attila Bukor
Gerrit-Reviewer: KeDeng
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy
Gerrit-Reviewer: Wang Xixu
Gerrit-Reviewer: Yifan Zhang
Gerrit-Reviewer: Yingchun Lai
Gerrit-Reviewer: Yuqi Du
Gerrit-Comment-Date: Tue, 28 Mar 2023 02:31:49 +0000
Gerrit-HasComments: Yes
