Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19608 )

Change subject: [master] Exclude tservers in MAINTENANCE_MODE when leader 
rebalancing
......................................................................


Patch Set 12:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/19608/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19608/9//COMMIT_MSG@14
PS9, Line 14: For this reason, we should exclude
            : such tservers.
> Yes.
Right -- maintenance mode is just for masters to avoid placing replicas to 
corresponding tablet servers.

I guess if the idea is just to avoid transferring leadership to replicas hosted 
at tablet servers in maintenance node, this patch should be good enough then.

Thanks!


http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc
File src/kudu/master/auto_leader_rebalancer-test.cc:

http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc@214
PS12, Line 214:   const int kNumTServers = 3;
Since this test scenario involves many iterations and pauses, consider adding 
SKIP_IF_SLOW_NOT_ALLOWED() in the very beginning of this test?


http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc@215
PS12, Line 215: kNumTablets
It is possible to provide the essence of the this test scenario with less 
number of tablets in the test cluster?  I guess the number of iterations/tries 
to run leader rebalancing depends on this?  If yes, then maybe use a smaller 
number for faster run-time and less computing resources spent during test 
cycles?


http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc@244
PS12, Line 244: 20 tries 'leader rebalance'
20 'leader rebalance' iterations


http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc@244
PS12, Line 244: not
is not


http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc@245
PS12, Line 245:   // it's enough to reach leader balanced, more tries is not 
necessary and less tries
              :   // may not reach leader rebalanced.
Does the number somehow depends on the number of tables in the cluster?  If so, 
maybe make that constant to be derived from kNumTablets above?


http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc@249
PS12, Line 249: CHECK_OK
Why CHECK_OK() here instead of ASSERT_OK()?  Why it's so important to crash 
with due to SIGABRT instead of just failing the test if something goes wrong?


http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc@255
PS12, Line 255: a rebalanced state of leaders
the state of balanced leadership distribution


http://gerrit.cloudera.org:8080/#/c/19608/12/src/kudu/master/auto_leader_rebalancer-test.cc@261
PS12, Line 261: be
have


http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer.cc
File src/kudu/master/auto_leader_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/19608/9/src/kudu/master/auto_leader_rebalancer.cc@405
PS9, Line 405:     CatalogManager::ScopedLeaderSharedLock 
leader_lock(catalog_manager_);
> Thanks for your advice. The problem exist indeed, unless we use a big lock
Right -- having a provisional check to exclude tablet servers in maintenance 
mode should be good enough.  Anyways, even if leader replicas land in a tablet 
server in the middle when it's being put in the maintenance mode, it should not 
be a big deal -- Raft takes care of re-elections once the tablet server is 
down, and existing Kudu client libraries automatically retry write operations 
on that case.

I guess all tablet replicas at quesceing tablet servers simply rejects requests 
to transfer leadership to them, and it makes sense to see how the leader 
rebalancing works in that case -- maybe, some updates are needed.

Thanks!



--
To view, visit http://gerrit.cloudera.org:8080/19608
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f85a675e69fd02a62e2625881dad2ca5e27acd9
Gerrit-Change-Number: 19608
Gerrit-PatchSet: 12
Gerrit-Owner: Yuqi Du <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: KeDeng <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <[email protected]>
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Yuqi Du <[email protected]>
Gerrit-Comment-Date: Tue, 04 Apr 2023 21:00:35 +0000
Gerrit-HasComments: Yes

Reply via email to