Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14222 )

Change subject: KUDU-2069 p4: stop replication from failed servers in 
maintenance mode
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util-test.cc
File src/kudu/consensus/quorum_util-test.cc:

http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util-test.cc@636
PS2, Line 636: TEST_P(QuorumUtilHealthPolicyParamTest, 
ShouldAddReplicaWhitelist) {
What about tests scenarios for ShouldRemoveReplica() when some of the tablet 
servers hosting replicas are put into the maintenance mode?


http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util-test.cc@648
PS2, Line 648: unknown server
server with unknown health


http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util-test.cc@689
PS2, Line 689: }
Does it make sense to add the following cases for replication factor 3:
  * A:? B:+ C:+, all voters, where A is in maintenance mode
  * A:- B:+ C:+, all voters, where A, B, and C in maintenance mode
  * A:- B:- C:+, all voters, where A and B in maintenance mode

Also, how does the maintenance mode affect configurations with non-voter 
replicas?  Imagine, some non-voter replica was in process of copying when a 
tablet server with the failed replica was marked/put into the maintenance.  
What will happen to the non-voter replica?  Will it be removed since the tablet 
server with the failed voter replica has just been put into the maintenance 
mode?


http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc
File src/kudu/integration-tests/maintenance_mode-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc@19
PS2, Line 19:
nit: remove the extra line


http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc@156
PS2, Line 156: TestMaintenanceModeDoesntRereplicate
> nit: TestMaintenanceModeWithoutRereplication?
Does it make sense to add a scenario to verify that it's possible to move 
replicas _from_ a tablet server put into the maintenance mode?


http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc@195
PS2, Line 195: // Bringing the tablet server down shouldn't lead to 
re-replication.
Another reasons for re-replication are:
  * follower replica falling behind leader's WAL GC threshold
  * replica failed to bootstrap due to IO error

Does it make sense to add test scenario for those or we assume that's already 
covered due to the way how the code is written in quorum_util.cc?  If the 
latter, then please add a comment to reflect those possible cases as well.


http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/master/catalog_manager.cc@4190
PS2, Line 4190:         unordered_set<string> whitelisted_uuids;
              :         
master_->ts_manager()->GetWhitelistedUuids(&whitelisted_uuids);
> It's wasteful to generate this for each tablet in the report.
+1



--
To view, visit http://gerrit.cloudera.org:8080/14222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a63b55011d16900c0d27eac0eb75880074204db
Gerrit-Change-Number: 14222
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 13 Sep 2019 07:47:45 +0000
Gerrit-HasComments: Yes

Reply via email to