Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14222 )
Change subject: KUDU-2069 p4: stop replication from failed servers in maintenance mode ...................................................................... Patch Set 3: (20 comments) http://gerrit.cloudera.org:8080/#/c/14222/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14222/2//COMMIT_MSG@10 PS2, Line 10: llowed to be in a "ba > nit: maybe 'can be in a bad state (e.g. due to in maintenance mode)'? To ex Done Also worth noting that maintenance mode is not itself a bad state. It is protection against being considered a bad state. http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util-test.cc File src/kudu/consensus/quorum_util-test.cc: http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util-test.cc@636 PS2, Line 636: EXPECT_FALSE(ShouldAddReplica(config, 5, { "A", "B" })); > What about tests scenarios for ShouldRemoveReplica() when some of the table This is somewhat white-box knowledge, but the "whitelist" isn't even part of the `ShouldAddReplica()` API, so I'm not sure testing it here makes sense. http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util-test.cc@648 PS2, Line 648: ig, "B", V, '+ > server with unknown health Done http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util-test.cc@658 PS2, Line 658: // should honor the replacement and try to add > nit: should this comment be moved to L664? Ack http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util-test.cc@689 PS2, Line 689: EXPECT_FALSE(ShouldAddReplica(config, 5, { "A", "B" })); > Does it make sense to add the following cases for replication factor 3: Sure, I'll add those cases. Added some coverage for non-voters. http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util.h File src/kudu/consensus/quorum_util.h: http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util.h@122 PS2, Line 122: set the UUID of the best candidate > there is a quorum Done http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util.h@132 PS2, Line 132: > nit: would '{}' suffice here instead of 'std::unordered_set<std::string>()' I get an error when I try that: "chosen constructor is explicit in copy-initialization". http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util.cc File src/kudu/consensus/quorum_util.cc: http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util.cc@456 PS2, Line 456: ) << Substitute("ignor > Ah, just another observation. Probably, I should have mentioned that earli I updated my design doc with some notes on this: Option 1: the rebalancing tool and the move tool will continue to work even during maintenance mode REPLACE replicas _will_ continue to count towards underreplication when deciding to add replicas, even when in maintenance mode. * There would be no changes to determining whether we should evict. In this way, maintenance mode does not affect the move tool at all. * The downside here is that it may mean replication traffic. Using maintenance mode is generally a signal that something is wrong or needs to be fixed in a cluster/tserver. It thus behooves us to make sure there are fewer moving parts where we can. Option 2: the rebalancing tool and the move tool will set REPLACE and PROMOTE, but those will not be honored by the master -- the master err on the side of not evicting and not adding replicas. * When determining whether to add replicas, replicas marked REPLACE will not count towards being underreplicated, while those marked PROMOTE will count if healthy. * When determining whether to remove replicas, replicas marked REPLACE will be prioritized as healthy (or even lower, so we truly never evict nodes in maintenance mode) * The major downside here is that while the goal itself is clear, the exact semantics aren’t quite clear to articulate, especially when considering whether to evict (see ShouldEvictReplica() in quorum_util.cc) Verdict: There is a clearer implementation path towards Option 1, and the semantics seem less surprising, when considering that operators may run tools manually. If we really want to avoid rebalancing during maintenance mode, it doesn’t seem unreasonable to implement checks in the rebalancer to check for maintenance mode. http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc File src/kudu/integration-tests/maintenance_mode-itest.cc: http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc@19 PS2, Line 19: #include <memory> > nit: remove the extra line Done http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc@87 PS2, Line 87: pts.num_ > nit: 'Perform' to align with the rest. Done http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc@99 PS2, Line 99: TServerStateChangePB* state_change = req.mutable_change(); > This is a proxy for the question "did we rereplicate anything", right? Woul Done http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc@151 PS2, Line 151: for (int p = 0; p < committed_config.peers_size(); p++) { > Don't need this? Done http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc@156 PS2, Line 156: > Does it make sense to add a scenario to verify that it's possible to move r Changed to a different name. http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc@156 PS2, Line 156: > To clarify, my comment was about the manual movement of replicas from the n Ack http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc@195 PS2, Line 195: NO_FATALS(SetUpCluster(3)); > Another reasons for re-replication are: Done http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc@208 PS2, Line 208: SKIP_IF_SLOW_NOT_ALLOWED(); > Nit: don't need to initialize this. Done http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc@229 PS2, Line 229: st string& maintenance_uuid = maintenance_ts->uuid(); : ASSERT_OK(ChangeTServerState( > Should we add a test that re-replication can still happen to other tservers Done http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/master/catalog_manager.cc@4190 PS2, Line 4190: // health, we ignore reports from non-leaders in this case. Also, making : // the changes recommended by Should{Add,Evict}Replica() assumes > +1 Done http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/master/ts_manager.h File src/kudu/master/ts_manager.h: http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/master/ts_manager.h@92 PS2, Line 92: ement(TSDescriptorVector* descs) const; : > nit: can you elaborate a bit in which cases this should happen. Done http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/master/ts_manager.h@94 PS2, Line 94: // Return any tablet servers UUIDs that can be in a failed state without > If O(num tservers in maintenance mode) is relatively high, GetWhitelistedUu We chatted a bit offline about this and the locking argument goes away given the changes in part 1 of this patch set (now, it's a single lock acquisition to generate the ignored set). I updated the name to "UuidsToIgnoreForUnderreplication", in an attempt to decouple consensus code from maintenance mode. -- To view, visit http://gerrit.cloudera.org:8080/14222 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a63b55011d16900c0d27eac0eb75880074204db Gerrit-Change-Number: 14222 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Hao Hao <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Wed, 25 Sep 2019 23:59:50 +0000 Gerrit-HasComments: Yes
