Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8679 )
Change subject: [quorum_util] update criteria for replica eviction ...................................................................... Patch Set 2: (17 comments) I think we should consider splitting this into two patches. The non-voter changes as outlined in the commit message are important and I think non-controversial. I am not sure about the voter eviction changes because a HEALTHY health report does not currently guarantee that a replica is not lagging, at least a little bit. I think we need to be more conservative about evicting voters right now and only evict a voter if we have # replication factory HEALTHY voters. http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util-test.cc File src/kudu/consensus/quorum_util-test.cc: http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util-test.cc@41 PS2, Line 41: constexpr auto U = RaftPeerPB::UNKNOWN_MEMBER_TYPE; > warning: invalid case style for global constant 'U' [readability-identifier how about // NOLINT here and below? http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util-test.cc@45 PS2, Line 45: const auto health_statuses = { '?', '-', '+' }; > warning: invalid case style for global constant 'health_statuses' [readabil kHeatlhStatuses? http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util-test.cc@734 PS2, Line 734: EXPECT_EQ("C", to_evict) how about: EXPECT(to_evict == "C" || to_evict == "D") << to_evict; ? http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util-test.cc@745 PS2, Line 745: 3 magic number, can you pull this out into a constant? Such as const auto kReplicationFactor = 3; The other tests are so short that it's less of a concern but "3" appears a couple dozen times in this scenario so the logic would be easier to follow with replication factor as a constant. http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util-test.cc@773 PS2, Line 773: prior B is evicted prior to B getting evicted http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util-test.cc@774 PS2, Line 774: the failed voter who? http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util-test.cc@779 PS2, Line 779: ASSERT_EQ("B", to_evict); Why is "B" the correct choice instead of "D" here? aren't they equally good choices and this is implementation specific? Wouldn't this test become brittle if we changed the rules later? Perhaps ASSERT(to_evict == "B" || to_evict == "D") << to_evict; ? http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util.cc File src/kudu/consensus/quorum_util.cc: http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util.cc@531 PS2, Line 531: which that http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util.cc@532 PS2, Line 532: : , while http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util.cc@533 PS2, Line 533: which whose http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util.cc@536 PS2, Line 536: in 'bad' with FAILED http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util.cc@537 PS2, Line 537: REPLACE 'replace' http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util.cc@539 PS2, Line 539: . , while replacing failed non-voters with healthy non-voters as aggressively as possible. http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util.cc@540 PS2, Line 540: when attempting an eviction, it's better to be certain about the ability : // to carry out the operation. we want to be sure that an eviction can succeed before initiating it. http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util.cc@545 PS2, Line 545: greater greater than http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util.cc@547 PS2, Line 547: bad failed http://gerrit.cloudera.org:8080/#/c/8679/2/src/kudu/consensus/quorum_util.cc@557 PS2, Line 557: (num_voters_healthy >= replication_factor && num_voters_total > replication_factor) || : (num_voters_total > replication_factor && !voter_failed.empty() && : num_voters_healthy >= MajoritySize(num_voters_total - 1)); nit: consider hoisting out one of the conditions, just for readability: num_voters_total > replication_factor && (num_voters_healthy >= replication_factor || (!voter_failed.empty() && num_voters_healthy >= MajoritySize(num_voters_total - 1))) -- To view, visit http://gerrit.cloudera.org:8080/8679 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2062c2963fface43062b3154bf5ffa61b4fa684f Gerrit-Change-Number: 8679 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Tue, 05 Dec 2017 02:21:18 +0000 Gerrit-HasComments: Yes