Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9625 )
Change subject: [consensus] FAILED_UNRECOVERABLE replica health status ...................................................................... Patch Set 4: (9 comments) http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/consensus/quorum_util-test.cc File src/kudu/consensus/quorum_util-test.cc: http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/consensus/quorum_util-test.cc@1403 PS4, Line 1403: and > nit: and are Done http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/consensus/quorum_util-test.cc@1416 PS4, Line 1416: the > an Done http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/consensus/quorum_util-test.cc@1458 PS4, Line 1458: get > gets Done http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/consensus/quorum_util-test.cc@1463 PS4, Line 1463: log > the log Done http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/consensus/quorum_util.cc File src/kudu/consensus/quorum_util.cc: http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/consensus/quorum_util.cc@754 PS4, Line 754: if (should_evict_voter && : has_voter_failed_unrecoverable && !has_non_voter_failed_unrecoverable) { : // When we have the same type of failures between voters and non-voters : // we evict non-voters first, but if there is an irreversibly failed voter, : // then we are about to evict such a voter first because the transiently : // failed non-voter might be back and in good shape a few moments. Also, : // getting rid of a irreversibly failed voter may be beneficial in case : // of even-number-of-voters configuration: the majority becomes more healthy : // and more actionable. : to_evict = pq_voters.top().first; : } else { : if (should_evict_non_voter) { : CHECK(!pq_non_voters.empty()); : // First try to evict an excess non-voter, if present. This might help to : // avoid IO load due to a tablet copy in progress. : to_evict = pq_non_voters.top().first; : } else if (should_evict_voter) { : CHECK(!pq_voters.empty()); : // Next try to evict an excess voter. : to_evict = pq_voters.top().first; : } : } > nit: would this be easier to read written like this? Indeed, that's much clearer. Thanks! http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc File src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc: http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1477 PS4, Line 1477: an > nit: in an Done http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1481 PS4, Line 1481: ASSERT_OK(WaitForReplicasReportedToMaster(cluster_->master_proxy(), : kReplicasNum - 1, : tablet_id_, : kTimeout, : WAIT_FOR_LEADER, : VOTER_REPLICA, : &has_leader, : &tablet_locations)); > Since the num_replicas argument is an exact match, I believe this is racy a It's a good observation -- I took care of that. http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1512 PS4, Line 1512: WAL > the WAL Done http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1514 PS4, Line 1514: EXPECT_FALSE(IsRaftConfigMember(follower_uuid, cstate.committed_config())) > Correct me if I'm wrong, but I think this could be violated since we evict It's a good observation -- I took care of that. -- To view, visit http://gerrit.cloudera.org:8080/9625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I35637c5bda6681b732dbc2bbf94b9d4258b12095 Gerrit-Change-Number: 9625 Gerrit-PatchSet: 4 Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Wed, 14 Mar 2018 22:27:43 +0000 Gerrit-HasComments: Yes
