Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9625 )

Change subject: [consensus] FAILED_UNRECOVERABLE replica health status
......................................................................


Patch Set 4: Code-Review+1

(9 comments)

looks good, just minor stuff

http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/consensus/quorum_util-test.cc
File src/kudu/consensus/quorum_util-test.cc:

http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/consensus/quorum_util-test.cc@1403
PS4, Line 1403: and
nit: and are


http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/consensus/quorum_util-test.cc@1416
PS4, Line 1416: the
an


http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/consensus/quorum_util-test.cc@1458
PS4, Line 1458: get
gets


http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/consensus/quorum_util-test.cc@1463
PS4, Line 1463: log
the log


http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/consensus/quorum_util.cc
File src/kudu/consensus/quorum_util.cc:

http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/consensus/quorum_util.cc@754
PS4, Line 754:   if (should_evict_voter &&
             :       has_voter_failed_unrecoverable && 
!has_non_voter_failed_unrecoverable) {
             :     // When we have the same type of failures between voters and 
non-voters
             :     // we evict non-voters first, but if there is an 
irreversibly failed voter,
             :     // then we are about to evict such a voter first because the 
transiently
             :     // failed non-voter might be back and in good shape a few 
moments. Also,
             :     // getting rid of a irreversibly failed voter may be 
beneficial in case
             :     // of even-number-of-voters configuration: the majority 
becomes more healthy
             :     // and more actionable.
             :     to_evict = pq_voters.top().first;
             :   } else {
             :     if (should_evict_non_voter) {
             :       CHECK(!pq_non_voters.empty());
             :       // First try to evict an excess non-voter, if present. 
This might help to
             :       // avoid IO load due to a tablet copy in progress.
             :       to_evict = pq_non_voters.top().first;
             :     } else if (should_evict_voter) {
             :       CHECK(!pq_voters.empty());
             :       // Next try to evict an excess voter.
             :       to_evict = pq_voters.top().first;
             :     }
             :   }
nit: would this be easier to read written like this?

  // Eviction priority order: (1) unrecoverable non_voters, (2) unrecoverable 
voters,
  // (3) evictable non_voters, (4) evictable voters.
  if (should_evict_non_voter && has_non_voter_failed_unrecoverable) {
    to_evict = pq_non_voters.top().first;
  } else if (should_evict_voter && has_voter_failed_unrecoverable) {
    to_evict = pq_voters.top().first;
  } else if (should_evict_non_voter) {
    to_evict = pq_non_voters.top().first;
  } else if (should_evict_voter) {
    to_evict = pq_voters.top().first;
  }


http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
File src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1477
PS4, Line 1477: an
nit: in an


http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1481
PS4, Line 1481:   
ASSERT_OK(WaitForReplicasReportedToMaster(cluster_->master_proxy(),
              :                                             kReplicasNum - 1,
              :                                             tablet_id_,
              :                                             kTimeout,
              :                                             WAIT_FOR_LEADER,
              :                                             VOTER_REPLICA,
              :                                             &has_leader,
              :                                             &tablet_locations));
Since the num_replicas argument is an exact match, I believe this is racy and 
could make this test flaky.


http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1512
PS4, Line 1512: WAL
the WAL


http://gerrit.cloudera.org:8080/#/c/9625/4/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1514
PS4, Line 1514:   EXPECT_FALSE(IsRaftConfigMember(follower_uuid, 
cstate.committed_config()))
Correct me if I'm wrong, but I think this could be violated since we evict the 
unrecoverable voter before adding a non-voter replacement, so the same tablet 
server could be selected to host the replacement.



--
To view, visit http://gerrit.cloudera.org:8080/9625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35637c5bda6681b732dbc2bbf94b9d4258b12095
Gerrit-Change-Number: 9625
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Wed, 14 Mar 2018 21:14:25 +0000
Gerrit-HasComments: Yes

Reply via email to