Dinesh Bhat has posted comments on this change.

Change subject: [consensus] KUDU-1613: Fix replica eviction failure for 
WRONG_SERVER_UUID
......................................................................


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/5111/5//COMMIT_MSG
Commit Message:

Line 9: Fixes replica eviction from leader where a follower returning
> Can you please explain your approach to fixing this issue in the commit mes
Done, could you see updated version would suffice ?


http://gerrit.cloudera.org:8080/#/c/5111/5/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

Line 238:   switch (ps.response) {
> Maybe add a default case that calls LOG(FATAL) on the (unknown) response ty
Done


http://gerrit.cloudera.org:8080/#/c/5111/5/src/kudu/consensus/consensus_peers.h
File src/kudu/consensus/consensus_peers.h:

PS5, Line 140: respond with appropriate response status.
> Nit: technically, this method does not "respond" (in the sense of sending a
yeah, good catch. fixed.


http://gerrit.cloudera.org:8080/#/c/5111/5/src/kudu/consensus/consensus_queue-test.cc
File src/kudu/consensus/consensus_queue-test.cc:

Line 816: // Verify that tablet copy is triggered when peer responds with
> Why do we want tablet copy to be triggered when the peer responds that it i
I copied this line-by-line from TestTriggerTabletCopyIfTabletNotFound above. In 
this case, WRONG_SERVER_UUID almost always means that for some reason tablet 
server went through reinitialization, which means  all the tablets become 
inaccessible due to uuid mismatch for all RPCs via CheckUuidMatchOrRespond 
after the reinitialization. In some way, this is the same situation as 
TABLET_NOT_FOUND but impact is more because it affected all replicas on that 
node. As for why tablet copy is necessary, I think it's probably more relevant 
when the tablet becomes under-replicated. I added this test really for that 
1-line change in consensus_queue.cc.


http://gerrit.cloudera.org:8080/#/c/5111/5/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

Line 2770
> Why was this comment removed? If it's no longer true, shouldn't the entire 
yeah, this line of code(and comment was supposed to be removed in next patch). 
I removed both comment+code in this patch itself initially, but added only code 
back realizing that this code is testing TABLET_NOT_RUNNING situation(forgot to 
add comment back). I think I am going to leave it without comments for now 
because my my next patch laid on top of this is removing both anyways, and it's 
a somewhat messy rebase situation otherwise.


PS5, Line 2782: // These tests exhibit the replica eviction behaviors when a 
follower
              : // returns WRONG_SERVER_UUID error code. Tests verify that 
followers
              : // returning these error codes are evicted from consensus after 
a
              : // specified --follower_unavailable_considered_failed_sec.
> Nit: this is talking about tests in the plural, but it's just a single test
done, yeah I think I forgot to change comments when I split one patch into 
multiple patches.


Line 2795:   int leader_indx = -1;
> Nit: conventionally, we use 'idx' as an infix for naming an index, not 'ind
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5111
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0d3f84fd297a8e4760208a213c1ee393e92499a3
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Dinesh Bhat <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to