Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
......................................................................


Patch Set 14:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/metadata.proto
File src/kudu/consensus/metadata.proto:

Line 93:   optional bool unsafe_config_change = 4;
> do we need to give this a default of false?
Done


http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 616:     if (!new_config.unsafe_config_change()) {
> I think we need a default for this field or we first need to check has_unsa
added default, yeah agreed.


PS14, Line 1612: Peer uuid $0 is not found on original
> Peer with UUID $0 is not in the committed
Done


PS14, Line 1613: the node
> this replica
Done


Line 1617:     if (!IsRaftConfigVoter(peer_uuid, committed_config)) {
> I don't think this voter check is necessary. What we should do here is chec
Done, moved that check up in the API since the VOTER type can be deciphered 
much earlier looking at committed config.


PS14, Line 1653: This ensures that current node replicates the new config 
proposed,
> This makes this request appear to come from a new leader that the local rep
Done


PS14, Line 1654: it would step down before replicating
> this will cause it to step down
Done


PS14, Line 1654: node
> local replica
Done


http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS14, Line 257: FLAGS_fault_crash_before_append_commit
> Don't reuse this unrelated fault flag for this test. That flag is for the W
Done


http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/time_manager.h
File src/kudu/consensus/time_manager.h:

Line 132:   friend class RaftConsensus;
> Intentional?
yeah, because we wanted to grab TimeManager::SerialTimeStamp as per David's 
last comment, and he also suggested friending would be better than making that 
routine public.


http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS14, Line 523: 1
> Magic number. Why 1? Also, what if the election is churny? Since we know a 
I took this suggestion and even found that is unreliable. i.e there is a window 
between election result(which finds the leader_ts) and leader committing the 
last opid. So querying GetLastOpId may not help. Besides 
WaitUntilCommittedOpIdIndexIs() internally invokes GetLastOpId to compare with 
the value we pass in as argument, so calling GetLastOpId before that is 
redundant. What we essentially need here was to 
WaitUntilAtleastCommittedOpId(1,...) so that we know at least the replica 
committed the opid we expect (still doesn't guarantee it is the leader though).


PS14, Line 786: 8 
> 9
Done


PS14, Line 828: 1
> Why 1? Same question as above and for the other tests.
I took this logic from an existing test in raft_consensus, so there may be some 
room for improvement here in the event of election churn, but I believe we can 
expect the leader to be stable in the absence of any external events in these 
itests ?


PS14, Line 847: >
> If we use >= 0 here then we can delete lines 857-859.
indeed :), good find, done.


PS14, Line 880: 11
> Why 11? Shouldn't it be post_commit opid index from line 828 plus 5 for # o
no, this includes commit of config changes driven by re-replications from 
master, so 4 new nodes get added bumping it to 11. But, I have removed this 
WaitUntil..() from all the tests at this stage since I am just counting the 
config replication # on a new node added by re-replication now at this stage. 
i.e. Wait at line 881 alone is sufficient.


http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/tools/tool_action_remote_replica.cc
File src/kudu/tools/tool_action_remote_replica.cc:

PS14, Line 431: Enforce raft config on the tablet server with a new config
> Using the word enforce in this context sounds strange to me. How about:
Done, filled the latter under ExtraDescription.


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Dinesh Bhat <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

Reply via email to