Dinesh Bhat has posted comments on this change. Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority replicas ......................................................................
Patch Set 14: (16 comments) http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/metadata.proto File src/kudu/consensus/metadata.proto: Line 93: optional bool unsafe_config_change = 4; > do we need to give this a default of false? Done http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 616: if (!new_config.unsafe_config_change()) { > I think we need a default for this field or we first need to check has_unsa added default, yeah agreed. PS14, Line 1612: Peer uuid $0 is not found on original > Peer with UUID $0 is not in the committed Done PS14, Line 1613: the node > this replica Done Line 1617: if (!IsRaftConfigVoter(peer_uuid, committed_config)) { > I don't think this voter check is necessary. What we should do here is chec Done, moved that check up in the API since the VOTER type can be deciphered much earlier looking at committed config. PS14, Line 1653: This ensures that current node replicates the new config proposed, > This makes this request appear to come from a new leader that the local rep Done PS14, Line 1654: it would step down before replicating > this will cause it to step down Done PS14, Line 1654: node > local replica Done http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/raft_consensus_state.cc File src/kudu/consensus/raft_consensus_state.cc: PS14, Line 257: FLAGS_fault_crash_before_append_commit > Don't reuse this unrelated fault flag for this test. That flag is for the W Done http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/time_manager.h File src/kudu/consensus/time_manager.h: Line 132: friend class RaftConsensus; > Intentional? yeah, because we wanted to grab TimeManager::SerialTimeStamp as per David's last comment, and he also suggested friending would be better than making that routine public. http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS14, Line 523: 1 > Magic number. Why 1? Also, what if the election is churny? Since we know a I took this suggestion and even found that is unreliable. i.e there is a window between election result(which finds the leader_ts) and leader committing the last opid. So querying GetLastOpId may not help. Besides WaitUntilCommittedOpIdIndexIs() internally invokes GetLastOpId to compare with the value we pass in as argument, so calling GetLastOpId before that is redundant. What we essentially need here was to WaitUntilAtleastCommittedOpId(1,...) so that we know at least the replica committed the opid we expect (still doesn't guarantee it is the leader though). PS14, Line 786: 8 > 9 Done PS14, Line 828: 1 > Why 1? Same question as above and for the other tests. I took this logic from an existing test in raft_consensus, so there may be some room for improvement here in the event of election churn, but I believe we can expect the leader to be stable in the absence of any external events in these itests ? PS14, Line 847: > > If we use >= 0 here then we can delete lines 857-859. indeed :), good find, done. PS14, Line 880: 11 > Why 11? Shouldn't it be post_commit opid index from line 828 plus 5 for # o no, this includes commit of config changes driven by re-replications from master, so 4 new nodes get added bumping it to 11. But, I have removed this WaitUntil..() from all the tests at this stage since I am just counting the config replication # on a new node added by re-replication now at this stage. i.e. Wait at line 881 alone is sufficient. http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/tools/tool_action_remote_replica.cc File src/kudu/tools/tool_action_remote_replica.cc: PS14, Line 431: Enforce raft config on the tablet server with a new config > Using the word enforce in this context sounds strange to me. How about: Done, filled the latter under ExtraDescription. -- To view, visit http://gerrit.cloudera.org:8080/6066 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700 Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Dinesh Bhat <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
