Mike Percy has posted comments on this change. Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority replicas ......................................................................
Patch Set 7: (15 comments) I still need to look through the tests but here are some initial comments http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/consensus.proto File src/kudu/consensus/consensus.proto: PS7, Line 488: 4 You can just number this starting from 1 PS7, Line 492: sender_id How about caller_uuid or caller_id PS7, Line 497: required RaftPeerPB server Comment needs update? How about: required RaftConfigPB new_config? PS7, Line 506: new_config Is this used for something? Regardless, see my comment above, it seems to me that we should take this as input instead of provide it as output. http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/metadata.proto File src/kudu/consensus/metadata.proto: Line 93: optional bool allow_unsafe = 4; I think this belongs in ChangeConfigRecordPB http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 1148: if (state_->GetActiveRoleUnlocked() != RaftPeerPB::LEADER) { This change can be reverted, right? Line 1598: string replica_uuid = peer_pb.permanent_uuid(); > warning: the variable 'replica_uuid' is copy-constructed from a const refer Based on my comments elsewhere I think this removal logic should probably happen on the client side, but you could use RemoveFromRaftConfig() for this Line 1630: new_config.add_peers()->CopyFrom(new_peer); So... we are only allowing changing to a single-node config? PS7, Line 1651: current_term + 1 please extract this into a new variable for use here and on line 1664 PS7, Line 1665: preceding_opid.index() + 1 Please extract a variable for this so it doesn't look like a magic number. You can define it next to the new term variable I mentioned on line 1651. PS7, Line 1667: msg_timestamp Don't we need this to be higher Line 1672: LOG_WITH_PREFIX(WARNING) I don't think it's necessary for all caps in the server log. If you want to print a warning, let's do it in the client tool. I do agree that we should log that we did a forced config change, though. http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus_state.cc File src/kudu/consensus/raft_consensus_state.cc: PS7, Line 201: allow_unsafe I think this should be passed into this function as an enum flag, not part of the RaftConfigPB Line 238: if (!committed_config.allow_unsafe()) { This doesn't seem right to me. I think this check should only be skipped if the pending config is the one that was forced, since that implies we could commit an intermediate pending config that the latest pending config has forced an override of. Whether the committed config was forced or not doesn't factor into this, I does it? http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/integration-tests/cluster_itest_util.h File src/kudu/integration-tests/cluster_itest_util.h: Line 167: Status WaitForReplicasReportedToMaster( Doesn't ExternalMiniCluster::WaitForTabletServerCount() already implement this? -- To view, visit http://gerrit.cloudera.org:8080/6066 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Dinesh Bhat <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
