Dinesh Bhat has posted comments on this change. Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority replicas ......................................................................
Patch Set 4: (7 comments) http://gerrit.cloudera.org:8080/#/c/6066/4/src/kudu/consensus/consensus.h File src/kudu/consensus/consensus.h: PS4, Line 255: ReplaceConfig > there's only one implementation of consensus now, no need to do this. Thanks, yeah, it should be pure virtual here. http://gerrit.cloudera.org:8080/#/c/6066/4/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: PS4, Line 1561: ReplaceConfig > this needs a LOG WARNING in big bold letters at several stages. done. PS4, Line 1573: // ReplaceConfig() replaces the config irrespective of opid_index value, : // hence there is no meaning for cas_config_opid_index in this API. > wouldn't this be a great way to make sure that you're changing the actual c yeah, it could be, for now tool doesn't read what's there in the config, I should add that. PS4, Line 1614: // in the consensus request later. : int64_t all_replicated_index = queue_->GetAllReplicatedIndex(); : OpId last_opid = queue_->GetLastOpIdInLog(); : uint64 msg_timestamp = time_manager_->GetSafeTime().value(); > I mentioned in a previous rev that these need to be obtained under a lock s The term is obtained from ReplicaState @L1588 and these fields are grabbed from queue state and each of these member functions grab a spinlock from inside. As such I didn't see a reason for queue state snapshot to collate here. Or may be I haven't understood your comment fully. PS4, Line 1675: state_->GetActiveRoleUnlocked() != RaftPeerPB::LEADER > isn't it likely that the lonely node will think he's the leader? Actually this whole block of code was removed now because of couple of reasons: a) This was only aborting the in-memory cmeta config changes pending and was not addressing what to do with the WAL entries. b) It became moot after we faked leader uuid from the tool; i.e., whether this node was a follower or a leader, the pending config changes get aborted as part of Update()->AbortOpsAfter() based on what is present on log and what's in consensus req. If this node was the leader, he would step down looking at higher term and abort. http://gerrit.cloudera.org:8080/#/c/6066/4/src/kudu/integration-tests/cluster_itest_util.h File src/kudu/integration-tests/cluster_itest_util.h: PS4, Line 160: NO_WAIT_FOR_LEADER > nit: DONT_WAIT_FOR_LEADER done PS4, Line 166: WaitForReplicasReportedToMaster > best to make this return a Status::TimeOut Sounds good, tx -- To view, visit http://gerrit.cloudera.org:8080/6066 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Dinesh Bhat <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
