Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
......................................................................


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6066/4/src/kudu/consensus/consensus.h
File src/kudu/consensus/consensus.h:

PS4, Line 255: ReplaceConfig
> there's only one implementation of consensus now, no need to do this.
Thanks, yeah, it should be pure virtual here.


http://gerrit.cloudera.org:8080/#/c/6066/4/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS4, Line 1561: ReplaceConfig
> this needs a LOG WARNING in big bold letters at several stages.
done.


PS4, Line 1573:   // ReplaceConfig() replaces the config irrespective of 
opid_index value,
              :   // hence there is no meaning for cas_config_opid_index in 
this API.
> wouldn't this be a great way to make sure that you're changing the actual c
yeah, it could be, for now tool doesn't read what's there in the config, I 
should add that.


PS4, Line 1614:   // in the consensus request later.
              :   int64_t all_replicated_index = 
queue_->GetAllReplicatedIndex();
              :   OpId last_opid = queue_->GetLastOpIdInLog();
              :   uint64 msg_timestamp = time_manager_->GetSafeTime().value();
> I mentioned in a previous rev that these need to be obtained under a lock s
The term is obtained from ReplicaState @L1588 and these fields are grabbed from 
queue state and each of these member functions grab a spinlock from inside. As 
such I didn't see a reason for queue state snapshot to collate here. Or may be 
I haven't understood your comment fully.


PS4, Line 1675: state_->GetActiveRoleUnlocked() != RaftPeerPB::LEADER
> isn't it likely that the lonely node will think he's the leader?
Actually this whole block of code was removed now because of couple of reasons: 
a) This was only aborting the in-memory cmeta config changes pending and was 
not addressing what to do with the WAL entries. b) It became moot after we 
faked leader uuid from the tool; i.e., whether this node was a follower or a 
leader, the pending config changes get aborted as part of 
Update()->AbortOpsAfter() based on what is present on log and what's in 
consensus req. If this node was the leader, he would step down looking at 
higher term and abort.


http://gerrit.cloudera.org:8080/#/c/6066/4/src/kudu/integration-tests/cluster_itest_util.h
File src/kudu/integration-tests/cluster_itest_util.h:

PS4, Line 160: NO_WAIT_FOR_LEADER
> nit: DONT_WAIT_FOR_LEADER
done


PS4, Line 166: WaitForReplicasReportedToMaster
> best to make this return a Status::TimeOut
Sounds good, tx


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Dinesh Bhat <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

Reply via email to