Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
......................................................................


Patch Set 16:

(34 comments)

I did one quick nit pass for everything except tests. I caught a couple minor 
issues, but this rev is mostly asking for changes to comments and log messages. 
Almost there.

http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

Line 494:   // The raft config sent to destination server.
Add: Only the 'permanent_uuid' of each peer in the config is required 
(address-related information is ignored by the server). The peers specified in 
'new_config' are required to be a subset (or equal to) the peers in the 
committed config on the destination replica.


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

PS16, Line 553: :
nit: add space after colon


PS16, Line 554: :
and here


PS16, Line 777: :
nit: space after colon here and elsewhere in this log message


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS16, Line 1167: :
space after colon here and in the rest of this commit message


PS16, Line 1330:  
nit: Add a period before this space.


PS16, Line 1594: Node
Replica


PS16, Line 1594: so new config will be appended.
but the new config will be unsafely changed anyway.


PS16, Line 1595:  :
": "


Line 1596:             << state_->GetPendingConfigUnlocked().DebugString();
Add:
    << ", New config: " << SecureShortDebugString(req.new_config());


PS16, Line 1596: ebugString
use SecureShortDebugString() instead


PS16, Line 1605:  and that node is a VOTER in the config
remove this


PS16, Line 1606: This is to prevent the API allowing an invalid uuid into the 
new config.
Consider changing this to: This allows a manual recovery tool to only have to 
specify the uuid of each replica in the new config without having to know the 
addresses of each server (since we can get the address information from the 
committed config). Additionally, only a subset of the committed config is 
required for typical cluster repair scenarios.


PS16, Line 1616: for tablet $1
for tablet $1. Committed config: $2

...and include the committed config in this message using ShortDebugString()


PS16, Line 1630: we don't want to enforce unsafe config that way
Let's replace this part of the comment with: "it is rare and a replica without 
itself in the latest config is definitely not caught up with the latest 
leader's log."


PS16, Line 1633: committed
new


PS16, Line 1635: tablet $1
tablet $1. Rejected config: $2

and pass in SecureShortDebugString(new_config)


PS16, Line 1639: preceding_opid.index() + 1
Define replicate_opid_index up here and use it here as well as down below.


PS16, Line 1660: leader_term
let's rename this variable 'new_term'


PS16, Line 1680: DVLOG(1) << "Consensus request: "
VLOG_WITH_PREFIX(3) << "UnsafeChangeConfig: Generated consensus request: "


Line 1681:   DVLOG(1) << "Replicate Msg: " << 
SecureShortDebugString(*replicate);
Remove this line since it's redundant with the line above


PS16, Line 1684: ,
nit: add space after comma


PS16, Line 1685: DebugString()
SecureShortDebugString() here and below


PS16, Line 1685:  :
space after colon, not before


PS16, Line 1686:  :
here also: space after colon, not before


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS16, Line 241: (
nit: Remove extra pair of parentheses here


PS16, Line 245:       << Substitute("New committed config must equal pending 
config, but does not. "
              :           "Pending config: $0, committed config: $1",
              :           SecureShortDebugString(pending_config),
              :           SecureShortDebugString(config_to_commit));
nit: it seems like the indentation is a little messed up on these 4 lines now. 
Can you re-indent it and fix it to look like it did before?


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/tools/tool_action_remote_replica.cc
File src/kudu/tools/tool_action_remote_replica.cc:

Line 45: #include "kudu/tools/tool_test_util.h"
Is this used? It looks like a test utility library?


PS16, Line 366:   LOG(WARNING) << "NOTE: the new config may be replicated 
asynchronously "
              :                << "to other peers and it may take a while to 
bring the tablet "
              :                << "back to majority replica count depending 
upon how much time "
              :                << "the new peer replicas take to catch up with 
the tablet server "
              :                << "enforcing the new config.";
I think we should remove this message. I think it is confusing because it talks 
about replication when this tool is a single-node tool. Instead of messages 
like this, we should just provide good documentation on the web site for people 
doing cluster troubleshooting that need a manual repair tool.


PS16, Line 378: MonoDelta::FromSeconds(30)
MonoDelta::FromMilliseconds(FLAGS_timeout_ms)


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

PS16, Line 885: DVLOG(1) 
how about: VLOG(2)


PS16, Line 909: DVLOG(1)
how about: VLOG(2)


PS16, Line 910: ChangeConfig
UnsafeChangeConfig


PS16, Line 923: s
Status s


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Dinesh Bhat <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

Reply via email to