[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

Dinesh Bhat (Code Review) Mon, 20 Mar 2017 15:43:50 -0700

Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
......................................................................



Patch Set 11:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/6066/11//COMMIT_MSG
Commit Message:

Line 41: 
> A description of the test cases you have implemented would be nice.
Done


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

Line 495:   required RaftConfigPB new_config = 4;
> On second thought, maybe we can defer that since right now we only support 
agreed, not changing for now.


PS11, Line 499: otherwise 'new_configuration' is set
> Update comment
Done


PS11, Line 522: Unsafe
> nit: Can we change this to Force instead of Unsafe?
I initially started with that naming convention, but changed it to Unsafe only 
because the ForceChangeConfig seemed bit vague to me since the actual 
ChangeConfig itself is a forced operation from a follower point of view. 
However, if we want to differentiate between automatic and manual, we could go 
with ManualChangeConfig perhaps ?


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/metadata.proto
File src/kudu/consensus/metadata.proto:

PS11, Line 93: allow_unsafe
> How about forced_config_change?
this can go along with the naming convention for API, since I was suggesting 
'manual' earlier, perhaps manual_config_change ?


http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS7, Line 1667: pid_index);
> You are using safetime which is in the past. Don't we want the timestamp to
Lemme poke David to be sure, I am not sure if it makes much difference. One of 
the point David had was perhaps we shouldn't let the API to be exercised too 
soon by the user. i.e., we can have a check somewhere which says consensus on 
this node was stuck for say more than 10 minutes at the least to allow room for 
automatic recovery.


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 1592:   // Check that passed replica uuids are part of the committed 
config on this node.
> Note: You can use the helper function IsRaftConfigVoter() from quorum_util.
Done, I didn't use that routine because we already can compare directly its 
member type here. Also I am explicitly not comparing with specific membership 
type, because thats something which can be enforced by the tool. This also 
means we could let the user specify membership type on the CLI.


Line 1601:           new_peer.last_known_addr().host() == 
committed_peer.last_known_addr().host() &&
> Do we expect last_known_addr to always be passed in the new config? I thoug
yeah we do. If you can recall, there was a reason why we chose RaftConfigPB as 
part of RPC, reason being in the case of 5-replica original config {ABCDE} and 
2 nodes are part of the enforced config {AB} being written on node A, we want 
to be able to provide both A and B's HostPort endpoints as part of config. The 
tool could query HostPort details from master, but tool wouldn't know whether 
master has a stale info or recent. So for now, pushing back on the user to 
provide the complete info instead of this API or the tool determining that info 
at run time.


Line 1623:   RaftConfigPB new_config = config;
> How about just use a copy of the committed config, but with the uuids we wa
That's a good idea, but given that we are supplying the entire config to the 
API this is not different than the current code wrt end result. Did you have 
any other concern in mind ?


http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

Line 238:   // In the event of unsafe config change enforced from an external 
tool,
> I don't understand that logic. If there are multiple "pending" configs, whi
I will move this discussion to latest patch since this code/comments have 
changed a bit there, and I see that you have added sine comments there as well.


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS11, Line 212: Allowing config update
> How about: Allowing forced config change
Done


PS11, Line 239: may not have a pending config
> You mean that it may not match the currently-pending config, right?
thats correct, thanks for catching, updated.


PS11, Line 242: both
> I don't think this will work in all cases; see below
update comment here.


PS11, Line 247: !config_to_commit.allow_unsafe()
> As mentioned in the rev 7 thread, I don't think this condition is needed
thats true, updated.


PS11, Line 248: pending_config.allow_unsafe() && config_to_commit.allow_unsafe()
> I think this should be removed due to the following scenario:
great catch, yeah.. but how can we let the unsafe config go through this 
equality check ? as of now, only the config without unsafe flag on both 
pending/committed states go through this check.  perhaps we can match their 
opid_index too to make sure they both belonged to one operation and not an 
overwritten case ?


http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

Line 159: // Test unsafe config change when there is one follower survivor in 
the cluster.
> I will do this in an upcoming patch, I am still adding more tests and I may
Added this latest patch.


PS7, Line 193:  ASSERT_TRUE(has_leader) << SecureDebugString(tablet_locations);
             : 
             :   // Wait for initial NO_OP to be committed by the leader.
             :   TServerDetails* leader_ts;
             :   ASSERT_OK(FindTabletLeader(active_tablet_servers, tablet_id_, 
kTimeout, &leader_ts));
             :   ASSERT_OK(itest::WaitUntilCommittedOpIdIndexIs(1, leader_ts, 
tablet_id_, kTimeout));
             : 
             :   vector<TServerDetails*> tservers;
             :   AppendValuesFromMap(active_tablet_servers, &tservers);
             :   vector<TServerDetails*> followers;
             :   for (TServerDetails* ts : tservers) {
             :     if (ts->uuid() != leader_ts->uuid()) {
             :    
> This is duplicated multiple times in this file. Can you extract out some of
Done, added FindTabletFollowers() in itest_util


Line 224:     tablet_id_,
> The master orchestrates the part from #1 -> #2 -> #3, so if the master is d
Thats a nice idea; I need to dig some issue with catalog manager here before I 
add this code, because I was able to see #1, but re-replication isn't triggered 
due to :

W0320 15:01:36.682363 13406208 catalog_manager.cc:2624] Unable to reset TS 
proxy: Not found: Could not find TS for UUID 2eabfc88d76347a495dbcd33ad3b07b8

I think I know whats going on here, but need an additional test to fix this bug 
separately. In theory, catalog_manager seems to have a bug in transition from 
no_leader in the old config to a leader in new config.


Line 348:                                                    3, tablet_id_, 
kTimeout,
> You can force a leader to step down to be a follower, while it has a pendin
Yep, thats a good idea, I used the tool '/bin/kudu/ leader_step_down' to 
achieve the same. Added a test here.


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS11, Line 665: fault
> I like the idea of using a fault to trigger committing config changes durin
Sure, just to be sure this has nothing todo with pending config right ? Can you 
also tell me what result can we verify with this test ? The RPC succeeds but 
new replicas never come up right ?


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/tools/tool_action_common.h
File src/kudu/tools/tool_action_common.h:

Line 95: Status ParseHostPortString(const std::string& hostport_str,
> Does this handle default ports?
No, but IIRC I believe not allowing default port was intentional. We are using 
an existing helper routine here. We want the user to provide a specific port 
string and not allow tool to interpret a default port by itself if user misses 
to specify port accidentally. Of course, the current API throws back saying its 
an invalid port string comarping with committed_config, but thats an additional 
check in the API tool shouldn't be aware of.

Tool uses this API to decipher passed "uuid:host:port" string. I am not sure 
what you meant by 'we are only removing peers' in above comment. We are 
currently adding the peers into new config.


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/tools/tool_action_remote_replica.cc
File src/kudu/tools/tool_action_remote_replica.cc:

Line 368:     HostPortPB new_peer_host_port_pb;
> I thought we said users don't have to pass the address of the server?
We have to, unless we have a way of figuring out address of the server by 
contacting the master. For eg, if user is writing {AB} on A, he needs to pass 
the HostPort endpoints of B. Contacting the master is one way, but we can defer 
that to later if we want to take that route (because master entry may not 
exist, could be stale etc)


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Dinesh Bhat <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

Reply via email to