Dinesh Bhat has posted comments on this change. Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority replicas ......................................................................
Patch Set 11: (22 comments) http://gerrit.cloudera.org:8080/#/c/6066/11//COMMIT_MSG Commit Message: Line 41: > A description of the test cases you have implemented would be nice. Done http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/consensus.proto File src/kudu/consensus/consensus.proto: Line 495: required RaftConfigPB new_config = 4; > On second thought, maybe we can defer that since right now we only support agreed, not changing for now. PS11, Line 499: otherwise 'new_configuration' is set > Update comment Done PS11, Line 522: Unsafe > nit: Can we change this to Force instead of Unsafe? I initially started with that naming convention, but changed it to Unsafe only because the ForceChangeConfig seemed bit vague to me since the actual ChangeConfig itself is a forced operation from a follower point of view. However, if we want to differentiate between automatic and manual, we could go with ManualChangeConfig perhaps ? http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/metadata.proto File src/kudu/consensus/metadata.proto: PS11, Line 93: allow_unsafe > How about forced_config_change? this can go along with the naming convention for API, since I was suggesting 'manual' earlier, perhaps manual_config_change ? http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: PS7, Line 1667: pid_index); > You are using safetime which is in the past. Don't we want the timestamp to Lemme poke David to be sure, I am not sure if it makes much difference. One of the point David had was perhaps we shouldn't let the API to be exercised too soon by the user. i.e., we can have a check somewhere which says consensus on this node was stuck for say more than 10 minutes at the least to allow room for automatic recovery. http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 1592: // Check that passed replica uuids are part of the committed config on this node. > Note: You can use the helper function IsRaftConfigVoter() from quorum_util. Done, I didn't use that routine because we already can compare directly its member type here. Also I am explicitly not comparing with specific membership type, because thats something which can be enforced by the tool. This also means we could let the user specify membership type on the CLI. Line 1601: new_peer.last_known_addr().host() == committed_peer.last_known_addr().host() && > Do we expect last_known_addr to always be passed in the new config? I thoug yeah we do. If you can recall, there was a reason why we chose RaftConfigPB as part of RPC, reason being in the case of 5-replica original config {ABCDE} and 2 nodes are part of the enforced config {AB} being written on node A, we want to be able to provide both A and B's HostPort endpoints as part of config. The tool could query HostPort details from master, but tool wouldn't know whether master has a stale info or recent. So for now, pushing back on the user to provide the complete info instead of this API or the tool determining that info at run time. Line 1623: RaftConfigPB new_config = config; > How about just use a copy of the committed config, but with the uuids we wa That's a good idea, but given that we are supplying the entire config to the API this is not different than the current code wrt end result. Did you have any other concern in mind ? http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus_state.cc File src/kudu/consensus/raft_consensus_state.cc: Line 238: // In the event of unsafe config change enforced from an external tool, > I don't understand that logic. If there are multiple "pending" configs, whi I will move this discussion to latest patch since this code/comments have changed a bit there, and I see that you have added sine comments there as well. http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/raft_consensus_state.cc File src/kudu/consensus/raft_consensus_state.cc: PS11, Line 212: Allowing config update > How about: Allowing forced config change Done PS11, Line 239: may not have a pending config > You mean that it may not match the currently-pending config, right? thats correct, thanks for catching, updated. PS11, Line 242: both > I don't think this will work in all cases; see below update comment here. PS11, Line 247: !config_to_commit.allow_unsafe() > As mentioned in the rev 7 thread, I don't think this condition is needed thats true, updated. PS11, Line 248: pending_config.allow_unsafe() && config_to_commit.allow_unsafe() > I think this should be removed due to the following scenario: great catch, yeah.. but how can we let the unsafe config go through this equality check ? as of now, only the config without unsafe flag on both pending/committed states go through this check. perhaps we can match their opid_index too to make sure they both belonged to one operation and not an overwritten case ? http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: Line 159: // Test unsafe config change when there is one follower survivor in the cluster. > I will do this in an upcoming patch, I am still adding more tests and I may Added this latest patch. PS7, Line 193: ASSERT_TRUE(has_leader) << SecureDebugString(tablet_locations); : : // Wait for initial NO_OP to be committed by the leader. : TServerDetails* leader_ts; : ASSERT_OK(FindTabletLeader(active_tablet_servers, tablet_id_, kTimeout, &leader_ts)); : ASSERT_OK(itest::WaitUntilCommittedOpIdIndexIs(1, leader_ts, tablet_id_, kTimeout)); : : vector<TServerDetails*> tservers; : AppendValuesFromMap(active_tablet_servers, &tservers); : vector<TServerDetails*> followers; : for (TServerDetails* ts : tservers) { : if (ts->uuid() != leader_ts->uuid()) { : > This is duplicated multiple times in this file. Can you extract out some of Done, added FindTabletFollowers() in itest_util Line 224: tablet_id_, > The master orchestrates the part from #1 -> #2 -> #3, so if the master is d Thats a nice idea; I need to dig some issue with catalog manager here before I add this code, because I was able to see #1, but re-replication isn't triggered due to : W0320 15:01:36.682363 13406208 catalog_manager.cc:2624] Unable to reset TS proxy: Not found: Could not find TS for UUID 2eabfc88d76347a495dbcd33ad3b07b8 I think I know whats going on here, but need an additional test to fix this bug separately. In theory, catalog_manager seems to have a bug in transition from no_leader in the old config to a leader in new config. Line 348: 3, tablet_id_, kTimeout, > You can force a leader to step down to be a follower, while it has a pendin Yep, thats a good idea, I used the tool '/bin/kudu/ leader_step_down' to achieve the same. Added a test here. http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS11, Line 665: fault > I like the idea of using a fault to trigger committing config changes durin Sure, just to be sure this has nothing todo with pending config right ? Can you also tell me what result can we verify with this test ? The RPC succeeds but new replicas never come up right ? http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/tools/tool_action_common.h File src/kudu/tools/tool_action_common.h: Line 95: Status ParseHostPortString(const std::string& hostport_str, > Does this handle default ports? No, but IIRC I believe not allowing default port was intentional. We are using an existing helper routine here. We want the user to provide a specific port string and not allow tool to interpret a default port by itself if user misses to specify port accidentally. Of course, the current API throws back saying its an invalid port string comarping with committed_config, but thats an additional check in the API tool shouldn't be aware of. Tool uses this API to decipher passed "uuid:host:port" string. I am not sure what you meant by 'we are only removing peers' in above comment. We are currently adding the peers into new config. http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/tools/tool_action_remote_replica.cc File src/kudu/tools/tool_action_remote_replica.cc: Line 368: HostPortPB new_peer_host_port_pb; > I thought we said users don't have to pass the address of the server? We have to, unless we have a way of figuring out address of the server by contacting the master. For eg, if user is writing {AB} on A, he needs to pass the HostPort endpoints of B. Contacting the master is one way, but we can defer that to later if we want to take that route (because master entry may not exist, could be stale etc) -- To view, visit http://gerrit.cloudera.org:8080/6066 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Dinesh Bhat <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
