Adar Dembo has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet ......................................................................
Patch Set 1: (17 comments) Please also update kudu-tool-test's help tests with the new action. http://gerrit.cloudera.org:8080/#/c/4533/1//COMMIT_MSG Commit Message: PS1, Line 12: Nit: got an extra space here. PS1, Line 18: tablet You mean 'table' here, right? http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS1, Line 178: Status s = itest::GetConsensusState(ts, : tablet_id, : consensus::CONSENSUS_CONFIG_COMMITTED, : MonoDelta::FromSeconds(10), &cstate); : if (!s.ok()) { : return s; : } Just use RETURN_NOT_OK(...). Line 198: int num_retries = 0; Why is this defined way up here instead of closer to where it's used? PS1, Line 204: CHECK_EQ Use ASSERT_* instead of CHECK_* since all of these are on the main test thread. Line 218: CHECK_EQ(master_reported_leader, current_leader); The leader can change between the GetInfoFromConsensus calls and L213, right? If so, isn't this unsafe? PS1, Line 230: // Wait for re-election to happen again. : // 3 seconds picked to accommodate consensus heartbeat timeout(1.5s) Use AssertEventually() to make this more robust, and to minimize the amount of sleeping actually needed. PS1, Line 241: CHECK_OK(GetTabletLeaderUUIDFromMaster(tablet_id_, &master_reported_leader)); : CHECK_EQ(master_reported_leader, new_leader); Again, the leader can change whenever, so this doesn't seem safe. Line 264: CHECK_EQ(master_reported_leader, ""); Let's not check this. It's a common calling convention that if a function fails, it doesn't set any OUT parameters, and we don't need to verify that ourselves. PS1, Line 271: CHECK_EQ(current_leader, ""); : CHECK_EQ(current_term, 0); Likewise here, which means you needn't initialize current_term. http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/tool_action_tablet.cc File src/kudu/tools/tool_action_tablet.cc: PS1, Line 194: Status s = GetTabletLeader(client, tablet_id, &leader_uuid, &leader_hp); Consider consolidating from here to ResolveAddresses in a helper function, used from ChangeConfig too. It'd need a toggle to specify whether NotFound is fatal or not. PS1, Line 198: else if (!s.ok()) { : return s; : } To pile on with Tidy Bot, just do RETURN_NOT_OK(s) after the not found check. PS1, Line 209: ServerStatusPB status; : RETURN_NOT_OK(GetServerStatus(leader_hp.host(), leader_hp.port(), &status)); What's this for? I don't see status used anywhere. PS1, Line 224: return Status::RemoteError(resp.ShortDebugString()); Why not return StatusFromPB(resp.error().status()) like ConfigChange? PS1, Line 274: "Request the current leader of the " : "tablet(if present) to step down" Nit: reformat as "Force the tablet's leader replica (if present) to step down". PS1, Line 278: "Comma-separated list of Kudu Master addresses where each address is " : "of form 'hostname:port'" I admit, this one was my bad, but can you consolidate all of these descriptions in a global string constant like for kMasterAddressesArg? PS1, Line 280: "Tablet Identifier" Likewise for this, it's copied extensively. -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
