Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18124 )
Change subject: KUDU-2915: add tool to remove a tablet server ...................................................................... Patch Set 3: (12 comments) http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/master.proto@1076 PS2, Line 1076: optional string uuid = 1; > nit: we try to err away from using "required" in protobuf, given it's remov Done http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/master.proto@1077 PS2, Line 1077: : // Whether or not to remove any persisted ts > nit: add a comment describing these args. Here's a start Done http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/master_service.cc File src/kudu/master/master_service.cc: http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/master_service.cc@339 PS2, Line 339: remove_tserver_sta > nit: maybe rename this to 'remove_tserver_state' so we don't have to keep d Done http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/master_service.cc@344 PS2, Line 344: if (!l.CheckIsInitializedOrRespond(resp, rpc)) { : return; : } : : Status s = server_->ts_manager()->RemoveTServer(ts_uuid, force_remove_live_tserver); : if (PREDICT_FALSE(!s.ok() && !s.IsNotFound())) { : // Ignore the NotFound error to make this RPC retriable and effectively idempotent. : rpc->RespondFailure(s); : return; : } : : if (remove_tserver_state && l.leader_status() > nit: may be easier to follow as: Done http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/master_service.cc@358 PS2, Line 358: ChangeTServerSta > I'm curious what the rationale is behind not allowing a missing tserver. It The original idea is we should not operate an unknown tserver, now I think here we must allow missing tserver because it has already been unregistered. But it seems this flag doesn't affect anything when set NONE for a tserver. http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/ts_manager.cc File src/kudu/master/ts_manager.cc: http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/ts_manager.cc@326 PS2, Line 326: TServer $0 is > nit: let's try keeping usage of "tserver" vs "Tablet Server" vs "tablet ser Done http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/tools/kudu-tool-test.cc@7472 PS2, Line 7472: : INSTANTIATE_TEST_SUITE_P(, RemoveTServerTest, ::testing::Bool()); : > The --heartbeat_interval_ms default value is also 1s. Let's set it a bit hi Done http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/tools/kudu-tool-test.cc@7480 PS2, Line 7480: NO_FATALS(StartCluster()); > nit: "shut it down" Done http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/tools/kudu-tool-test.cc@7483 PS2, Line 7483: const string ts_uuid = ts->uuid(); : const string ts_hostport = ts->bound_rpc_addr().ToString(); : : // Enter maintenance mode on the tserver and shut it down. : ASSERT_OK(RunKuduTool({"tserver", "state", "enter_maintenance", master_addrs_str, ts_uuid})); : ts->Shutdown(); : : { : string out; : string err; : // Getting an error when running ksck and the output contains the dead tserver. : > nit: rather than sleeping and expecting the exact sleep to work, could we i Done http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/tools/kudu-tool-test.cc@7509 PS2, Line 7509: { : // Run ksck and get no error. : string out; : NO_FATALS(RunActionStdoutString(Substitute("cluster ksck $0", master_addrs_str), &out)); : if (remove_tserver_state) { : // Both the persisted state and registr > Maybe after this, could we restart the tserver with a very short heartbeat Done http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/tools/kudu-tool-test.cc@7508 PS2, Line 7508: Substitute("-remove_tserver_state=$0", remove_tserver_state)})); : { : // Run ksck and get no error. : string out; : NO_FATALS(RunActionStdoutString(Substitute("cluster ksck $0", master_addrs_str), &out)); : if (remove_tserver_state) { : // Both the persisted state and registration of the tserver was removed. : > In the else case, can we ASSERT_STR_NOT_CONTAINS("MAINTENANCE_MODE")? We sh Done http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/tools/kudu-tool-test.cc@7541 PS2, Line 7541: } > Could we also set a high tserver heartbeat interval, and then check afterwa Done -- To view, visit http://gerrit.cloudera.org:8080/18124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If1f5c2979a8d14428f4bcc8e850c57ce228c793a Gerrit-Change-Number: 18124 Gerrit-PatchSet: 3 Gerrit-Owner: Yifan Zhang <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yifan Zhang <[email protected]> Gerrit-Reviewer: Zoltan Chovan <[email protected]> Gerrit-Comment-Date: Fri, 14 Jan 2022 11:38:44 +0000 Gerrit-HasComments: Yes
