Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18124 )

Change subject: KUDU-2915: add tool to remove a tablet server
......................................................................


Patch Set 3:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/master.proto@1076
PS2, Line 1076:   optional string uuid = 1;
> nit: we try to err away from using "required" in protobuf, given it's remov
Done


http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/master.proto@1077
PS2, Line 1077:
              :   // Whether or not to remove any persisted ts
> nit: add a comment describing these args. Here's a start
Done


http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/master_service.cc
File src/kudu/master/master_service.cc:

http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/master_service.cc@339
PS2, Line 339: remove_tserver_sta
> nit: maybe rename this to 'remove_tserver_state' so we don't have to keep d
Done


http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/master_service.cc@344
PS2, Line 344: if (!l.CheckIsInitializedOrRespond(resp, rpc)) {
             :     return;
             :   }
             :
             :   Status s = server_->ts_manager()->RemoveTServer(ts_uuid, 
force_remove_live_tserver);
             :   if (PREDICT_FALSE(!s.ok() && !s.IsNotFound())) {
             :     // Ignore the NotFound error to make this RPC retriable and 
effectively idempotent.
             :     rpc->RespondFailure(s);
             :     return;
             :   }
             :
             :   if (remove_tserver_state && l.leader_status()
> nit: may be easier to follow as:
Done


http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/master_service.cc@358
PS2, Line 358:           ChangeTServerSta
> I'm curious what the rationale is behind not allowing a missing tserver. It
The original idea is we should not operate an unknown tserver, now I think here 
we must allow missing tserver because it has already been unregistered. But it 
seems this flag doesn't affect anything when set NONE for a tserver.


http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/ts_manager.cc
File src/kudu/master/ts_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/master/ts_manager.cc@326
PS2, Line 326: TServer $0 is
> nit: let's try keeping usage of "tserver" vs "Tablet Server" vs "tablet ser
Done


http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/tools/kudu-tool-test.cc@7472
PS2, Line 7472:
              : INSTANTIATE_TEST_SUITE_P(, RemoveTServerTest, 
::testing::Bool());
              :
> The --heartbeat_interval_ms default value is also 1s. Let's set it a bit hi
Done


http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/tools/kudu-tool-test.cc@7480
PS2, Line 7480:   NO_FATALS(StartCluster());
> nit: "shut it down"
Done


http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/tools/kudu-tool-test.cc@7483
PS2, Line 7483: const string ts_uuid = ts->uuid();
              :   const string ts_hostport = ts->bound_rpc_addr().ToString();
              :
              :   // Enter maintenance mode on the tserver and shut it down.
              :   ASSERT_OK(RunKuduTool({"tserver", "state", 
"enter_maintenance", master_addrs_str, ts_uuid}));
              :   ts->Shutdown();
              :
              :   {
              :     string out;
              :     string err;
              :     // Getting an error when running ksck and the output 
contains the dead tserver.
              :
> nit: rather than sleeping and expecting the exact sleep to work, could we i
Done


http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/tools/kudu-tool-test.cc@7509
PS2, Line 7509:   {
              :     // Run ksck and get no error.
              :     string out;
              :     NO_FATALS(RunActionStdoutString(Substitute("cluster ksck 
$0", master_addrs_str), &out));
              :     if (remove_tserver_state) {
              :       // Both the persisted state and registr
> Maybe after this, could we restart the tserver with a very short heartbeat
Done


http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/tools/kudu-tool-test.cc@7508
PS2, Line 7508:                         Substitute("-remove_tserver_state=$0", 
remove_tserver_state)}));
              :   {
              :     // Run ksck and get no error.
              :     string out;
              :     NO_FATALS(RunActionStdoutString(Substitute("cluster ksck 
$0", master_addrs_str), &out));
              :     if (remove_tserver_state) {
              :       // Both the persisted state and registration of the 
tserver was removed.
              :
> In the else case, can we ASSERT_STR_NOT_CONTAINS("MAINTENANCE_MODE")? We sh
Done


http://gerrit.cloudera.org:8080/#/c/18124/2/src/kudu/tools/kudu-tool-test.cc@7541
PS2, Line 7541: }
> Could we also set a high tserver heartbeat interval, and then check afterwa
Done



--
To view, visit http://gerrit.cloudera.org:8080/18124
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If1f5c2979a8d14428f4bcc8e850c57ce228c793a
Gerrit-Change-Number: 18124
Gerrit-PatchSet: 3
Gerrit-Owner: Yifan Zhang <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Reviewer: Zoltan Chovan <[email protected]>
Gerrit-Comment-Date: Fri, 14 Jan 2022 11:38:44 +0000
Gerrit-HasComments: Yes

Reply via email to