Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/18124 )
Change subject: KUDU-2915: add tool to remove a tablet server ...................................................................... Patch Set 3: Code-Review+1 (7 comments) http://gerrit.cloudera.org:8080/#/c/18124/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18124/3//COMMIT_MSG@14 PS3, Line 14: a(i nit: missing space http://gerrit.cloudera.org:8080/#/c/18124/3/src/kudu/master/ts_manager.cc File src/kudu/master/ts_manager.cc: http://gerrit.cloudera.org:8080/#/c/18124/3/src/kudu/master/ts_manager.cc@321 PS3, Line 321: if (!LookupTSByUUID(ts_uuid, &ts_desc)) { I was a bit concerned that there'd be a TOCTOU here, since we take the lock once here, but only erase under a different critical section down below. That said, I don't think there are incorrect interleavings, given the call to 'erase' is idempotent. http://gerrit.cloudera.org:8080/#/c/18124/3/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/18124/3/src/kudu/tools/kudu-tool-test.cc@7561 PS3, Line 7561: } nit: after this, can we also verify that the ksck output still shows the tserver? http://gerrit.cloudera.org:8080/#/c/18124/3/src/kudu/tools/tool_action_tserver.cc File src/kudu/tools/tool_action_tserver.cc: http://gerrit.cloudera.org:8080/#/c/18124/3/src/kudu/tools/tool_action_tserver.cc@61 PS3, Line 61: force remove the tserver even if " : "it was considered alive by the master. nit: how about "If true, force the removal of the tserver, even if it is considered live by the master" http://gerrit.cloudera.org:8080/#/c/18124/3/src/kudu/tools/tool_action_tserver.cc@64 PS3, Line 64: "in-memory map but keep its persisted state(if any). When the same tserver re-register nit: "If the same tserver re-registers on the..." http://gerrit.cloudera.org:8080/#/c/18124/3/src/kudu/tools/tool_action_tserver.cc@64 PS3, Line 64: te( nit: add a space http://gerrit.cloudera.org:8080/#/c/18124/3/src/kudu/tools/tool_action_tserver.cc@323 PS3, Line 323: if (resp.has_error()) { : return StatusFromPB(resp.error().status()); : } Have you considered running the RPC on all masters, and then collecting errors at the end, maybe logging a warning for each failure? It might make it easier to reason about the behavior in the event that a master is down -- To view, visit http://gerrit.cloudera.org:8080/18124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If1f5c2979a8d14428f4bcc8e850c57ce228c793a Gerrit-Change-Number: 18124 Gerrit-PatchSet: 3 Gerrit-Owner: Yifan Zhang <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yifan Zhang <[email protected]> Gerrit-Reviewer: Zoltan Chovan <[email protected]> Gerrit-Comment-Date: Tue, 18 Jan 2022 02:56:29 +0000 Gerrit-HasComments: Yes
