[kudu-CR] KUDU-2915 Support to delete dead tservers from CLI
honeyhexin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14048 ) Change subject: KUDU-2915 Support to delete dead tservers from CLI .. Patch Set 1: I agree that introducing `kudu cluster decommission` tool will be more reasonalbe. I will change some logic and resubmit the patch soon. -- To view, visit http://gerrit.cloudera.org:8080/14048 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I689cfb02a1ae44c4d941e83e3a8cf6e14c7911c7 Gerrit-Change-Number: 14048 Gerrit-PatchSet: 1 Gerrit-Owner: honeyhexin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: honeyhexin Gerrit-Comment-Date: Mon, 12 Aug 2019 23:17:45 + Gerrit-HasComments: No
[kudu-CR] KUDU-2915 Support to delete dead tservers from CLI
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14048 ) Change subject: KUDU-2915 Support to delete dead tservers from CLI .. Patch Set 1: (2 comments) > Patch Set 1: > > Have we considered alternatives here? eg if the cluster is fully recovered > from an outage, maybe we shouldn't have KSCK show a downed tablet server as a > "bad" status? > > Another option that might be a bit better is to allow an option to forget a > _specific_ dead TS -- otherwise it seems a bit heavy-handed to forget _all_ > dead tablet servers. Seems I misunderstood the original intent of KUDU-2915. I thought it was the second option that Todd mentioned, i.e. a way to forget about a single dead tablet server. See here https://docs.google.com/document/d/12BZqspGjHvQlc-o8XTDixoRol9Q36WJzXLJ6p15Zhf0/edit#heading=h.f7wecdcqsbe for my thoughts on how I think that would be useful in the context of tserver decommissioning. I also chatted with Adar about this in person, and one thing worth considering is that this is basically the last step of a decommissioning, where the full steps for decommissioning would look something like: 1. Mark a tablet server as being decommissioning to avoid replica placement onto that tablet server. 2. Move all replicas away from the tablet server. 3. Once empty, (either automatically or with a tool) indicate that the tablet server has been successfully decommissioned by having the master forget about the decommissioned tablet server. If we want to build towards this without introducing redundant tooling, instead of having a `kudu master delete_tserver` tool and then eventually introducing `kudu cluster decommission` tool, we could introduce the "delete single tserver" functionality as `kudu cluster decommission` which would, for now, only do #3. #1 and #2 should be left as future work to build on top of this tool. What do you think? http://gerrit.cloudera.org:8080/#/c/14048/1/src/kudu/master/master_service.cc File src/kudu/master/master_service.cc: http://gerrit.cloudera.org:8080/#/c/14048/1/src/kudu/master/master_service.cc@638 PS1, Line 638: void MasterServiceImpl::DeleteDeadTServerInMaster(const DeleteDeadTabletServersRequestPB* req, The registration of a tablet server isn't persisted to disk. Is it important to take the leader lock when performing this action? http://gerrit.cloudera.org:8080/#/c/14048/1/src/kudu/master/master_service.cc@650 PS1, Line 650: void MasterServiceImpl::DeleteDeadTabletServers(const DeleteDeadTabletServersRequestPB* req, I don't see the advantage of having the master relay these RPCs to each other, versus having the tool instantiate multiple master proxies and sending the requests to each master individually. Does "forgetting" a tablet server _need_ to go to the leader first? -- To view, visit http://gerrit.cloudera.org:8080/14048 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I689cfb02a1ae44c4d941e83e3a8cf6e14c7911c7 Gerrit-Change-Number: 14048 Gerrit-PatchSet: 1 Gerrit-Owner: honeyhexin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 12 Aug 2019 18:53:29 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2915 Support to delete dead tservers from CLI
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/14048 ) Change subject: KUDU-2915 Support to delete dead tservers from CLI .. Patch Set 1: Have we considered alternatives here? eg if the cluster is fully recovered from an outage, maybe we shouldn't have KSCK show a downed tablet server as a "bad" status? Another option that might be a bit better is to allow an option to forget a _specific_ dead TS -- otherwise it seems a bit heavy-handed to forget _all_ dead tablet servers. -- To view, visit http://gerrit.cloudera.org:8080/14048 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I689cfb02a1ae44c4d941e83e3a8cf6e14c7911c7 Gerrit-Change-Number: 14048 Gerrit-PatchSet: 1 Gerrit-Owner: honeyhexin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 12 Aug 2019 17:34:46 + Gerrit-HasComments: No
[kudu-CR] KUDU-2915 Support to delete dead tservers from CLI
honeyhexin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14048 Change subject: KUDU-2915 Support to delete dead tservers from CLI .. KUDU-2915 Support to delete dead tservers from CLI The patch provides a command that can delete all dead tablet servers from master. The command can be used as: kudu master delete_dead_tservers Change-Id: I689cfb02a1ae44c4d941e83e3a8cf6e14c7911c7 --- M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/master_proxy_rpc.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/master.cc M src/kudu/master/master.h M src/kudu/master/master.proto M src/kudu/master/master_service.cc M src/kudu/master/master_service.h M src/kudu/master/ts_manager.cc M src/kudu/master/ts_manager.h M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_master.cc 14 files changed, 177 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/14048/1 -- To view, visit http://gerrit.cloudera.org:8080/14048 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I689cfb02a1ae44c4d941e83e3a8cf6e14c7911c7 Gerrit-Change-Number: 14048 Gerrit-PatchSet: 1 Gerrit-Owner: honeyhexin