Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14048 )

Change subject: KUDU-2915 Support to delete dead tservers from CLI
......................................................................


Patch Set 1:

(2 comments)

> Patch Set 1:
>
> Have we considered alternatives here? eg if the cluster is fully recovered 
> from an outage, maybe we shouldn't have KSCK show a downed tablet server as a 
> "bad" status?
>
> Another option that might be a bit better is to allow an option to forget a 
> _specific_ dead TS -- otherwise it seems a bit heavy-handed to forget _all_ 
> dead tablet servers.

It seems I misunderstood the original intent of KUDU-2915. I thought it was the 
second option that Todd mentioned, i.e. a way to forget about a single dead 
tablet server. See 
https://docs.google.com/document/d/12BZqspGjHvQlc-o8XTDixoRol9Q36WJzXLJ6p15Zhf0/edit#heading=h.f7wecdcqsbe
for my thoughts on how that would be useful in the context of tserver 
decommissioning.

I also chatted with Adar about this in person. One thing worth considering is 
that this is basically the last step of decommissioning a tablet server, where 
the full sequence would look something like:

1. Mark a tablet server as decommissioning to avoid replica placement onto 
that tablet server.
2. Move all replicas away from the tablet server.
3. Once empty, (either automatically or with a tool) indicate that the tablet 
server has been successfully decommissioned by having the master forget about 
the decommissioned tablet server.

If we want to build towards this without introducing redundant tooling, then 
instead of adding a `kudu master delete_tserver` tool now and a 
`kudu cluster decommission` tool later, we could introduce the "delete single 
tserver" functionality as `kudu cluster decommission`, which would, for now, 
only do #3. #1 and #2 could be left as future work to build on top of this 
tool. What do you think?
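
To make the ordering of the three steps concrete, here is a minimal sketch 
against a toy in-memory registry. Everything in it (`FakeMasterRegistry`, 
`MarkDecommissioning`, `MoveReplicasAway`, `Forget`) is a hypothetical name 
made up for illustration, not an existing Kudu API, and modeling "move all 
replicas" as zeroing a counter is a simplifying assumption.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the master's in-memory tserver registry.
// None of these names are Kudu APIs; they only illustrate the ordering.
enum class TServerState { kLive, kDecommissioning };

struct TServerEntry {
  TServerState state;
  int num_replicas;
};

class FakeMasterRegistry {
 public:
  void Register(const std::string& uuid, int num_replicas) {
    entries_[uuid] = TServerEntry{TServerState::kLive, num_replicas};
  }

  // Step 1: mark the tserver so that no new replicas are placed on it.
  bool MarkDecommissioning(const std::string& uuid) {
    auto it = entries_.find(uuid);
    if (it == entries_.end()) return false;
    it->second.state = TServerState::kDecommissioning;
    return true;
  }

  // Step 2: move all replicas away (modeled here as zeroing a counter).
  bool MoveReplicasAway(const std::string& uuid) {
    auto it = entries_.find(uuid);
    if (it == entries_.end()) return false;
    it->second.num_replicas = 0;
    return true;
  }

  // Step 3: have the master forget the tserver, but only once it is empty.
  bool Forget(const std::string& uuid) {
    auto it = entries_.find(uuid);
    if (it == entries_.end() || it->second.num_replicas != 0) return false;
    entries_.erase(it);
    return true;
  }

  bool IsKnown(const std::string& uuid) const {
    return entries_.count(uuid) > 0;
  }

  // Replica placement skips unknown and decommissioning tservers.
  bool CanPlaceReplica(const std::string& uuid) const {
    auto it = entries_.find(uuid);
    return it != entries_.end() && it->second.state == TServerState::kLive;
  }

 private:
  std::map<std::string, TServerEntry> entries_;
};
```

In this sketch a tool that only implements #3 would call `Forget` alone, and 
the call fails for a tserver that still hosts replicas. #1 and #2 could later 
be added in front of it without changing that last step.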

http://gerrit.cloudera.org:8080/#/c/14048/1/src/kudu/master/master_service.cc
File src/kudu/master/master_service.cc:

http://gerrit.cloudera.org:8080/#/c/14048/1/src/kudu/master/master_service.cc@638
PS1, Line 638: void MasterServiceImpl::DeleteDeadTServerInMaster(const DeleteDeadTabletServersRequestPB* req,
The registration of a tablet server isn't persisted to disk. Is it important to 
take the leader lock when performing this action?


http://gerrit.cloudera.org:8080/#/c/14048/1/src/kudu/master/master_service.cc@650
PS1, Line 650: void MasterServiceImpl::DeleteDeadTabletServers(const DeleteDeadTabletServersRequestPB* req,
I don't see the advantage of having the masters relay these RPCs to each other, 
versus having the tool instantiate multiple master proxies and send the 
requests to each master individually. Does "forgetting" a tablet server _need_ 
to go to the leader first?
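
As a rough illustration of the tool-side alternative, the sketch below fans 
the request out to every master directly and reports which ones failed. 
`MasterEndpoint`, `forget_tserver`, and `ForgetOnAllMasters` are hypothetical 
names standing in for real master proxies and RPCs; this is not Kudu's proxy 
API, just the shape of the loop under that assumption.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-master RPC proxy. The callback takes a
// tserver UUID and returns true if that master forgot it. Not a Kudu API.
struct MasterEndpoint {
  std::string address;
  std::function<bool(const std::string&)> forget_tserver;
};

// Sends the "forget" request to each master individually, with no relaying
// between masters. Returns the addresses of the masters whose request failed.
std::vector<std::string> ForgetOnAllMasters(
    const std::vector<MasterEndpoint>& masters, const std::string& ts_uuid) {
  std::vector<std::string> failed;
  for (const auto& master : masters) {
    if (!master.forget_tserver(ts_uuid)) {
      failed.push_back(master.address);
    }
  }
  return failed;
}
```

With this shape, no master has to be the leader for the request to be served, 
and the tool can surface partial failures per master instead of relying on 
the leader to report them.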



--
To view, visit http://gerrit.cloudera.org:8080/14048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I689cfb02a1ae44c4d941e83e3a8cf6e14c7911c7
Gerrit-Change-Number: 14048
Gerrit-PatchSet: 1
Gerrit-Owner: honeyhexin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Mon, 12 Aug 2019 18:53:29 +0000
Gerrit-HasComments: Yes
