[kudu-CR] KUDU-2915 Support to delete dead tservers from CLI

2019-08-12 Thread honeyhexin (Code Review)
honeyhexin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14048 )

Change subject: KUDU-2915 Support to delete dead tservers from CLI
..


Patch Set 1:

I agree that introducing `kudu cluster decommission` tool will be more 
reasonalbe. I will change some logic and resubmit the patch soon.


--
To view, visit http://gerrit.cloudera.org:8080/14048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I689cfb02a1ae44c4d941e83e3a8cf6e14c7911c7
Gerrit-Change-Number: 14048
Gerrit-PatchSet: 1
Gerrit-Owner: honeyhexin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: honeyhexin 
Gerrit-Comment-Date: Mon, 12 Aug 2019 23:17:45 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2915 Support to delete dead tservers from CLI

2019-08-12 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14048 )

Change subject: KUDU-2915 Support to delete dead tservers from CLI
..


Patch Set 1:

(2 comments)

> Patch Set 1:
>
> Have we considered alternatives here? eg if the cluster is fully recovered 
> from an outage, maybe we shouldn't have KSCK show a downed tablet server as a 
> "bad" status?
>
> Another option that might be a bit better is to allow an option to forget a 
> _specific_ dead TS -- otherwise it seems a bit heavy-handed to forget _all_ 
> dead tablet servers.

Seems I misunderstood the original intent of KUDU-2915. I thought it was the 
second option that Todd mentioned, i.e. a way to forget about a single dead 
tablet server. See here 
https://docs.google.com/document/d/12BZqspGjHvQlc-o8XTDixoRol9Q36WJzXLJ6p15Zhf0/edit#heading=h.f7wecdcqsbe
 for my thoughts on how I think that would be useful in the context of tserver 
decommissioning.

I also chatted with Adar about this in person, and one thing worth considering 
is that this is basically the last step of a decommissioning, where the full 
steps for decommissioning would look something like:

1. Mark a tablet server as being decommissioning to avoid replica placement 
onto that tablet server.
2. Move all replicas away from the tablet server.
3. Once empty, (either automatically or with a tool) indicate that the tablet 
server has been successfully decommissioned by having the master forget about 
the decommissioned tablet server.

If we want to build towards this without introducing redundant tooling, instead 
of having a `kudu master delete_tserver` tool and then eventually introducing 
`kudu cluster decommission` tool, we could introduce the "delete single 
tserver" functionality as `kudu cluster decommission` which would, for now, 
only do #3. #1 and #2 should be left as future work to build on top of this 
tool. What do you think?

http://gerrit.cloudera.org:8080/#/c/14048/1/src/kudu/master/master_service.cc
File src/kudu/master/master_service.cc:

http://gerrit.cloudera.org:8080/#/c/14048/1/src/kudu/master/master_service.cc@638
PS1, Line 638: void MasterServiceImpl::DeleteDeadTServerInMaster(const 
DeleteDeadTabletServersRequestPB* req,
The registration of a tablet server isn't persisted to disk. Is it important to 
take the leader lock when performing this action?


http://gerrit.cloudera.org:8080/#/c/14048/1/src/kudu/master/master_service.cc@650
PS1, Line 650: void MasterServiceImpl::DeleteDeadTabletServers(const 
DeleteDeadTabletServersRequestPB* req,
I don't see the advantage of having the master relay these RPCs to each other, 
versus having the tool instantiate multiple master proxies and sending the 
requests to each master individually. Does "forgetting" a tablet server _need_ 
to go to the leader first?



--
To view, visit http://gerrit.cloudera.org:8080/14048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I689cfb02a1ae44c4d941e83e3a8cf6e14c7911c7
Gerrit-Change-Number: 14048
Gerrit-PatchSet: 1
Gerrit-Owner: honeyhexin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 12 Aug 2019 18:53:29 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2915 Support to delete dead tservers from CLI

2019-08-12 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14048 )

Change subject: KUDU-2915 Support to delete dead tservers from CLI
..


Patch Set 1:

Have we considered alternatives here? eg if the cluster is fully recovered from 
an outage, maybe we shouldn't have KSCK show a downed tablet server as a "bad" 
status?

Another option that might be a bit better is to allow an option to forget a 
_specific_ dead TS -- otherwise it seems a bit heavy-handed to forget _all_ 
dead tablet servers.


--
To view, visit http://gerrit.cloudera.org:8080/14048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I689cfb02a1ae44c4d941e83e3a8cf6e14c7911c7
Gerrit-Change-Number: 14048
Gerrit-PatchSet: 1
Gerrit-Owner: honeyhexin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 12 Aug 2019 17:34:46 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2915 Support to delete dead tservers from CLI

2019-08-12 Thread honeyhexin (Code Review)
honeyhexin has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/14048


Change subject: KUDU-2915 Support to delete dead tservers from CLI
..

KUDU-2915 Support to delete dead tservers from CLI

The patch provides a command that can delete all dead
tablet servers from master. The command can be used as:
  kudu master delete_dead_tservers 

Change-Id: I689cfb02a1ae44c4d941e83e3a8cf6e14c7911c7
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/master_proxy_rpc.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/master.cc
M src/kudu/master/master.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/master/ts_manager.cc
M src/kudu/master/ts_manager.h
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_master.cc
14 files changed, 177 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/14048/1
--
To view, visit http://gerrit.cloudera.org:8080/14048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I689cfb02a1ae44c4d941e83e3a8cf6e14c7911c7
Gerrit-Change-Number: 14048
Gerrit-PatchSet: 1
Gerrit-Owner: honeyhexin