Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )
Change subject: KUDU-2289: Tablet deletion should be throttled ...................................................................... Patch Set 3: (6 comments) Do you have any advice on how to test this? I'm not setting a queue length, the idea being that DeleteTablets can wait in the queue instead of making the master constantly retry them, so the DeleteTablet RPCs will no be throttled. I can reproduce symptoms of the original problem in CreateTableStressTest.CreateAndDeleteBigTable by turning up the number of tablets in the big table-- there'll be service-queue-full rejections of CreateTablet and DeleteTablet calls in the logs-- and those go away for DeleteTablets when this change is applied. DeleteTablet is an admin service RPC, so I can't, for example, Ping the service to see if the queue is full. http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/tablet_service.cc@807 PS3, Line 807: context, req, resp > I think they're guaranteed to live until the call to context->RespondSucces Thanks Adar. Here's a relevant snippet from the comment on RpcContext::RespondSuccess() in rpc_context.h: // After this method returns, this RpcContext object is destroyed. The request // and response protobufs are also destroyed. Likewise, the context, request, and response objects are also destroyed when calling the other RpcContext::Respond* methods. http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.h File src/kudu/tserver/ts_tablet_manager.h: http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.h@142 PS3, Line 142: Status DeleteTablet(const std::string& tablet_id, > Could we make the synchronous version private? AFAICT it's now only used in DeleteTabletRunnable calls DeleteTablet. I could make DeleteTabletRunnable a friend of TSTabletManager and make DeleteTablet private, but that feels like overkill because DeleteTabletRunnable just needs access to one method: a synchronous DeleteTablet(). http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.cc@278 PS3, Line 278: virtual void DisableCallback() { > This doesn't need to be virtual, does it? Done http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.cc@321 PS3, Line 321: max_open_threads > This should be max_delete_threads. Done http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.cc@840 PS3, Line 840: new DeleteTabletRunnable(this, tablet_id, delete_type, cas_config_index, cb)); > Nit: use std::make_shared to make the runnable. Done http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.cc@848 PS3, Line 848: if (s.IsServiceUnavailable()) { : cb(s, TabletServerErrorPB::THROTTLED); : return; : } : cb(s, TabletServerErrorPB::UNKNOWN_ERROR); > Simpler as: Done -- To view, visit http://gerrit.cloudera.org:8080/9551 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998 Gerrit-Change-Number: 9551 Gerrit-PatchSet: 3 Gerrit-Owner: Will Berkeley <wdberke...@gmail.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com> Gerrit-Comment-Date: Mon, 19 Mar 2018 19:00:10 +0000 Gerrit-HasComments: Yes