Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16412 )
Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc service ...................................................................... Patch Set 5: (22 comments) still going through it all, but my comments so far. http://gerrit.cloudera.org:8080/#/c/16412/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16412/5//COMMIT_MSG@12 PS5, Line 12: This patch adds some simple configuration flags that make it possible not sure I understand the plan here. this patch adds flags is --is_admission_controller to exec-env, which if set will expose the AdmissionControllerService. what about this will change exactly in IMPALA-9975? http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.h File be/src/scheduling/admission-control-service.h: http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.h@76 PS5, Line 76: AdmitQueryRequstPB nit: typo http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.h@78 PS5, Line 78: UniqueIdPB query_id; : UniqueIdPB coord_id; can these both be const ref? http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.h@105 PS5, Line 105: RuntimeProfile* summary_profile; might be worth mentioning that this is passed to AdmissionController::SubmitForAdmission, which is responsible for updating it. http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.h@118 PS5, Line 118: fo nit: typo http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc File be/src/scheduling/admission-control-service.cc: http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@85 PS5, Line 85: Adqmission nit: typo http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@142 PS5, Line 142: VLOG(1) << "GetQueryStatus " << req->query_id(); might be too verbose to log. http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@155 PS5, Line 155: 100 why wait at all? won't waiting tie up one of the RPC threads? I think the client already waits 100 ms. http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@159 PS5, Line 159: DCHECK(query_info->admit_status.ok()); is this expected? do want to DCHECK if the admit_status is not Status::OK()? if AC rejects the query due to cancellation, won't it returned a Status::CANCELLED? http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@197 PS5, Line 197: lock_guard<mutex> l(query_info->lock); why does the lock need to be acquired here? http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@209 PS5, Line 209: VLOG(1) << "ReleaseQueryBackends: query_id=" << req->query_id(); might be too verbose to log http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@219 PS5, Line 219: lock_guard<mutex> l(query_info->lock); why does the lock need to be acquired here? http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@249 PS5, Line 249: {query_info->query_id, query_info->coord_id, query_info->query_exec_request, : query_info->query_options, query_info->summary_profile, : query_info->blacklisted_executor_addresses}, might make the code easier to read if this just created a AdmissionRequest and then passed it in. http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-controller.h@316 PS5, Line 316: (usually they : /// are owned by the ClientRequestState). might need to revise this now. http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-controller.h@345 PS5, Line 345: Status WaitOnQueued(const UniqueIdPB& query_id, not clear to me what the returned Status is suppose to indicate. http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/remote-admission-control-client.h File be/src/scheduling/remote-admission-control-client.h: http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/remote-admission-control-client.h@53 PS5, Line 53: std::mutex lock_; what is the lock used to protect? http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/remote-admission-control-client.cc File be/src/scheduling/remote-admission-control-client.cc: http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/remote-admission-control-client.cc@63 PS5, Line 63: KrpcSerializer serializer; : int sidecar_idx1; : RETURN_IF_ERROR( : serializer.SerializeToSidecar(&request.request, &rpc_controller, &sidecar_idx1)); : req.set_query_exec_request_sidecar_idx(sidecar_idx1); : : KrpcSerializer serializer2; : int sidecar_idx2; : RETURN_IF_ERROR(serializer2.SerializeToSidecar( : &request.query_options, &rpc_controller, &sidecar_idx2)); : req.set_query_options_sidecar_idx(sidecar_idx2); isn't TQueryOptions accessible from a TQueryExecRequest? TQueryExecRequest --> ImpalaInternalService.TQueryCtx --> TClientRequest --> TQueryOptions http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/remote-admission-control-client.cc@86 PS5, Line 86: KUDU_RETURN_IF_ERROR( : proxy->AdmitQuery(req, &resp, &rpc_controller), "AdmitQuery rpc failed"); : Status admit_status(resp.status()); : RETURN_IF_ERROR(admit_status); is it necessary to hold the lock while making the RPC to Admit the query? that seems like it could limit throughput since only one query can be submitted at a time. http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/service/impala-http-handler.cc File be/src/service/impala-http-handler.cc: http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/service/impala-http-handler.cc@1057 PS5, Line 1057: void ImpalaHttpHandler::AdmissionStateHandler( perhaps this will done in another patch, but will the /admission endpoint on the coordinator be moved from the coordinator web ui to the admission-controller-service web ui? http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/util/sharded-query-map-util.cc File be/src/util/sharded-query-map-util.cc: http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/util/sharded-query-map-util.cc@21 PS5, Line 21: #include "scheduling/admission-control-service.h" seems odd that this needs to be imported here. can the templates be moved into the admission-control-service? same applies to the query-driver http://gerrit.cloudera.org:8080/#/c/16412/5/common/protobuf/admission_control_service.proto File common/protobuf/admission_control_service.proto: http://gerrit.cloudera.org:8080/#/c/16412/5/common/protobuf/admission_control_service.proto@162 PS5, Line 162: TQueryExwcRequest nit: typo http://gerrit.cloudera.org:8080/#/c/16412/5/common/protobuf/admission_control_service.proto@226 PS5, Line 226: service AdmissionControlService { it might be nice to have Tim and/or Bikram review the service definition, since its a key part of the change. -- To view, visit http://gerrit.cloudera.org:8080/16412 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae Gerrit-Change-Number: 16412 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Comment-Date: Wed, 30 Sep 2020 19:33:18 +0000 Gerrit-HasComments: Yes
