Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14104 )
Change subject: IMPALA-8803: Coordinator should release admitted memory per-backend ...................................................................... Patch Set 2: (14 comments) Address Tim's and Michael's comments, still working through Bikram's. Some other notable changes: I originally thought all core tests passed, but turns out some are consistently failing. I fixed test_auto_scaling.py and test_executor_groups.py in this update, but am investigating a few more. The reason these tests started failing is that they assume a max number of queries can be run within an executor group and enforce this by setting -max_concurrent_queries. The issue is that they only set -max_concurrent_queries for executors, and set a much higher value for coordinators. That causes problems with this patch because backends might be released independently (e.g. executor backends might be released first, and then coordinator fragments). This breaks the assertions in https://github.com/apache/impala/blob/master/tests/custom_cluster/test_auto_scaling.py#L173). I added some more state validation into the AdmissionController. Essentially, I wanted to add a DCHECK to ensure ReleaseQueryBackends is called for all Backends before ReleaseQuery is called. This requires some extra internal book-keeping. http://gerrit.cloudera.org:8080/#/c/14104/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14104/1//COMMIT_MSG@24 PS1, Line 24: * Resources are released at most once every 1 second, this prevents > at most once Done http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h@481 PS1, Line 481: nother value, such as 10, wo > Please add _MS suffix Done http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h@505 PS1, Line 505: > nits: double-negative may not be very comprehensible. Done http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h@527 PS1, Line 527: /// QuerySchedule associated with the Coordinator that owns this BackendResourceState. > What owns the object that is being referenced? The schedule? It's owned by the Coordinator, updated the comments with the ownership info. http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.cc@810 PS1, Line 810: Start the 'Timed Re > I feel like the name is a little confusing in isolation, I'm not sure peopl Changed it to NumCompletedBackends and moved it out of this class and into the Coordinator since it should really determine when a Backend is "Completed". http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.cc@823 PS1, Line 823: > Coding convention for Impala recommends pre-increment Done http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.cc@865 PS1, Line 865: as > nit: as Done http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator.h@530 PS1, Line 530: /// to satisfy the above preconditions. If the query has an expensive finalization > nit: needs comment Done http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.h@339 PS1, Line 339: Status SubmitForAdmission(const AdmissionRequest& request, > Needs a comment. We may also want to document the lifecycle of resource rel Done While writing up the class comments, it occurred to me that we want to ensure that ReleaseQueryBackends is called for all Backends in a query before ReleaseQuery is called. I added some book-keeping and DCHECKs to ensure this. http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.h@489 PS1, Line 489: : name_(name), parent_(parent), agg_num_running_(0), agg_num_queued_(0), > Needs a comment Done http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.h@816 PS1, Line 816: static bool CanAccommodateMaxInitialReservation(const QuerySchedule& schedule, > Needs a comment Done http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.cc@370 PS1, Line 370: for (auto host_addr : host_addrs) { : auto backend_exec_params = schedule.per_backend_exec_params().find(host_addr); : if (backend_exec_params == schedule. > What happens in non-debug builds ? Should we log here instead and add a con Done http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/scheduling/admission-controller.cc@375 PS1, Line 375: PrintThrift(host_addr), PrintId(schedule.query_id())); > This pattern appears in a few places - might be worth making a helper funct Done http://gerrit.cloudera.org:8080/#/c/14104/1/tests/query_test/test_result_spooling.py File tests/query_test/test_result_spooling.py: http://gerrit.cloudera.org:8080/#/c/14104/1/tests/query_test/test_result_spooling.py@82 PS1, Line 82: def get_workload(cls): > Let's move this into the custom_cluster directory. run-custom-cluster-tests Done -- To view, visit http://gerrit.cloudera.org:8080/14104 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88bb11e0ede7574568020e0277dd8ac8d2586dc9 Gerrit-Change-Number: 14104 Gerrit-PatchSet: 2 Gerrit-Owner: Sahil Takiar <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 22 Aug 2019 19:10:02 +0000 Gerrit-HasComments: Yes
