Joe McDonnell has uploaded a new patch set (#2).

Change subject: IMPALA-1575: Yield admission control resources at query end
......................................................................

IMPALA-1575: Yield admission control resources at query end

Currently, a query does not release admission control
resources until the client calls UnregisterQuery. Slow
clients can hold admission control resources even after
the query has reached a state where it will no longer
return any rows. Specifically, in the following cases,
the query is completed, but the client must still call
UnregisterQuery to release admission control resources:
 1. The query encounters an error and fails
 2. The query is cancelled due to the idle query timeout
 3. The query reaches eos (or the DML completes)
 4. The client cancels the query without closing the query

This change releases admission control resources as soon
as the query reaches a state where it cannot return any
rows rather than waiting for the client to explicitly end
the query.

When cancelling a query, the coordinator asynchronously
notifies all fragment instances to cancel. The coordinator
does not wait for the fragment instances to respond, so
the cancel case can release admission control resources
while some fragment instances may continue to run until
the cancel takes effect. The concern with this behavior is
that the fragment instances may continue to use memory and
cause subsequent admitted queries to fail. This is already
possible today, as a client can directly close a running
query (which cancels the query and unregisters the query
immediately). For example, the session idle timeout does
this. However, this change expands the circumstances where
this can happen.

Admission control based on mem_limit operates differently.
It relies on the reported memory usage of each QueryState
to generate a cumulative memory usage across all of the
instances. Admission control's behavior is determined by
when the QueryState releases its memory. The existing
behavior releases the query's memory on the destruction of
the QueryState, which occurs when the query is unregistered.
This matches the existing behavior for admission control
prior to this change. To support the new behavior for
mem_limit, the QueryState will now release query resources
when the last fragment instance terminates. This
unregisters the query memory tracker, which results in the
admission control memory resources being freed.

To test both aspects of this change, the admission control
test (custom_cluster/test_admission_controller.py) has been
modified to use four different modes of ending a query:
client cancelling a query, the query hitting an idle
timeout, the query reaching eos, and the client closing the
query. The test uses a mix of all four. After the query
ends, all clients wait for the test to complete before
closing the query or closing the connection. This ensures
that the admission control decisions are based entirely on
the query end behavior. This test works for both query
admission control and mem_limit admission control.

Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M tests/custom_cluster/test_admission_controller.py
7 files changed, 158 insertions(+), 102 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/7079/2
--
To view, visit http://gerrit.cloudera.org:8080/7079
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
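
As an illustration of the release behavior described in the commit message above, here is a minimal Python sketch. It is not Impala code: Pool, Query, and every method name are hypothetical, chosen only to model the two ideas in the description. The first is that the admission slot is given back at the first "query end" event (error, idle timeout, eos, or client cancel) and a later unregister is a no-op. The second is that the memory tracker is dropped only when the last fragment instance terminates, which may happen after the slot has already been released.

```python
import threading


class Pool:
    """Hypothetical admission control pool that counts admitted queries."""

    def __init__(self):
        self._lock = threading.Lock()
        self.admitted = 0

    def admit(self):
        with self._lock:
            self.admitted += 1

    def release(self):
        with self._lock:
            self.admitted -= 1


class Query:
    """Hypothetical query that holds one admission slot in a pool."""

    def __init__(self, pool, num_instances):
        self._pool = pool
        self._lock = threading.Lock()
        self._released = False
        self._instances_running = num_instances
        # Stands in for the query memory tracker (an assumption of this sketch).
        self.memory_tracked = True
        pool.admit()

    def _release_admission(self):
        # Idempotent: only the first caller returns the slot to the pool.
        with self._lock:
            if self._released:
                return False
            self._released = True
        self._pool.release()
        return True

    # The four "query end" cases from the description, plus unregister.
    def on_error(self):
        return self._release_admission()

    def on_idle_timeout(self):
        return self._release_admission()

    def on_eos(self):
        return self._release_admission()

    def cancel(self):
        return self._release_admission()

    def unregister(self):
        return self._release_admission()

    def instance_done(self):
        # Memory is considered freed only when the last instance terminates.
        with self._lock:
            self._instances_running -= 1
            if self._instances_running == 0:
                self.memory_tracked = False
```

In this sketch, calling cancel() frees the slot immediately while memory_tracked stays True until every instance_done() call has arrived, which mirrors the window the description warns about, where cancelled fragment instances can still hold memory.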