[Impala-ASF-CR] IMPALA-1575: Yield admission control resources at query end
Tim Armstrong has abandoned this change. ( http://gerrit.cloudera.org:8080/7079 ) Change subject: IMPALA-1575: Yield admission control resources at query end .. Abandoned Abandoning this patch since it's no longer up for review. I'll post updated patches. -- To view, visit http://gerrit.cloudera.org:8080/7079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700 Gerrit-Change-Number: 7079 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1575: Yield admission control resources at query end
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1575: Yield admission control resources at query end .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7079/3/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: Line 1086: void Coordinator::ReleaseResources() { I'm having trouble understanding whether this runs before or after QueryState::ReleaseResources(). It seems like we to release memory tracked by filter_mem_tracker_ before QueryState::ReleaseResources(), but then we need to release the admission control resources after QueryState::ReleaseResources() is called. There's a good chance I'm misunderstanding something but that's what I'm stuck on right now. -- To view, visit http://gerrit.cloudera.org:8080/7079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1575: Yield admission control resources at query end
Joe McDonnell has posted comments on this change. Change subject: IMPALA-1575: Yield admission control resources at query end .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7079/3/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS3, Line 82: if (desc_tbl_ != nullptr) desc_tbl_->ReleaseResources(); This may need to be in ~QueryState(). I'm investigating how desc_tbl_ is used. So far, testing hasn't revealed any issue with having this here. -- To view, visit http://gerrit.cloudera.org:8080/7079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1575: Yield admission control resources at query end
Joe McDonnell has uploaded a new patch set (#3). Change subject: IMPALA-1575: Yield admission control resources at query end .. IMPALA-1575: Yield admission control resources at query end Currently, a query does not release admission control resources until the client calls UnregisterQuery. Slow clients can hold admission control resources even after the query has reached a state where it will no longer return any rows. Specifically, in the following cases, the query is completed, but the client must still call UnregisterQuery to release admission control resources: 1. The query encounters an error and fails 2. The query is cancelled due to the idle query timeout 3. The query reaches eos (or the DML completes) 4. The client cancels the query without closing the query This change releases admission control resources as soon as the query reaches a state where it cannot return any rows rather than waiting for the client to explicitly end the query. When cancelling a query, the coordinator asynchronously notifies all fragment instances to cancel. The coordinator does not wait for the fragment instances to respond, so the cancel case can release admission control resources while some fragment instances may continue to run until the cancel takes effect. The concern with this behavior is that the fragment instances may continue to use memory and cause subsequent admitted queries to fail. This is already possible today, as a client can directly close a running query (which cancels the query and unregisters the query immediately). For example, the session idle timeout does this. However, this change expands the circumstances where this can happen. Admission control based on mem_limit operates differently. It relies on the reported memory usage of each QueryState to generate a cumulative memory usage across all of the instances. Admission control's behavior is determined by when the QueryState releases its memory. The existing behavior releases the query's memory on the destruction of the QueryState, which occurs when the query is unregistered. This matches the existing behavior for admission control prior to this change. To support the new behavior for mem_limit, the QueryState will now release query resources when the last fragment instance terminates. This unregisters the query memory tracker, which results in the admission control memory resources being freed. Backend tests do not spawn fragment instance threads and are not subject to the new semantics for releasing resources. Instead, the code detects this case and releases resources in the old location. To test both aspects of this change, the admission control test (custom_cluster/test_admission_controller.py) has been modified to use four different modes of ending a query: client cancelling a query, the query hitting an idle timeout, the query reaching eos, and the client closing the query. The test uses a mix of all four. After the query ends, all clients wait for the test to complete before closing the query or closing the connection. This ensures that the admission control decisions are based entirely on the query end behavior. This test works for both query admission control and mem_limit admission control. Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700 --- M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/service/client-request-state.cc M tests/custom_cluster/test_admission_controller.py 7 files changed, 170 insertions(+), 105 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/7079/3 -- To view, visit http://gerrit.cloudera.org:8080/7079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-1575: Yield admission control resources at query end
Joe McDonnell has uploaded a new patch set (#2). Change subject: IMPALA-1575: Yield admission control resources at query end .. IMPALA-1575: Yield admission control resources at query end Currently, a query does not release admission control resources until the client calls UnregisterQuery. Slow clients can hold admission control resources even after the query has reached a state where it will no longer return any rows. Specifically, in the following cases, the query is completed, but the client must still call UnregisterQuery to release admission control resources: 1. The query encounters an error and fails 2. The query is cancelled due to the idle query timeout 3. The query reaches eos (or the DML completes) 4. The client cancels the query without closing the query This change releases admission control resources as soon as the query reaches a state where it cannot return any rows rather than waiting for the client to explicitly end the query. When cancelling a query, the coordinator asynchronously notifies all fragment instances to cancel. The coordinator does not wait for the fragment instances to respond, so the cancel case can release admission control resources while some fragment instances may continue to run until the cancel takes effect. The concern with this behavior is that the fragment instances may continue to use memory and cause subsequent admitted queries to fail. This is already possible today, as a client can directly close a running query (which cancels the query and unregisters the query immediately). For example, the session idle timeout does this. However, this change expands the circumstances where this can happen. Admission control based on mem_limit operates differently. It relies on the reported memory usage of each QueryState to generate a cumulative memory usage across all of the instances. Admission control's behavior is determined by when the QueryState releases its memory. The existing behavior releases the query's memory on the destruction of the QueryState, which occurs when the query is unregistered. This matches the existing behavior for admission control prior to this change. To support the new behavior for mem_limit, the QueryState will now release query resources when the last fragment instance terminates. This unregisters the query memory tracker, which results in the admission control memory resources being freed. To test both aspects of this change, the admission control test (custom_cluster/test_admission_controller.py) has been modified to use four different modes of ending a query: client cancelling a query, the query hitting an idle timeout, the query reaching eos, and the client closing the query. The test uses a mix of all four. After the query ends, all clients wait for the test to complete before closing the query or closing the connection. This ensures that the admission control decisions are based entirely on the query end behavior. This test works for both query admission control and mem_limit admission control. Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700 --- M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/service/client-request-state.cc M tests/custom_cluster/test_admission_controller.py 7 files changed, 158 insertions(+), 102 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/7079/2 -- To view, visit http://gerrit.cloudera.org:8080/7079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-1575: Yield admission control resources at query end
Joe McDonnell has posted comments on this change. Change subject: IMPALA-1575: Yield admission control resources at query end .. Patch Set 1: Testing shows that the current approach doesn't work for mem_limit based admission control. The reserved memory is not released at the new location, so TestAdmissionControllerStress::test_mem_limit fails. >From what I can tell, the reserved memory is released when >QueryState::ReleaseResources is called. The codepath that calls this is >ImpalaServer::UnregisterQuery-> ClientRequestState::~ClientRequestState -> >Coordinator::~Coordinator -> QueryExecManager::ReleaseQueryState. -- To view, visit http://gerrit.cloudera.org:8080/7079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1575: Yield admission control resources at query end
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-1575: Yield admission control resources at query end .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7079/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: Line 1114: } > Does Coordinator::ReleaseResources() really mean that all finstances' resou you could view the admission control "tokens" as a resource, ie, there's a finite amount of it and it's accounted for. but the comment is a bit misleading, since the admission controller doesn't allocate actual compute resources. ReleaseResources() doesn't have any control over the instance resources. -- To view, visit http://gerrit.cloudera.org:8080/7079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1575: Yield admission control resources at query end
Dan Hecht has posted comments on this change. Change subject: IMPALA-1575: Yield admission control resources at query end .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7079/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS1, Line 889: WaitForBackendCompletion()) note that on this path we are waiting for all backends to finish already. Line 1114: } Does Coordinator::ReleaseResources() really mean that all finstances' resources have been freed? My interpretation was that this method was only for freeing resources for the coordinator object itself, and so I don't see why it makes sense tie admission control to it. -- To view, visit http://gerrit.cloudera.org:8080/7079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1575: Yield admission control resources at query end
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-1575: Yield admission control resources at query end .. Patch Set 1: (7 comments) I think there were going to be some functional tests? http://gerrit.cloudera.org:8080/#/c/7079/1//COMMIT_MSG Commit Message: PS1, Line 12: query is finished query has transitioned to a state in which it will no longer return any more rows. PS1, Line 17: and an error PS1, Line 18: ended cancelled Line 19: 3. The query reaches eos (or the DML completes) Also: The query is cancelled by a client but not closed. (E.g. call HS2::CancelOperation() but not HS2::CloseOperation()) PS1, Line 22: complete avoid this word, e.g. use the wording I suggested above Line 24: call out the limitations, i.e. this doesn't mean all backends have completed terminated, and while this is already possible today, this can make this scenario more likely. Line 25: Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700 comment on testing? I assume we still need to do the cluster testing. -- To view, visit http://gerrit.cloudera.org:8080/7079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1575: Yield admission control resources at query end
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-1575: Yield admission control resources at query end .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7079/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: Line 892: // release resources here, since we won't be fetching more result rows explain why this needs to happen after computequerysummary -- To view, visit http://gerrit.cloudera.org:8080/7079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1575: Yield admission control resources at query end
Joe McDonnell has uploaded a new change for review. http://gerrit.cloudera.org:8080/7079 Change subject: IMPALA-1575: Yield admission control resources at query end .. IMPALA-1575: Yield admission control resources at query end Currently, a query does not release admission control resources until the client calls UnregisterQuery. Slow clients can hold admission control resources even after the query is finished. Specifically, in the following cases, the query is completed, but the client must still call UnregisterQuery to release admission control resources: 1. The query encounters and error and fails 2. The query is ended due to the idle query timeout 3. The query reaches eos (or the DML completes) This change releases admission control resources as soon as the query is complete rather than waiting for the client. Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700 --- M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/service/client-request-state.cc 3 files changed, 40 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/7079/1 -- To view, visit http://gerrit.cloudera.org:8080/7079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell