Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/23763 )
Change subject: IMPALA-14605: Fix memory leak in global admissiond for cancelled queued queries ...................................................................... Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/23763/3/be/src/scheduling/admission-control-service.cc File be/src/scheduling/admission-control-service.cc: http://gerrit.cloudera.org:8080/#/c/23763/3/be/src/scheduling/admission-control-service.cc@148 PS3, Line 148: > No need to take ownership of the mutex before calling notify_all: https:// I think there could be a small chance to bring some problems. If we don't have the lock here, when shutdown is set to true and notify all in Join(), in the loop, it may be already awake while the notify all from Join() and just decided to go to sleep without seeing shutdown flipped to true, then the loop goes back to sleep, and the Join() could then wait some time depends on the implementation of the wait() in the loop thread. Since the Join() only be called once during shutdown, it seems not bad to be conservative, holding the lock costs us nothing and can prevent edge cases http://gerrit.cloudera.org:8080/#/c/23763/3/be/src/scheduling/admission-control-service.cc@438 PS3, Line 438: > Instead of passing "this" to a lambda, pass references to the two variables Changed to &. http://gerrit.cloudera.org:8080/#/c/23763/3/be/src/scheduling/admissiond-env.cc File be/src/scheduling/admissiond-env.cc: http://gerrit.cloudera.org:8080/#/c/23763/3/be/src/scheduling/admissiond-env.cc@139 PS3, Line 139: admission_controller_->RegisterAdmissionMapCleanupCallback( : [&](const UniqueIdPB& query_id) { : admission_control_svc_->CleanupAdmissionStateMapAsync(query_id); : }); > Is the cleanup thread only registered in case of dedicated AC service? Why This is a good question. The admission_state_map_ only exists in AdmissionControlService. In embedded mode, Coord works directly with the controller and does not use this map. So, embedded mode does not have this memory leak problem. Normally, we rely on the client calling functions like GetQueryStatus() to clean up the map. But in this specific case (cancellation due to errors while queued), if the client stops sending requests, GetQueryStatus() is never called, and no other functions are dealing with the cleanup. This entry is never removed. This callback ensures that when the AdmissionController drops a query as the last step (sometimes AdmissionControlService doesn't clean the admission state immediately just because it may be still used by AdmissionController), the service can explicitly cleans up its admission state http://gerrit.cloudera.org:8080/#/c/23763/3/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: http://gerrit.cloudera.org:8080/#/c/23763/3/tests/custom_cluster/test_admission_controller.py@2426 PS3, Line 2426: impalad_args=("--debug_actions=INBOUND_G > This flag is not needed. self.enable_admission_service(method) in L2109 alr Done -- To view, visit http://gerrit.cloudera.org:8080/23763 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f16f3cf986e2038d6486d3ec687135d561e2cbf Gerrit-Change-Number: 23763 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Wed, 17 Dec 2025 09:45:19 +0000 Gerrit-HasComments: Yes
