Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23763 )

Change subject: IMPALA-14605: Fix memory leak in global admissiond for 
cancelled queued queries
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/23763/3/be/src/scheduling/admission-control-service.cc
File be/src/scheduling/admission-control-service.cc:

http://gerrit.cloudera.org:8080/#/c/23763/3/be/src/scheduling/admission-control-service.cc@148
PS3, Line 148:
> No need to take ownership of the mutex before calling notify_all:  https://
I think there could be a small chance to bring some problems.
If we don't have the lock here, when shutdown is set to true and notify all in 
Join(), in the loop, it may be already awake while the notify all from Join() 
and just decided to go to sleep without seeing shutdown flipped to true, then 
the loop goes back to sleep, and the Join() could then wait some time depends 
on the implementation of the wait() in the loop thread. Since the Join() only 
be called once during shutdown, it seems not bad to be conservative, holding 
the lock costs us nothing and can prevent edge cases


http://gerrit.cloudera.org:8080/#/c/23763/3/be/src/scheduling/admission-control-service.cc@438
PS3, Line 438:
> Instead of passing "this" to a lambda, pass references to the two variables
Changed to &.


http://gerrit.cloudera.org:8080/#/c/23763/3/be/src/scheduling/admissiond-env.cc
File be/src/scheduling/admissiond-env.cc:

http://gerrit.cloudera.org:8080/#/c/23763/3/be/src/scheduling/admissiond-env.cc@139
PS3, Line 139:   admission_controller_->RegisterAdmissionMapCleanupCallback(
             :       [&](const UniqueIdPB& query_id) {
             :         
admission_control_svc_->CleanupAdmissionStateMapAsync(query_id);
             :       });
> Is the cleanup thread only registered in case of dedicated AC service? Why
This is a good question. The admission_state_map_ only exists in 
AdmissionControlService. In embedded mode, Coord works directly with the 
controller and does not use this map. So, embedded mode does not have this 
memory leak problem.
Normally, we rely on the client calling functions like GetQueryStatus() to 
clean up the map. But in this specific case (cancellation due to errors while 
queued), if the client stops sending requests, GetQueryStatus() is never 
called, and no other functions are dealing with the cleanup. This entry is 
never removed. This callback ensures that when the AdmissionController drops a 
query as the last step (sometimes AdmissionControlService doesn't clean the 
admission state immediately just because it may be still used by 
AdmissionController), the service can explicitly cleans up its  admission state


http://gerrit.cloudera.org:8080/#/c/23763/3/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/23763/3/tests/custom_cluster/test_admission_controller.py@2426
PS3, Line 2426: impalad_args=("--debug_actions=INBOUND_G
> This flag is not needed. self.enable_admission_service(method) in L2109 alr
Done



--
To view, visit http://gerrit.cloudera.org:8080/23763
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f16f3cf986e2038d6486d3ec687135d561e2cbf
Gerrit-Change-Number: 23763
Gerrit-PatchSet: 4
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Wed, 17 Dec 2025 09:45:19 +0000
Gerrit-HasComments: Yes

Reply via email to