[ https://issues.apache.org/jira/browse/YARN-11152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated YARN-11152:
----------------------------------
Labels: pull-request-available (was: )
> QueueMetrics is leaking memory when creating a new queue during
> reinitialisation
> --------------------------------------------------------------------------------
>
> Key: YARN-11152
> URL: https://issues.apache.org/jira/browse/YARN-11152
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Reporter: András Győri
> Assignee: András Győri
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Capacity Scheduler handles reinitialisation by reparsing the entire queue
> hierarchy, then reinitialising the old queue hierarchy by taking the newly
> parsed queues into account. After this, the newly parsed queues are discarded
> and they are GCed.
> However, since the introduction of YARN-6492, QueueMetrics stores a reference
> to a parent queue. This is problematic because, at that point, the reference
> can still point to a newly parsed parent queue (which should be discarded
> after the reinitialisation). As a result, QueueMetrics can contain parent
> members of an entirely different queue hierarchy than the one currently in
> use. This can lead to subtle problems as well as a memory leak, because a
> single parent reference keeps the whole queue hierarchy alive.
> This problem arose when we programmatically added queues one after another
> via the mutation API, which kept hundreds of queue hierarchies alive at the
> same time, crippling the GC and the whole RM.
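
The retention pattern described in the report can be illustrated with a small, self-contained sketch. It is not the Capacity Scheduler code: the classes Node and Metrics and the methods parse, reinitialize, reachableRoots and simulate are hypothetical stand-ins, and the sketch assumes that a metrics object captures its parent reference at construction time and is not updated when the queue is adopted into the live hierarchy.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class QueueMetricsLeakSketch {

    /** Hypothetical stand-in for QueueMetrics: keeps a strong reference to a parent queue. */
    static final class Metrics {
        final Node parent;
        Metrics(Node parent) { this.parent = parent; }
    }

    /** Hypothetical stand-in for a scheduler queue. */
    static final class Node {
        final String name;
        Node parent;                                  // tree link, fixed up on adoption
        final List<Node> children = new ArrayList<>();
        final Metrics metrics;                        // captures the parent seen at parse time

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
            this.metrics = new Metrics(parent);
            if (parent != null) {
                parent.children.add(this);
            }
        }
    }

    /** Parses a fresh two-level hierarchy: a root plus the given leaf queues. */
    static Node parse(List<String> leaves) {
        Node root = new Node("root", null);
        for (String leaf : leaves) {
            new Node(leaf, root);
        }
        return root;
    }

    /** Adopts queues that exist only in the newly parsed tree into the live tree. */
    static void reinitialize(Node liveRoot, Node parsedRoot) {
        for (Node parsedChild : parsedRoot.children) {
            boolean exists = liveRoot.children.stream()
                    .anyMatch(c -> c.name.equals(parsedChild.name));
            if (!exists) {
                parsedChild.parent = liveRoot;        // the tree link now points at the live root
                liveRoot.children.add(parsedChild);   // but parsedChild.metrics.parent is still parsedRoot
            }
        }
    }

    /** Counts distinct root objects reachable from start via tree and metrics references. */
    static int reachableRoots(Node start) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> todo = new ArrayDeque<>();
        todo.push(start);
        int roots = 0;
        while (!todo.isEmpty()) {
            Node n = todo.pop();
            if (!seen.add(n)) {
                continue;                             // already visited this exact object
            }
            if (n.parent == null) {
                roots++;                              // a node without a parent is a hierarchy root
            }
            todo.addAll(n.children);
            if (n.parent != null) {
                todo.push(n.parent);
            }
            if (n.metrics.parent != null) {
                todo.push(n.metrics.parent);
            }
        }
        return roots;
    }

    /** Adds queues one at a time, reparsing the full hierarchy on every addition. */
    static int simulate(int additions) {
        List<String> names = new ArrayList<>();
        names.add("a");
        Node liveRoot = parse(names);
        for (int i = 1; i <= additions; i++) {
            names.add("q" + i);
            reinitialize(liveRoot, parse(names));     // the parsed tree should become garbage here
        }
        return reachableRoots(liveRoot);
    }

    public static void main(String[] args) {
        System.out.println("roots reachable after 3 additions: " + simulate(3));
    }
}
```

Under these assumptions, every adopted queue pins the parsed hierarchy it came from, so the number of reachable root objects grows by one per addition instead of staying at one. This is the growth pattern the report attributes to repeated mutation API calls.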