András Győri created YARN-11152:
-----------------------------------

             Summary: QueueMetrics is leaking memory when creating a new queue 
during reinitialisation
                 Key: YARN-11152
                 URL: https://issues.apache.org/jira/browse/YARN-11152
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacity scheduler
            Reporter: András Győri
            Assignee: András Győri


Capacity Scheduler handles reinitialisation by reparsing the entire queue 
hierarchy, then reinitialising the old queue hierarchy by taking the newly 
parsed queues into account. Afterwards, the newly parsed queues are discarded 
and should be garbage collected.
However, since the introduction of YARN-6492, we store a parent queue in 
QueueMetrics. This is problematic, because at that point the stored parent 
reference can still be a newly parsed parent queue (which should be discarded 
after the reinitialisation). As a result, QueueMetrics can contain parent 
members that belong to an entirely different queue hierarchy than the one 
currently in use. This can lead to subtle problems as well as a memory leak, 
because a single parent reference keeps the whole queue hierarchy alive.
This problem arose when we programmatically added one queue after another via 
the mutation API, thus keeping hundreds of queue hierarchies alive at the same 
time, crippling the GC and the whole RM.
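
The retention pattern described above can be illustrated with a minimal, 
self-contained sketch. It is not the actual Capacity Scheduler code: the 
classes Queue and Metrics, the methods parse, reinitialize, staleRoots and 
simulate, and the rebindMetrics switch are all hypothetical stand-ins, and 
rebinding the metrics to the live parent is only one conceivable remedy, 
assumed here for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class ReinitLeakSketch {

    /** Hypothetical stand-in for a scheduler queue (not the real CSQueue). */
    static final class Queue {
        final String name;
        Queue parent;
        final List<Queue> children = new ArrayList<>();
        Metrics metrics;

        Queue(String name, Queue parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }
    }

    /** Hypothetical stand-in for QueueMetrics: keeps a strong parent reference. */
    static final class Metrics {
        Queue parent;

        Metrics(Queue parent) {
            this.parent = parent;
        }
    }

    /** Parses a fresh hierarchy: one root with one leaf per name. */
    static Queue parse(List<String> leafNames) {
        Queue root = new Queue("root", null);
        for (String name : leafNames) {
            Queue leaf = new Queue(name, root);
            // The metrics capture the freshly parsed parent.
            leaf.metrics = new Metrics(root);
        }
        return root;
    }

    /** Adopts queues that exist only in the parsed tree into the live tree. */
    static void reinitialize(Queue liveRoot, Queue parsedRoot, boolean rebindMetrics) {
        for (Queue parsed : parsedRoot.children) {
            boolean known = false;
            for (Queue live : liveRoot.children) {
                if (live.name.equals(parsed.name)) {
                    known = true;
                    break;
                }
            }
            if (known) {
                continue;
            }
            // The queue itself is re-parented onto the live hierarchy ...
            parsed.parent = liveRoot;
            liveRoot.children.add(parsed);
            // ... but its metrics still reference parsedRoot unless rebound.
            if (rebindMetrics) {
                parsed.metrics.parent = liveRoot;
            }
        }
    }

    /** Counts distinct discarded roots still reachable through metrics. */
    static int staleRoots(Queue liveRoot) {
        Set<Queue> stale = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Queue q : liveRoot.children) {
            if (q.metrics.parent != liveRoot) {
                stale.add(q.metrics.parent);
            }
        }
        return stale.size();
    }

    /** Adds queues one by one, reparsing the full hierarchy each time. */
    static int simulate(int additions, boolean rebindMetrics) {
        List<String> names = new ArrayList<>();
        names.add("q0");
        Queue liveRoot = parse(names);
        for (int i = 1; i <= additions; i++) {
            names.add("q" + i);
            reinitialize(liveRoot, parse(names), rebindMetrics);
        }
        return staleRoots(liveRoot);
    }

    public static void main(String[] args) {
        System.out.println("without rebinding: " + simulate(3, false));
        System.out.println("with rebinding: " + simulate(3, true));
    }
}
```

In this sketch every added queue pins the root of the hierarchy it was parsed 
in, so three additions leave three discarded hierarchies reachable, while the 
rebinding variant leaves none.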



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
