Jeongin Ju created YARN-10895:
---------------------------------

             Summary: ContainerIdPBImpl objects still can be leaked in 
RMNodeImpl.completedContainers
                 Key: YARN-10895
                 URL: https://issues.apache.org/jira/browse/YARN-10895
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 3.1.2
            Reporter: Jeongin Ju


YARN-10467 fixed ContainerIdPBImpl Object Leakage in 
RMNodeImpl.completedContainers.

After applying the YARN-10467 patch and running a cluster with a large number 
of nodes, we found that a similar heap leak still exists.

In a heap dump taken after failover (so from an RM that is no longer active), 
about 4.5G is used by ContainerIdPBImpl objects held in 
RMNodeImpl.completedContainers.

 

There are two cases.

 

1. Apps with 'KeepContainersAcrossApplicationAttempts' 

Even when 'KeepContainersAcrossApplicationAttempts' is set, we should still 
clear RMAppAttemptImpl.justFinishedContainers.

If an app attempt fails and is retried by a next attempt, we may not need to 
clear RMAppAttemptImpl.justFinishedContainers, because the related 
ContainerIdPBImpl objects are handed over to the next attempt and eventually 
cleared.

However, when the app itself fails, there is no next attempt and the heap leak 
occurs.

(We found this case when a YARN Service application failed across multiple 
attempts because of OOM in the AM.)
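
The scenario in case 1 can be illustrated with a small toy model. This is only 
a sketch and does not use the real Hadoop classes: Node, Attempt, retry(), 
failFinally() and the releaseOnFinalFailure switch are hypothetical stand-ins 
for RMNodeImpl, RMAppAttemptImpl and the cleanup proposed here, and container 
ids are plain strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only; none of these classes are the real Hadoop ones.
class KeepContainersLeakSketch {

    /** Stand-in for RMNodeImpl: remembers completed containers until released. */
    static final class Node {
        final Set<String> completedContainers = new HashSet<>();

        void containerCompleted(String containerId) {
            completedContainers.add(containerId);
        }

        // Stand-in for the notification that lets the node forget containers.
        void release(List<String> containerIds) {
            completedContainers.removeAll(containerIds);
        }
    }

    /** Stand-in for RMAppAttemptImpl. */
    static final class Attempt {
        final Node node;
        final boolean keepContainers;
        final List<String> justFinishedContainers = new ArrayList<>();

        Attempt(Node node, boolean keepContainers) {
            this.node = node;
            this.keepContainers = keepContainers;
        }

        void containerFinished(String containerId) {
            node.containerCompleted(containerId);
            justFinishedContainers.add(containerId);
        }

        // Retry path: pending entries are handed over to the next attempt.
        Attempt retry() {
            Attempt next = new Attempt(node, keepContainers);
            next.justFinishedContainers.addAll(justFinishedContainers);
            justFinishedContainers.clear();
            return next;
        }

        // Final failure: there is no next attempt to hand the entries to.
        // releaseOnFinalFailure is a hypothetical switch for the proposed cleanup.
        void failFinally(boolean releaseOnFinalFailure) {
            if (!keepContainers || releaseOnFinalFailure) {
                node.release(new ArrayList<>(justFinishedContainers));
                justFinishedContainers.clear();
            }
        }
    }

    public static void main(String[] args) {
        Node node = new Node();
        Attempt first = new Attempt(node, true);
        first.containerFinished("container_1");
        Attempt last = first.retry();
        last.containerFinished("container_2");

        last.failFinally(false);
        System.out.println("tracked without cleanup: " + node.completedContainers.size());

        last.failFinally(true);
        System.out.println("tracked with cleanup: " + node.completedContainers.size());
    }
}
```

In this model the node keeps both entries after the last attempt fails without 
cleanup, and drops them once the attempt releases its pending list.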

 

2. Apps killed explicitly by the user

When an app is killed by the user through the 'yarn application -kill' CLI or 
the Web UI, RMAppAttemptImpl.amContainerFinished is not called because the app 
and app attempt states have already changed.

 

To handle this, we added sendFinishedContainersToNMs calls for both 
RMAppAttemptImpl.finishedContainersSentToAm and 
RMAppAttemptImpl.justFinishedContainers when the attempt is set to 'KILLED'.
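
The idea of the cleanup on 'KILLED' can be sketched roughly as below. This is 
a toy model and not the attached patch: Nodes, Attempt, onKilled() and drain() 
are hypothetical names, and the two maps only mirror the shape of the pending 
collections mentioned above (node id to list of container ids, both as plain 
strings).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only; none of these classes are the real Hadoop ones.
class KilledAttemptCleanupSketch {

    /** Stand-in for the per-node RMNodeImpl.completedContainers collections. */
    static final class Nodes {
        private final Map<String, Set<String>> completedByNode = new HashMap<>();

        void containerCompleted(String nodeId, String containerId) {
            completedByNode.computeIfAbsent(nodeId, k -> new HashSet<>()).add(containerId);
        }

        // Stand-in for the notification that lets a node drop finished containers.
        void release(String nodeId, List<String> containerIds) {
            Set<String> tracked = completedByNode.get(nodeId);
            if (tracked != null) {
                tracked.removeAll(containerIds);
            }
        }

        int trackedCount() {
            int total = 0;
            for (Set<String> tracked : completedByNode.values()) {
                total += tracked.size();
            }
            return total;
        }
    }

    /** Stand-in for RMAppAttemptImpl with its two pending collections. */
    static final class Attempt {
        final Map<String, List<String>> justFinishedContainers = new HashMap<>();
        final Map<String, List<String>> finishedContainersSentToAm = new HashMap<>();

        static void add(Map<String, List<String>> pending, String nodeId, String containerId) {
            pending.computeIfAbsent(nodeId, k -> new ArrayList<>()).add(containerId);
        }

        // Hypothetical hook run when the attempt reaches KILLED:
        // release everything still pending in either map, then clear both.
        void onKilled(Nodes nodes) {
            drain(finishedContainersSentToAm, nodes);
            drain(justFinishedContainers, nodes);
        }

        private static void drain(Map<String, List<String>> pending, Nodes nodes) {
            pending.forEach(nodes::release);
            pending.clear();
        }
    }

    public static void main(String[] args) {
        Nodes nodes = new Nodes();
        Attempt attempt = new Attempt();

        nodes.containerCompleted("node1", "container_1");
        Attempt.add(attempt.justFinishedContainers, "node1", "container_1");
        nodes.containerCompleted("node2", "container_2");
        Attempt.add(attempt.finishedContainersSentToAm, "node2", "container_2");

        System.out.println("tracked before kill cleanup: " + nodes.trackedCount());
        attempt.onKilled(nodes);
        System.out.println("tracked after kill cleanup: " + nodes.trackedCount());
    }
}
```

Draining both maps matters in this model because a container can be pending in 
either one when the kill arrives; skipping one of them would leave its entries 
tracked on the node side.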

 

We found and patched this on our 3.1.2 cluster, but trunk appears to have the 
same problem.

I have attached a patch based on trunk.

 

Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
