sandflee commented on YARN-3987:

Yes, the old AM containers in the NM aren't cleaned up. In our case, the AM crashed after 
it started. The RM will create a new appAttempt and launch a new AM, and will not 
expire, so the completed container stays in NM memory and the NM stateStore. We 
set max-am-attempt to a very large number, so the completed AM container in NM 
For a completed AM container, the RM could send the ack msg to the NM directly; there seems to be no need to wait 
for the new AM to pull the complete msg. What do you think, [~jianhe]?
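
To make the idea concrete, here is a minimal toy model of the two behaviours. It is only a sketch and does not use the real YARN classes: ToyNodeManager, ToyResourceManager, simulate and the ackAmContainersImmediately flag are all hypothetical names made up for illustration. It assumes every AM attempt crashes right after launch, so no AM ever pulls the completed-container msg.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AmContainerAckSketch {

    /** Hypothetical stand-in for the NM: keeps completed containers until acked. */
    static class ToyNodeManager {
        private final Set<String> completed = new HashSet<>();

        void containerCompleted(String containerId) {
            completed.add(containerId);
        }

        void ack(List<String> containerIds) {
            completed.removeAll(containerIds);
        }

        int retainedCount() {
            return completed.size();
        }
    }

    /** Hypothetical stand-in for the RM side of the NM heartbeat. */
    static class ToyResourceManager {
        private final boolean ackAmContainersImmediately;
        // Completions waiting for an AM to pull them (never drained in this toy).
        private final List<String> pendingForAm = new ArrayList<>();

        ToyResourceManager(boolean ackAmContainersImmediately) {
            this.ackAmContainersImmediately = ackAmContainersImmediately;
        }

        /** Returns the container ids the NM may forget right away. */
        List<String> onContainerCompleted(String containerId, boolean isAmContainer) {
            List<String> acks = new ArrayList<>();
            if (isAmContainer && ackAmContainersImmediately) {
                acks.add(containerId);          // proposed behaviour: ack at once
            } else {
                pendingForAm.add(containerId);  // current behaviour: wait for AM pull
            }
            return acks;
        }
    }

    /** Runs failedAttempts crashing AM attempts; returns containers left in the NM. */
    static int simulate(boolean ackImmediately, int failedAttempts) {
        ToyNodeManager nm = new ToyNodeManager();
        ToyResourceManager rm = new ToyResourceManager(ackImmediately);
        for (int attempt = 1; attempt <= failedAttempts; attempt++) {
            String id = "am_container_attempt_" + attempt;
            nm.containerCompleted(id);
            nm.ack(rm.onContainerCompleted(id, true));
        }
        return nm.retainedCount();
    }

    public static void main(String[] args) {
        System.out.println("wait for AM pull: " + simulate(false, 1000)); // prints 1000
        System.out.println("immediate ack:    " + simulate(true, 1000));  // prints 0
    }
}
```

In the toy model the retained count grows with the number of failed attempts when the NM waits for an AM pull, and stays at zero when the RM acks AM containers as soon as it receives the completion.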

> am container complete msg ack to NM once RM receive it
> ------------------------------------------------------
>                 Key: YARN-3987
>                 URL: https://issues.apache.org/jira/browse/YARN-3987
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: sandflee
>            Assignee: sandflee
>         Attachments: YARN-3987.001.patch, YARN-3987.002.patch
>
> In our cluster we set max-am-attempts to a very large number, and 
> unfortunately our AM crashes after launch, leaving too many completed 
> containers (AM containers) in the NM. A completed container is removed from the NM and 
> NMStateStore only once the container completion is passed to the AM, but if the AM cannot 
> be launched, the completed AM container is never cleaned up and may eat up 
> NM heap memory.

This message was sent by Atlassian JIRA
