[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951016#comment-14951016
 ] 

Varun Saxena commented on YARN-2902:
------------------------------------

Just to let you know, there is one case where this won't work once the flag 
is removed from the protocol:

1. NM recovery is disabled.
2. Container is killed. Associated resources are stuck in downloading state and 
a deletion task is launched for them.
3. In the meantime the localizer finishes downloading a resource and reports 
it to the NM on its next heartbeat (HB). In the NM this resource is still in 
the downloading state.
4. NM tells the localizer to DIE. The localizer won't delete the resource it 
just downloaded.
5. NM crashes.
6. NM would also miss deleting the downloading resource, because recovery is 
disabled.

I agree this should be a very rare scenario, though, and we can skip it.
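
To make the sequence above easier to follow, here is a toy model of the six 
steps. It is only a sketch and not NodeManager code: the class name, the 
method, the sets and the path are all hypothetical. The recovery-enabled 
branch assumes that a persisted entry would let the NM clean up after a 
restart.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the six steps; none of these names are real NodeManager classes.
public class OrphanedDownloadSketch {

    /** Returns the paths still on disk after the simulated NM restart. */
    public static Set<String> simulate(boolean recoveryEnabled) {
        Set<String> disk = new HashSet<>();          // files on the local disk
        Set<String> nmDownloading = new HashSet<>(); // NM in-memory DOWNLOADING entries
        Set<String> stateStore = new HashSet<>();    // persisted only with recovery on
        String path = "/local/filecache/10/job.jar"; // hypothetical path

        // The NM tracks the resource as DOWNLOADING.
        nmDownloading.add(path);
        if (recoveryEnabled) {
            stateStore.add(path); // assumption: recovery persists the entry
        }

        // Step 2: the container is killed and the deletion task runs
        // before the download has landed on disk.
        disk.remove(path);

        // Step 3: the localizer finishes the download and reports it on
        // the next heartbeat; the NM still has the resource as DOWNLOADING.
        disk.add(path);

        // Step 4: the NM answers DIE. Without the protocol flag the
        // localizer deletes nothing, so the file stays on disk.

        // Step 5: the NM crashes and its in-memory state is lost.
        nmDownloading.clear();

        // Step 6: after restart only persisted entries can be cleaned up.
        for (String persisted : stateStore) {
            disk.remove(persisted);
        }
        return disk;
    }

    public static void main(String[] args) {
        System.out.println("recovery disabled: " + simulate(false));
        System.out.println("recovery enabled:  " + simulate(true));
    }
}
```

In this model the file is left behind only when recovery is disabled, which 
matches the scenario described above.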

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-2902
>                 URL: https://issues.apache.org/jira/browse/YARN-2902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>         Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, 
> YARN-2902.07.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
