[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336691#comment-14336691
 ] 

Varun Saxena commented on YARN-2902:
------------------------------------

[~jlowe], I looked into it and was able to reproduce the issue for PRIVATE 
resources.
I think we only need to handle PRIVATE resources. APPLICATION resources 
will be cleaned up when the application finishes, and PUBLIC resources should 
not remain orphaned because we do not kill or stop PublicLocalizer midway.

To download a resource, FSDownload appends _tmp to the name of the directory 
into which the resource will be downloaded.
While processing a heartbeat (HB) from the ContainerLocalizer, the NM includes 
in its response a destination path for the resource to be downloaded. 
We also download only one resource at a time.

So we can store this destination path in a queue in LocalizerRunner whenever 
we send a new path for download, and remove it when the fetch succeeds. 
When the container is killed (which causes LocalizerRunner to be cleaned up), 
we can take the path from the front of the queue and submit the associated 
temp path to DeletionService for deletion, if the ref count for the resource 
is 0.
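
A rough sketch of the bookkeeping described above, just to make the idea 
concrete. This is not the patch: the class name PendingDownloadTracker, its 
methods, and the ref-count lookup passed in as a function are all hypothetical 
stand-ins, and the "_tmp" suffix is assumed to be appended directly to the 
destination path as described above. The real code would live in 
LocalizerRunner and hand the returned path to DeletionService.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;
import java.util.function.ToIntFunction;

// Hypothetical helper; in the NM this state would be kept by LocalizerRunner.
class PendingDownloadTracker {
    // Destination paths sent to the localizer that have not yet been fetched.
    private final Deque<String> pending = new ArrayDeque<>();

    // Called when a new destination path is sent in a heartbeat response.
    synchronized void onPathSent(String destPath) {
        pending.addLast(destPath);
    }

    // Called when the localizer reports that the fetch succeeded.
    synchronized void onFetchSuccess(String destPath) {
        pending.remove(destPath);
    }

    // Called during cleanup after the container is killed. Returns the temp
    // path to submit for deletion, or empty if nothing is in flight or the
    // resource is still referenced. refCount is an assumed lookup function.
    synchronized Optional<String> tempPathToDelete(ToIntFunction<String> refCount) {
        String head = pending.pollFirst();
        if (head == null || refCount.applyAsInt(head) != 0) {
            return Optional.empty();
        }
        return Optional.of(head + "_tmp"); // assumed temp-directory naming
    }
}
```

Since only one resource is downloaded at a time, the queue should hold at 
most one in-flight path, so checking only the front entry is enough here.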

We cannot do this cleanup in ContainerLocalizer because LCE 
(LinuxContainerExecutor) launches it as a separate process and kills it when 
LocalizerRunner is interrupted.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-2902
>                 URL: https://issues.apache.org/jira/browse/YARN-2902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>             Fix For: 2.7.0
>
>         Attachments: YARN-2902.002.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



