Jason Lowe commented on YARN-2902:

We still need to do this for APPLICATION resources.  It is true that those 
resources will be cleaned up when the application finishes, but that could be 
hours or days later.  As for PUBLIC resources, Sangjin confirmed earlier that 
he has seen them orphaned as well, so it must be happening somehow even for 
those.  [~sjlee0], do you have any ideas on how 
PUBLIC resources ended up hung in a DOWNLOADING state?  I'm wondering if this 
is specific to the shared cache setup or if there's a code path we're missing.

I don't think we should special-case the resource types to fix this.  Again, I 
think the cleanest approach is to make sure we send an event to the 
LocalizedResource when a container localizer (or maybe just the container 
itself) is killed, and let that state machine handle it appropriately (e.g., 
try to remove the _tmp file if the resource was in the DOWNLOADING state, 
ignore it if it's already localized, etc.).
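
As a rough sketch of that idea, the handling could look something like the following. This is not the actual NodeManager code: SketchResource, its State enum, the ABANDONED state, and onContainerKilled are all hypothetical names made up for illustration, and the rule "only clean up when no other container still references the resource" is an assumption on top of what is described above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for LocalizedResource; NOT the real NodeManager class.
class SketchResource {
    // ABANDONED is an invented state meaning "safe for the cache cleanup to delete".
    enum State { DOWNLOADING, LOCALIZED, ABANDONED }

    private State state = State.DOWNLOADING;
    private final Path tmpPath;                        // assumed location of the partial _tmp download
    private final Set<String> refs = new HashSet<>();  // IDs of containers referencing this resource

    SketchResource(Path tmpPath) {
        this.tmpPath = tmpPath;
    }

    void addRef(String containerId) {
        refs.add(containerId);
    }

    void markLocalized() {
        state = State.LOCALIZED;
    }

    State getState() {
        return state;
    }

    int getRefCount() {
        return refs.size();
    }

    // Hypothetical event delivered when a container (or its localizer) is killed.
    void onContainerKilled(String containerId) throws IOException {
        refs.remove(containerId);
        switch (state) {
            case DOWNLOADING:
                // Assumption: only clean up once nobody else is waiting on this resource.
                if (refs.isEmpty()) {
                    Files.deleteIfExists(tmpPath);  // try to remove the _tmp file
                    state = State.ABANDONED;        // no longer stuck in DOWNLOADING
                }
                break;
            case LOCALIZED:
            case ABANDONED:
                // Already localized or already cleaned up: nothing more to do.
                break;
        }
    }
}
```

The point of routing this through the resource's own state handling is that the same event works for any resource type, so there is no need to branch on PUBLIC, PRIVATE, or APPLICATION.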

> Killing a container that is localizing can orphan resources in the 
> ------------------------------------------------------------------------------------
>                 Key: YARN-2902
>                 URL: https://issues.apache.org/jira/browse/YARN-2902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>             Fix For: 2.7.0
>         Attachments: YARN-2902.002.patch, YARN-2902.patch
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.

This message was sent by Atlassian JIRA
