Jason Lowe commented on YARN-2902:

bq. Container cleanup is called which deletes the container related directory 
which is used for localization.

Cleaning up a container-related directory has no effect on cleaning up a 
distributed cache resource.  Resources are not localized to a 
container-specific directory.  They are either localized to a public directory 
(for PUBLIC resources), a user-specific directory (for PRIVATE resources), or 
an application-specific directory (for APPLICATION resources).  Deleting the 
container directory will only delete symlinks to the resources -- it will never 
delete the actual resources on disk.  Also note that stopping the 
LocalizerRunner thread doesn't clean up the disk state of resources in progress, 
so there's some extra work needed there as well.
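
To illustrate the symlink point, here is a minimal standalone sketch. It is not NodeManager code. The directory names only loosely mimic a local-dir layout, and the user, application, and container names are made up for the example. It assumes a filesystem where the process is allowed to create symlinks.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class SymlinkCleanupSketch {
    // Recursively delete a directory tree. Files.walk does not follow
    // symlinks by default, and Files.delete on a symlink removes only the link.
    static void deleteTree(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // Returns true if the cached resource is still on disk after the
    // container directory (holding only a symlink to it) has been deleted.
    static boolean resourceSurvivesContainerCleanup() throws IOException {
        Path base = Files.createTempDirectory("nm-local-dir");
        try {
            // Hypothetical user-level cache directory (PRIVATE-style resource).
            Path cacheDir = Files.createDirectories(
                base.resolve("usercache/alice/filecache/10"));
            Path resource = Files.write(cacheDir.resolve("job.jar"),
                new byte[] {1, 2, 3});

            // Hypothetical container directory that only links to the resource.
            Path containerDir = Files.createDirectories(
                base.resolve("usercache/alice/appcache/app_1/container_1"));
            Files.createSymbolicLink(containerDir.resolve("job.jar"), resource);

            // "Container cleanup": remove the container directory.
            deleteTree(containerDir);

            return Files.exists(resource) && !Files.exists(containerDir);
        } finally {
            deleteTree(base);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("resource survived: "
            + resourceSurvivesContainerCleanup());
    }
}
```

The container directory is gone afterwards, but the file under the cache directory is untouched, which is why container cleanup alone cannot reclaim a partially localized resource.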

bq. Was it a PUBLIC or PRIVATE resource which was left in the DOWNLOADING state?

I've definitely seen PRIVATE resources in this state, although it's possible 
there could have been some PUBLIC ones as well.

> Killing a container that is localizing can orphan resources in the 
> ------------------------------------------------------------------------------------
>                 Key: YARN-2902
>                 URL: https://issues.apache.org/jira/browse/YARN-2902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>             Fix For: 2.7.0
>         Attachments: YARN-2902.002.patch, YARN-2902.patch
> If a container is in the process of localizing when it is stopped/killed, then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources, they linger with no reference 
> counts but aren't cleaned up during normal cache cleanup scans, since a scan 
> never deletes resources in the DOWNLOADING state even if their reference count 
> is zero.
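
The behavior described in the issue can be modeled with a toy sketch like the one below. The class, enum, and method names are hypothetical and are not taken from the NodeManager source. The only rule encoded is the one stated above: a cleanup scan considers a resource only if its reference count is zero and it is not in the DOWNLOADING state.

```java
import java.util.ArrayList;
import java.util.List;

class CacheCleanupSketch {
    // Simplified resource states; the real state machine may have more.
    enum State { DOWNLOADING, LOCALIZED }

    static final class Resource {
        final String name;
        final State state;
        final int refCount;

        Resource(String name, State state, int refCount) {
            this.name = name;
            this.state = state;
            this.refCount = refCount;
        }
    }

    // Names of the resources a scan following the described rule would
    // delete: zero references and not DOWNLOADING.
    static List<String> evictable(List<Resource> cache) {
        List<String> result = new ArrayList<>();
        for (Resource r : cache) {
            if (r.refCount == 0 && r.state != State.DOWNLOADING) {
                result.add(r.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Resource> cache = new ArrayList<>();
        // Left behind by a container killed while localizing.
        cache.add(new Resource("orphan.jar", State.DOWNLOADING, 0));
        // Fully localized and no longer referenced.
        cache.add(new Resource("idle.jar", State.LOCALIZED, 0));
        // Fully localized and still in use.
        cache.add(new Resource("busy.jar", State.LOCALIZED, 2));

        System.out.println(evictable(cache));
    }
}
```

In this model `orphan.jar` is never returned by the scan, no matter how many times the scan runs, unless something else moves it out of the DOWNLOADING state or removes it.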

This message was sent by Atlassian JIRA
