Jason Lowe commented on YARN-2902:

Forgot to respond to this comment:

bq. That is if NM recovery is not enabled and the deletion task is scheduled. 
But the deletion task is put in the deletion service's executor queue because 
all 4 threads in the deletion service's executor (NM delete threads) are 
occupied. If NM goes down before this task is taken up, the downloading 
resources won't be deleted.

If NM recovery is not enabled, then failing to delete when the NM crashes is 
already a known issue.  As for the normal termination scenario, we should be 
stopping the ResourceLocalizationService (via the ContainerManager shutdown) 
before trying to stop the DeletionService, so I would expect the deletions to 
be queued up before we stop that service.
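
To illustrate the difference between the two cases, here is a rough sketch. It 
is not the NodeManager code: it uses a plain java.util.concurrent fixed pool 
as a stand-in for the deletion executor, and the class name, the 
DELETE_THREADS constant and the fake deletion tasks are all made up for the 
example. It assumes an orderly stop behaves like ExecutorService.shutdown() 
and an abrupt one like shutdownNow().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DeletionQueueSketch {

    // Hypothetical stand-in for the 4 NM delete threads mentioned above.
    static final int DELETE_THREADS = 4;

    /**
     * Submits `tasks` fake deletions while every worker is held busy, then
     * stops the pool. Returns {completed, droppedFromQueue}.
     */
    static int[] run(int tasks, boolean orderlyStop) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(DELETE_THREADS);
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch gate = new CountDownLatch(1);

        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    gate.await();                 // keep the worker occupied
                    completed.incrementAndGet();  // the "deletion" itself
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        int dropped = 0;
        if (orderlyStop) {
            pool.shutdown();                      // already-queued tasks still run
        } else {
            dropped = pool.shutdownNow().size();  // queued tasks are handed back unrun
        }
        gate.countDown();                         // release the busy workers
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new int[] {completed.get(), dropped};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] orderly = run(6, true);
        int[] abrupt = run(6, false);
        System.out.println("orderly: completed=" + orderly[0] + " dropped=" + orderly[1]);
        System.out.println("abrupt:  completed=" + abrupt[0] + " dropped=" + abrupt[1]);
    }
}
```

With six tasks and four busy threads, two tasks sit in the queue. In the 
orderly case they are still executed once the workers free up; in the abrupt 
case they are returned from the queue without ever running, which is the 
analogue of the crash scenario when NM recovery is not enabled.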

> Killing a container that is localizing can orphan resources in the 
> ------------------------------------------------------------------------------------
>                 Key: YARN-2902
>                 URL: https://issues.apache.org/jira/browse/YARN-2902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>         Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, 
> YARN-2902.07.patch, YARN-2902.08.patch, YARN-2902.patch
> If a container is in the process of localizing when it is stopped/killed, 
> then resources are left in the DOWNLOADING state.  If no other container 
> comes along and requests these resources, they linger around with no 
> reference counts but aren't cleaned up during normal cache cleanup scans, 
> since the scan will never delete resources in the DOWNLOADING state even if 
> their reference count is zero.

