[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742662#comment-14742662
 ] 

Varun Saxena commented on YARN-2902:
------------------------------------

[~jlowe], kindly review.

The patch, at a very high level, does the following:
# On container kill, the NM (localization service) creates a deletion task for 
all the downloading resources and schedules it to run after a configured delay 
(a new config has been added for it). I decided not to wait for an HB from the 
localizer first, because we should not depend on the localizer if there is some 
problem there and it does not send an HB.
# On the subsequent HB from the localizer, the NM indicates to the localizer 
that it can delete the downloading resources itself after cancelling its 
download tasks. A boolean flag has been added to the proto for this.
# After the localizer deletes the resources, it sends a last HB to the NM. A 
boolean flag has been added to the proto to indicate this to the NM. On 
receiving this HB, the NM cancels the deletion task so that deletion is not 
attempted by the NM as well. It is not a problem even if the NM attempts 
deletion, because if nothing is left to delete the deletion task won't do 
anything; but if the task can be cancelled, there is no reason to let it run.
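
The three steps above can be sketched roughly as follows. This is only an 
illustration of the scheduling and cancellation logic, not code from the patch: 
the class name DelayedCleanupSketch, its methods, and the string localizer id 
are all hypothetical, and the two booleans stand in for the new proto flags.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the NM side of the protocol; not the patch's classes. */
final class DelayedCleanupSketch {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    // Fallback deletion tasks, keyed by localizer id.
    private final Map<String, ScheduledFuture<?>> pending = new ConcurrentHashMap<>();
    private final long delayMs; // stands in for the new configured delay

    DelayedCleanupSketch(long delayMs) {
        this.delayMs = delayMs;
    }

    /** Step 1: on container kill, schedule deletion without waiting for an HB. */
    void onContainerKilled(String localizerId, Runnable deleteDownloading) {
        pending.put(localizerId,
                scheduler.schedule(deleteDownloading, delayMs, TimeUnit.MILLISECONDS));
    }

    /**
     * Steps 2 and 3: handle an HB from the localizer.
     *
     * @param localizerCleanupDone flag set by the localizer on its last HB
     * @return flag for the HB response: true if the localizer should cancel its
     *         downloads and delete the downloading resources itself
     */
    boolean onHeartbeat(String localizerId, boolean localizerCleanupDone) {
        if (localizerCleanupDone) {
            // Step 3: the localizer already cleaned up, so drop the fallback task.
            ScheduledFuture<?> task = pending.remove(localizerId);
            if (task != null) {
                task.cancel(false);
            }
            return false;
        }
        // Step 2: a kill was seen for this localizer, so ask it to clean up.
        // This stays true even after the fallback has run; that is harmless
        // here because deleting already-deleted resources does nothing.
        return pending.containsKey(localizerId);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

If the localizer never sends another HB, the scheduled task simply runs after 
the delay, which is the reason for not depending on the localizer in step 1.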

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-2902
>                 URL: https://issues.apache.org/jira/browse/YARN-2902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>         Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed, then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources, they linger with a reference count of 
> zero but aren't cleaned up during normal cache cleanup scans, since the scan 
> never deletes resources in the DOWNLOADING state even if their reference 
> count is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
