Varun Saxena commented on YARN-2902:

Thanks for the review [~jlowe].

bq. I think it would be better for the executor to let us know when a localizer 
has completed rather than assuming 1 second will be enough time (or too much 
time). We can tackle this in a followup JIRA since it's a more significant 
change, as I'm not sure executors are tracking localizers today.
We do not track localizers from executors today. The question is how we would 
track them. We could get the PID of the localizer process and check whether the 
localizer has died, but PIDs can be reused: if the localizer dies between two 
checks, its PID may be taken by some other process.
What we primarily want is for the localizer to die so that it doesn't download 
anything after we do the deletion.
One option would be to add a status in the heartbeat asking the localizer to 
clean up (stop its downloading threads). Once that is done, the localizer 
indicates to the NM in another heartbeat that the deletion can proceed. On that 
HB, the NM does the deletion, and the localizer DIEs on the HB response. 
Thoughts?
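
To make the proposed handshake concrete, here is a rough, standalone sketch of 
the two heartbeat rounds. It is only an illustration of the idea above, not 
NodeManager code: the class names, the CLEANUP action, and the CLEANUP_DONE 
status are all hypothetical and do not exist in YARN today.

```java
import java.util.ArrayList;
import java.util.List;

class HeartbeatHandshakeSketch {
    // Hypothetical action the NM returns in a heartbeat response.
    enum Action { LIVE, CLEANUP, DIE }

    // Hypothetical status the localizer reports in a heartbeat.
    enum Status { DOWNLOADING, CLEANUP_DONE }

    /** NM-side view of a single localizer (hypothetical). */
    static class NodeManagerSide {
        private final List<String> log;
        private boolean containerKilled;

        NodeManagerSide(List<String> log) { this.log = log; }

        void killContainer() { containerKilled = true; }

        Action onHeartbeat(Status status) {
            if (!containerKilled) {
                return Action.LIVE;
            }
            if (status == Status.CLEANUP_DONE) {
                // The localizer has stopped downloading, so nothing can be
                // written after this deletion.
                log.add("NM: delete resources");
                return Action.DIE;
            }
            log.add("NM: request cleanup");
            return Action.CLEANUP;
        }
    }

    /** Localizer-side reaction to heartbeat responses (hypothetical). */
    static class LocalizerSide {
        private final List<String> log;
        private Status status = Status.DOWNLOADING;
        private boolean alive = true;

        LocalizerSide(List<String> log) { this.log = log; }

        Status status() { return status; }
        boolean isAlive() { return alive; }

        void onResponse(Action action) {
            switch (action) {
                case CLEANUP:
                    log.add("Localizer: stop download threads");
                    status = Status.CLEANUP_DONE;
                    break;
                case DIE:
                    log.add("Localizer: exit");
                    alive = false;
                    break;
                default:
                    break; // LIVE: keep downloading
            }
        }
    }

    /** Runs one normal heartbeat, kills the container, then heartbeats until exit. */
    static List<String> simulate() {
        List<String> log = new ArrayList<>();
        NodeManagerSide nm = new NodeManagerSide(log);
        LocalizerSide localizer = new LocalizerSide(log);

        localizer.onResponse(nm.onHeartbeat(localizer.status()));
        nm.killContainer();
        while (localizer.isAlive()) {
            localizer.onResponse(nm.onHeartbeat(localizer.status()));
        }
        return log;
    }

    public static void main(String[] args) {
        simulate().forEach(System.out::println);
    }
}
```

The point of the extra round trip is ordering: the NM deletes only after the 
localizer has reported that its download threads are stopped, so no PID 
tracking or fixed sleep is needed.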

> Killing a container that is localizing can orphan resources in the 
> ------------------------------------------------------------------------------------
>                 Key: YARN-2902
>                 URL: https://issues.apache.org/jira/browse/YARN-2902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>         Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.patch
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.

This message was sent by Atlassian JIRA
