Jason Lowe commented on YARN-4354:

I believe this was caused by YARN-2902.  A resource was just localized, but the 
resource is missing.  That normally doesn't occur.  However, after YARN-2902 a 
resource can be yanked out while it is still downloading if a container 
releases it and the refcount is zero.  So if a container requests a public 
resource but is killed before the localization completes, we can get a 
localized event for a missing resource and hit the NPE.

We should not remove a resource if its localization will still complete; 
otherwise we risk not only the NPE but also leaking the local files.
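
A minimal sketch of the race described above, assuming a simplified tracker 
that is not the actual NodeManager code.  The class ResourceTrackerSketch, its 
methods, and the removeWhileDownloading flag are all hypothetical names used 
only for illustration.  With the flag set, a release at refcount zero drops a 
resource that is still downloading, so the later localized event finds 
nothing.  With the flag cleared, the entry survives until the download 
reports back.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a local resource tracker.
// It does not mirror the real NodeManager classes or their APIs.
class ResourceTrackerSketch {
    enum State { DOWNLOADING, LOCALIZED }

    static final class Entry {
        State state = State.DOWNLOADING;
        int refCount = 0;
    }

    private final Map<String, Entry> resources = new HashMap<>();
    // true models the problematic behavior, false models the guard.
    private final boolean removeWhileDownloading;

    ResourceTrackerSketch(boolean removeWhileDownloading) {
        this.removeWhileDownloading = removeWhileDownloading;
    }

    // A container requests the resource; a new entry starts in DOWNLOADING.
    synchronized void request(String key) {
        resources.computeIfAbsent(key, k -> new Entry()).refCount++;
    }

    // A container releases the resource, for example because it was killed.
    synchronized void release(String key) {
        Entry e = resources.get(key);
        if (e == null) {
            return;
        }
        if (e.refCount > 0) {
            e.refCount--;
        }
        // Problematic path: the entry is dropped although the download
        // is still in flight and will report completion later.
        if (removeWhileDownloading
                && e.refCount == 0
                && e.state == State.DOWNLOADING) {
            resources.remove(key);
        }
    }

    // The download finished.  Returns false when the entry is already gone,
    // which is the point where unguarded code would dereference null.
    synchronized boolean localized(String key) {
        Entry e = resources.get(key);
        if (e == null) {
            return false;
        }
        e.state = State.LOCALIZED;
        return true;
    }
}
```

In this sketch, request followed by release followed by localized returns 
false when removal during download is allowed and true when it is not.  In the 
first case the downloaded files are also no longer tracked, which corresponds 
to the leak mentioned above.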

> Public resource localization fails with NPE
> -------------------------------------------
>                 Key: YARN-4354
>                 URL: https://issues.apache.org/jira/browse/YARN-4354
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.2
>            Reporter: Jason Lowe
>            Priority: Blocker
> I saw public localization on nodemanagers get stuck because it was constantly 
> rejecting requests to the thread pool executor.

This message was sent by Atlassian JIRA
