Jason Lowe commented on YARN-1801:

bq. Even if the assoc is null, should we close the threadpool?

Ideally no, we shouldn't shut down the localizer if this somehow does occur.  
However, we shouldn't simply ignore the problem either.  That's why I said this 
in my earlier comment:

Assuming this is still a potential issue, we should either find a way to 
prevent it from ever occurring or recover in a way that keeps the public 
localizer working as much as possible. It'd be great if we could just pull from 
the queue and receive a structure that has both the request event and the 
Future<Path> so we don't have to worry about a Future<Path> with no associated 
event. If we're going to try to recover instead, we'd have to log an error and 
try to clean up. With no associated request event, and no path if we got an 
execution error, it's going to be particularly difficult to recover properly.
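
To illustrate the "pull one structure from the queue" idea, here is a minimal sketch. It is not the actual ResourceLocalizationService code: the names PairedCompletionSketch, Outcome and paired are hypothetical, and a plain String stands in for the real request event. It assumes the download is wrapped so that a failure is captured next to its request instead of surfacing as an ExecutionException with nothing attached.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch only; none of these names exist in the NodeManager.
class PairedCompletionSketch {

  /** Always carries the request; exactly one of path or error is non-null. */
  static final class Outcome {
    final String request;   // stand-in for the real request event
    final Path path;        // set when the download succeeded
    final Exception error;  // set when the download failed

    Outcome(String request, Path path, Exception error) {
      this.request = request;
      this.path = path;
      this.error = error;
    }
  }

  /** Wraps a download so a failure is returned with its request, not thrown. */
  static Callable<Outcome> paired(final String request,
                                  final Callable<Path> download) {
    return () -> {
      try {
        return new Outcome(request, download.call(), null);
      } catch (Exception e) {
        return new Outcome(request, null, e);
      }
    };
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    CompletionService<Outcome> queue = new ExecutorCompletionService<>(pool);

    // One download that succeeds and one that fails.
    queue.submit(paired("good.jar", () -> Paths.get("good.jar")));
    queue.submit(paired("bad.jar", () -> {
      throw new IOException("simulated download failure");
    }));

    // Every completed entry identifies its request, so there is no
    // separate lookup that could come back null.
    for (int i = 0; i < 2; i++) {
      Outcome o = queue.take().get();
      if (o.error == null) {
        System.out.println(o.request + " localized to " + o.path);
      } else {
        System.out.println(o.request + " failed: " + o.error.getMessage());
      }
    }
    pool.shutdown();
  }
}
```

With something along these lines the consumer loop would not need a separate Future-to-request lookup at all, so the null association case could not arise there. Whether this fits the existing event plumbing is a separate question.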

> NPE in public localizer
> -----------------------
>                 Key: YARN-1801
>                 URL: https://issues.apache.org/jira/browse/YARN-1801
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.2.0
>            Reporter: Jason Lowe
>            Assignee: Hong Zhiguo
>            Priority: Critical
>         Attachments: YARN-1801.patch
> While investigating YARN-1800, I found this error in the NM logs that caused 
> the public localizer to shut down:
> {noformat}
> 2014-01-23 01:26:38,655 INFO  localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:addResource(651)) - Downloading public 
> rsrc:{ 
> hdfs://colo-2:8020/user/fertrist/oozie-oozi/0000601-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar,
>  1390440382009, FILE, null }
> 2014-01-23 01:26:38,656 FATAL localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:run(726)) - Error: Shutting down
> java.lang.NullPointerException
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.run(ResourceLocalizationService.java:712)
> 2014-01-23 01:26:38,656 INFO  localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:run(728)) - Public cache exiting
> {noformat}

