[ 
https://issues.apache.org/jira/browse/YARN-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345232#comment-16345232
 ] 

Jason Lowe commented on YARN-7843:
----------------------------------

Is this against 3.1.0?  I am guessing this is the relevant code that is 
crashing, but it would be good to verify:
{code:java}
    LocalizedResource rsrc = localrsrc.get(req);
    rsrc.setLocalPath(localPath);
{code}

If that is indeed the case then it looks like a resource was removed just as a 
path was being computed for localization.  I think I see some races where this 
could occur during cache cleanup or maybe even a case where a resource was 
thought to be localized and disappeared, but I don't see how this could happen 
for every container as implied in the description.
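
To illustrate the suspected failure mode, here is a minimal sketch using hypothetical stand-in names (LocalizationRaceSketch, Resource, track, evict, and the assignPath methods). It is not the actual LocalResourcesTrackerImpl code, and it assumes the tracker's map simply returns null once an entry has been evicted. It contrasts the unguarded get-then-dereference pattern with a null-checked variant:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a resource tracker; not the Hadoop classes. */
class LocalizationRaceSketch {

    /** Minimal stand-in for a tracked resource (hypothetical). */
    static final class Resource {
        private volatile String localPath;

        void setLocalPath(String path) {
            this.localPath = path;
        }

        String getLocalPath() {
            return localPath;
        }
    }

    // Maps a request key to its tracked resource.
    private final Map<String, Resource> tracked = new ConcurrentHashMap<>();

    /** Starts tracking a resource for the given request key. */
    public void track(String request) {
        tracked.put(request, new Resource());
    }

    /** Simulates cache cleanup removing the entry. */
    public void evict(String request) {
        tracked.remove(request);
    }

    /** Unguarded lookup: throws NullPointerException if the entry is gone. */
    public String assignPathUnguarded(String request, String path) {
        Resource rsrc = tracked.get(request);
        rsrc.setLocalPath(path); // NPE here when rsrc == null
        return rsrc.getLocalPath();
    }

    /** Guarded lookup: returns null instead of throwing when the entry is gone. */
    public String assignPathGuarded(String request, String path) {
        Resource rsrc = tracked.get(request);
        if (rsrc == null) {
            // Entry was removed between scheduling and path assignment;
            // the caller decides whether to re-request localization.
            return null;
        }
        rsrc.setLocalPath(path);
        return rsrc.getLocalPath();
    }
}
```

Calling evict("a") and then assignPathUnguarded("a", ...) reproduces the NullPointerException, while assignPathGuarded reports the missing entry by returning null. Whether a real fix should skip, fail, or relocalize the resource is a separate design question this sketch does not answer.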

[~rohithsharma] have you checked the NM logs?  I'm curious if there are warning 
logs about the resource missing and being relocalized or other indications that 
the resource was removed from the cache just as another container was trying to 
use it.


> Container Localizer is failing with NPE
> ---------------------------------------
>
>                 Key: YARN-7843
>                 URL: https://issues.apache.org/jira/browse/YARN-7843
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Rohith Sharma K S
>            Priority: Blocker
>
> Container localizers are failing with an NPE; as a result, no containers are 
> being launched.
> {noformat}
> Caused by: java.lang.NullPointerException: java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1189)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1153)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:753)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:371)
>         at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
> {noformat}


