[ https://issues.apache.org/jira/browse/YARN-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004480#comment-15004480 ]

Jason Lowe commented on YARN-4355:
----------------------------------

Stacktrace:
{noformat}
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1089)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1054)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:681)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:330)
        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server.call(Server.java:2297)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:654)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:621)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1680)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2247)
{noformat}

The nodemanager was in the process of tearing down, so applications were being 
cleaned up.  It looks like a localizer heartbeat can arrive during that cleanup, 
so we can lose the localizer tracker just as the heartbeat tries to use it.
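
To illustrate the suspected race, here is a minimal sketch of the pattern, not 
the actual ResourceLocalizationService code.  Every name in it 
(HeartbeatRaceSketch, AppTracker, registerApp, cleanupApp, heartbeat, and the 
Action enum) is hypothetical.  It assumes the trackers live in a concurrent map 
keyed by application id, and shows one possible guard: read the tracker once 
into a local variable and tell the localizer to stop if the entry is already 
gone, rather than dereferencing it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the race. None of these names are the
// real ResourceLocalizationService API.
public class HeartbeatRaceSketch {
    public enum Action { LIVE, DIE }

    // Stand-in for the per-application local resource tracker.
    static final class AppTracker {
        String pathFor(String resource) {
            return "/local/appcache/" + resource;
        }
    }

    private final Map<String, AppTracker> appTrackers = new ConcurrentHashMap<>();

    public void registerApp(String appId) {
        appTrackers.put(appId, new AppTracker());
    }

    // Models application cleanup, e.g. while the nodemanager is tearing down.
    public void cleanupApp(String appId) {
        appTrackers.remove(appId);
    }

    // Heartbeat handler: look the tracker up once and check the result,
    // instead of assuming the entry still exists.
    public Action heartbeat(String appId, String resource) {
        AppTracker tracker = appTrackers.get(appId);
        if (tracker == null) {
            // The app was cleaned up concurrently; ask the localizer to stop.
            return Action.DIE;
        }
        tracker.pathFor(resource);
        return Action.LIVE;
    }

    public static void main(String[] args) {
        HeartbeatRaceSketch svc = new HeartbeatRaceSketch();
        svc.registerApp("app_1");
        System.out.println(svc.heartbeat("app_1", "job.jar")); // LIVE
        svc.cleanupApp("app_1");
        System.out.println(svc.heartbeat("app_1", "job.jar")); // DIE
    }
}
```

Without the null check, the second heartbeat would dereference a missing 
tracker, which is the kind of failure the stacktrace above suggests.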

> NPE while processing localizer heartbeat
> ----------------------------------------
>
>                 Key: YARN-4355
>                 URL: https://issues.apache.org/jira/browse/YARN-4355
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.2
>            Reporter: Jason Lowe
>
> While analyzing YARN-4354 I noticed a nodemanager was getting NPEs while 
> processing a private localizer heartbeat.  I think there's a race where we 
> can clean up resources for an application, and therefore remove the app local 
> resource tracker, just as we are trying to handle the localizer heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
