[ https://issues.apache.org/jira/browse/YARN-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046804#comment-15046804 ]
Brahma Reddy Battula commented on YARN-4427:
--------------------------------------------

[~rohithsharma] thanks for taking a look at this issue. Yes, {{masterContainer}} is null. I also thought {{rmAppAttempt}} could be null, but it is not in this cluster.

{code}
RMAppAttempt rmAppAttempt = rmApp.getRMAppAttempt(appAttemptId);
Container masterContainer = rmAppAttempt.getMasterContainer();
if ((masterContainer.getId().equals(containerStatus.getContainerId()))
    && (containerStatus.getContainerState() == ContainerState.COMPLETE))
{code}

*Cause:* As I mentioned in the description, the ZK cluster was going up and down, which caused frequent leader elections. I think the RM wrote the znode through ZK1 and, while recovering, read from ZK2, where the data was not yet synced (so the master container details were missing). Please correct me if I am wrong.

> NPE on handleNMContainerStatus when NM is registering to RM
> -----------------------------------------------------------
>
>                 Key: YARN-4427
>                 URL: https://issues.apache.org/jira/browse/YARN-4427
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Critical
>
> *Seen the following in one of our environments, when the AM was allocated a container but the update to ZK failed while the cluster was having network problems for some time (up and down).*
> {noformat}
> 2015-12-07 16:39:38,489 | WARN | IPC Server handler 49 on 26003 | IPC Server handler 49 on 26003, call org.apache.hadoop.yarn.server.api.ResourceTrackerPB.registerNodeManager from 9.91.8.220:52169 Call#17 Retry#0 | Server.java:2107
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.handleNMContainerStatus(ResourceTrackerService.java:286)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.registerNodeManager(ResourceTrackerService.java:395)
>         at org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceTrackerPBServiceImpl.registerNodeManager(ResourceTrackerPBServiceImpl.java:54)
>         at org.apache.hadoop.yarn.proto.ResourceTracker$ResourceTrackerService$2.callBlockingMethod(ResourceTracker.java:79)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:973)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2088)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2084)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2082)
> {noformat}
> Corresponding code; it might not match {{branch-2.7/Trunk}} since we have modified it internally.
> {code}
> 284 RMAppAttempt rmAppAttempt = rmApp.getRMAppAttempt(appAttemptId);
> 285 Container masterContainer = rmAppAttempt.getMasterContainer();
> 286 if (masterContainer.getId().equals(containerStatus.getContainerId())
> 287     && containerStatus.getContainerState() == ContainerState.COMPLETE) {
> 288   ContainerStatus status =
> 289       ContainerStatus.newInstance(containerStatus.getContainerId(),
> 290         containerStatus.getContainerState(), containerStatus.getDiagnostics(),
> 291         containerStatus.getContainerExitStatus());
> 292   // sending master container finished event.
> 293   RMAppAttemptContainerFinishedEvent evt =
> 294       new RMAppAttemptContainerFinishedEvent(appAttemptId, status,
> 295           nodeId);
> 296   rmContext.getDispatcher().getEventHandler().handle(evt);
> 297 }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
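
For illustration only, here is a minimal sketch of how the check at line 286 could tolerate a missing master container. It is not the committed fix for this issue. The classes below ({{MasterContainerGuard}}, {{AppAttempt}}, {{Container}}, {{ContainerStatus}}) are simplified, hypothetical stand-ins for the YARN types, and the method name {{isCompletedMasterContainer}} is an assumption made for the example.

```java
// Hypothetical stand-ins for the YARN types in the snippet above; these are
// NOT the real Hadoop classes, only enough structure to show the null guard.
final class MasterContainerGuard {

    enum ContainerState { RUNNING, COMPLETE }

    static final class Container {
        private final String id;
        Container(String id) { this.id = id; }
        String getId() { return id; }
    }

    static final class ContainerStatus {
        private final String containerId;
        private final ContainerState state;
        ContainerStatus(String containerId, ContainerState state) {
            this.containerId = containerId;
            this.state = state;
        }
        String getContainerId() { return containerId; }
        ContainerState getContainerState() { return state; }
    }

    static final class AppAttempt {
        // May be null if the attempt was recovered without master container details.
        private final Container masterContainer;
        AppAttempt(Container masterContainer) { this.masterContainer = masterContainer; }
        Container getMasterContainer() { return masterContainer; }
    }

    // Returns true only when the reported container is the attempt's master
    // container and it has completed. A null attempt, status, or master
    // container yields false instead of a NullPointerException.
    static boolean isCompletedMasterContainer(AppAttempt attempt, ContainerStatus status) {
        if (attempt == null || status == null) {
            return false;
        }
        Container master = attempt.getMasterContainer();
        if (master == null) {
            return false;
        }
        return master.getId().equals(status.getContainerId())
            && status.getContainerState() == ContainerState.COMPLETE;
    }

    public static void main(String[] args) {
        ContainerStatus done = new ContainerStatus("container_1", ContainerState.COMPLETE);
        // Attempt recovered without a master container: the guard skips it.
        System.out.println(isCompletedMasterContainer(new AppAttempt(null), done));
        // Attempt with a matching, completed master container.
        System.out.println(isCompletedMasterContainer(new AppAttempt(new Container("container_1")), done));
    }
}
```

Whether silently skipping such an attempt is the right behaviour for the RM is a separate question; the sketch only shows that the dereference can be made safe.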