[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14002937#comment-14002937 ] Hudson commented on YARN-2053: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5606 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5606/]) YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from previous attempts for work-preserving AM restart. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1595116) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003257#comment-14003257 ] Hudson commented on YARN-2053: -- FAILURE: Integrated in Hadoop-Yarn-trunk #562 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/562/]) YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from previous attempts for work-preserving AM restart. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1595116) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003292#comment-14003292 ] Hudson commented on YARN-2053: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1754 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1754/]) YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from previous attempts for work-preserving AM restart. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1595116) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003469#comment-14003469 ] Hudson commented on YARN-2053: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1780 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1780/]) YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from previous attempts for work-preserving AM restart. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1595116) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998948#comment-13998948 ] Jian He commented on YARN-2053: --- LGTM, +1, submit the same patch to kick jenkins Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13999533#comment-13999533 ] Hadoop QA commented on YARN-2053: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645047/YARN-2053.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3751//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3751//console This message is automatically generated. Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998324#comment-13998324 ] Wangda Tan commented on YARN-2053: -- Sure, I'll do that, thanks for review! Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Attachments: YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997623#comment-13997623 ] Wangda Tan commented on YARN-2053: -- And I think this should be marked as critical or blocker bug, agree? Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Attachments: YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997461#comment-13997461 ] Steve Loughran commented on YARN-2053: -- {code} {noformat} 14/05/10 17:02:17 INFO appmaster.SliderAppMaster: Connecting to RM at 48058,address tracking URL=http://c6403.ambari.apache.org:48705 14/05/10 17:02:17 ERROR main.ServiceLauncher: java.lang.NullPointerException at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.convertToProtoFormat(RegisterApplicationMasterResponsePBImpl.java:384) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.access$100(RegisterApplicationMasterResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl$2$1.next(RegisterApplicationMasterResponsePBImpl.java:355) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl$2$1.next(RegisterApplicationMasterResponsePBImpl.java:344) at com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336) at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323) at org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.mergeLocalToBuilder(RegisterApplicationMasterResponsePBImpl.java:123) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.mergeLocalToProto(RegisterApplicationMasterResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.getProto(RegisterApplicationMasterResponsePBImpl.java:75) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:91) at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) Exception: java.lang.NullPointerException at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.convertToProtoFormat(RegisterApplicationMasterResponsePBImpl.java:384) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.access$100(RegisterApplicationMasterResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl$2$1.next(RegisterApplicationMasterResponsePBImpl.java:355) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl$2$1.next(RegisterApplicationMasterResponsePBImpl.java:344) at com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336) at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323) at org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.mergeLocalToBuilder(RegisterApplicationMasterResponsePBImpl.java:123) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.mergeLocalToProto(RegisterApplicationMasterResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.getProto(RegisterApplicationMasterResponsePBImpl.java:75) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:91) at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at
[jira] [Commented] (YARN-2053) Slider AM fails to restart
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997459#comment-13997459 ] Steve Loughran commented on YARN-2053: -- Note that this AM requests container retention over AM restarts -so is testing code paths that not much (anything?) else is testing Slider AM fails to restart -- Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Attachments: yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {noformat} 14/05/10 17:02:17 INFO appmaster.SliderAppMaster: Connecting to RM at 48058,address tracking URL=http://c6403.ambari.apache.org:48705 14/05/10 17:02:17 ERROR main.ServiceLauncher: java.lang.NullPointerException at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.convertToProtoFormat(RegisterApplicationMasterResponsePBImpl.java:384) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.access$100(RegisterApplicationMasterResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl$2$1.next(RegisterApplicationMasterResponsePBImpl.java:355) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl$2$1.next(RegisterApplicationMasterResponsePBImpl.java:344) at com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336) at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323) at org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.mergeLocalToBuilder(RegisterApplicationMasterResponsePBImpl.java:123) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.mergeLocalToProto(RegisterApplicationMasterResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.getProto(RegisterApplicationMasterResponsePBImpl.java:75) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:91) at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) Exception: java.lang.NullPointerException at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.convertToProtoFormat(RegisterApplicationMasterResponsePBImpl.java:384) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.access$100(RegisterApplicationMasterResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl$2$1.next(RegisterApplicationMasterResponsePBImpl.java:355) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl$2$1.next(RegisterApplicationMasterResponsePBImpl.java:344) at com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336) at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323) at org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.mergeLocalToBuilder(RegisterApplicationMasterResponsePBImpl.java:123) at
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997621#comment-13997621 ] Wangda Tan commented on YARN-2053: -- Took a look at related code, I think this problem is caused by, In ApplicationMasterService.registerApplicationMaster(), it will add nmTokens from previous attempt's container via a loop. {code} ListContainer transferredContainers = ((AbstractYarnScheduler) rScheduler) .getTransferredContainers(applicationAttemptId); if (!transferredContainers.isEmpty()) { response.setContainersFromPreviousAttempts(transferredContainers); ListNMToken nmTokens = new ArrayListNMToken(); for (Container container : transferredContainers) { try { nmTokens.add(rmContext.getNMTokenSecretManager() .createAndGetNMToken(app.getUser(), applicationAttemptId, container);); } {code} And NMTokenSecretManager.createAndGetNMToken() {code} NMToken nmToken = null; if (nodeSet != null) { if (!nodeSet.contains(container.getNodeId())) { ... // set nmToken ... } } return nmToken {code} So if multiple container come from same NM (with same NodeId), null nmToken will be added to NMToken list. And in RegisterApplicationMasterResponsePBImpl.getTokenProtoIterable, it tried to convert a null NMToken to proto {code} @Override public NMTokenProto next() { return convertToProtoFormat(iter.next()); } {code} I think this should be the root cause of this problem, uploaded a patch. Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Attachments: yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997804#comment-13997804 ] Jian He commented on YARN-2053: --- Thanks for catching this ! Patch looks good to me, Wangda, can you add a test case ? Basically, allocate 2 containers on the same node in TestAMRestart#testNMTokensRebindOnAMRestart should be enough. Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Attachments: YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)