[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14002937#comment-14002937 ] Hudson commented on YARN-2053: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5606 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5606/]) YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from previous attempts for work-preserving AM restart. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1595116) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003257#comment-14003257 ] Hudson commented on YARN-2053: -- FAILURE: Integrated in Hadoop-Yarn-trunk #562 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/562/]) YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from previous attempts for work-preserving AM restart. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1595116) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003292#comment-14003292 ] Hudson commented on YARN-2053: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1754 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1754/]) YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from previous attempts for work-preserving AM restart. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1595116) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003469#comment-14003469 ] Hudson commented on YARN-2053: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1780 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1780/]) YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from previous attempts for work-preserving AM restart. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1595116) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998948#comment-13998948 ] Jian He commented on YARN-2053: --- LGTM, +1, submit the same patch to kick jenkins Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13999533#comment-13999533 ] Hadoop QA commented on YARN-2053: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645047/YARN-2053.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3751//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3751//console This message is automatically generated. Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998324#comment-13998324 ] Wangda Tan commented on YARN-2053: -- Sure, I'll do that, thanks for review! Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Attachments: YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997623#comment-13997623 ] Wangda Tan commented on YARN-2053: -- And I think this should be marked as critical or blocker bug, agree? Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Attachments: YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997621#comment-13997621 ] Wangda Tan commented on YARN-2053: -- Took a look at related code, I think this problem is caused by, In ApplicationMasterService.registerApplicationMaster(), it will add nmTokens from previous attempt's container via a loop. {code} ListContainer transferredContainers = ((AbstractYarnScheduler) rScheduler) .getTransferredContainers(applicationAttemptId); if (!transferredContainers.isEmpty()) { response.setContainersFromPreviousAttempts(transferredContainers); ListNMToken nmTokens = new ArrayListNMToken(); for (Container container : transferredContainers) { try { nmTokens.add(rmContext.getNMTokenSecretManager() .createAndGetNMToken(app.getUser(), applicationAttemptId, container);); } {code} And NMTokenSecretManager.createAndGetNMToken() {code} NMToken nmToken = null; if (nodeSet != null) { if (!nodeSet.contains(container.getNodeId())) { ... // set nmToken ... } } return nmToken {code} So if multiple container come from same NM (with same NodeId), null nmToken will be added to NMToken list. And in RegisterApplicationMasterResponsePBImpl.getTokenProtoIterable, it tried to convert a null NMToken to proto {code} @Override public NMTokenProto next() { return convertToProtoFormat(iter.next()); } {code} I think this should be the root cause of this problem, uploaded a patch. Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Attachments: yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997804#comment-13997804 ] Jian He commented on YARN-2053: --- Thanks for catching this ! Patch looks good to me, Wangda, can you add a test case ? Basically, allocate 2 containers on the same node in TestAMRestart#testNMTokensRebindOnAMRestart should be enough. Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Attachments: YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)