[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129910#comment-14129910 ]

Hudson commented on YARN-2459:

SUCCESS: Integrated in Hadoop-Yarn-trunk #677 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/677/])
YARN-2459. RM crashes if App gets rejected for any reason and HA is enabled. Contributed by Jian He (xgong: rev 47bdfa044aa1d587b24edae8b1b0c796d829c960)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
Fix CHANGES.txt. Credit Mayank Bansal for his contributions on YARN-2459 (xgong: rev 7d38ffc8d3500d428bdad5640e9e70d66ed5ea60)
* hadoop-yarn-project/CHANGES.txt

RM crashes if App gets rejected for any reason and HA is enabled

Key: YARN-2459
URL: https://issues.apache.org/jira/browse/YARN-2459
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Fix For: 2.6.0
Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, YARN-2459.4.patch, YARN-2459.5.patch, YARN-2459.6.patch

This happens when RM HA is enabled and the ZooKeeper store is used as the RM state store.
If an app is rejected for any reason and goes directly from NEW to FAILED, the final transition adds it to the RMApps and completed-apps in-memory structures, but it is never written to the state store.
When the number of completed apps reaches the default limit, the RM starts deleting apps from memory and from the store.
It then tries to delete this app from the store, the delete fails, and the RM crashes.

Thanks,
Mayank

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
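The failure sequence described above can be sketched with a toy model. This is only an illustration, not ResourceManager code: `ToyStateStore`, `ToyAppManager`, and the limit of 2 completed apps are hypothetical, and only the shape of the bug (an app tracked in memory but never written to the store, then evicted from both) follows the description.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a persistent store whose delete fails
// when the entry does not exist.
class ToyStateStore {
    private final Set<String> stored = new HashSet<>();

    void store(String appId) {
        stored.add(appId);
    }

    void remove(String appId) {
        if (!stored.remove(appId)) {
            throw new IllegalStateException("NoNode: " + appId);
        }
    }
}

// Hypothetical stand-in for the in-memory list of completed apps.
class ToyAppManager {
    private final Deque<String> completedApps = new ArrayDeque<>();
    private final ToyStateStore store;
    private final int maxCompletedApps;

    ToyAppManager(ToyStateStore store, int maxCompletedApps) {
        this.store = store;
        this.maxCompletedApps = maxCompletedApps;
    }

    // An app that went through the normal path: persisted, then tracked.
    void finishStoredApp(String appId) {
        store.store(appId);
        track(appId);
    }

    // An app rejected straight from NEW to FAILED: tracked in memory only.
    void finishRejectedApp(String appId) {
        track(appId);
    }

    private void track(String appId) {
        completedApps.addLast(appId);
        // Evict the oldest apps from memory and from the store.
        while (completedApps.size() > maxCompletedApps) {
            store.remove(completedApps.removeFirst());
        }
    }
}

class RejectedAppDemo {
    public static void main(String[] args) {
        ToyAppManager manager = new ToyAppManager(new ToyStateStore(), 2);
        manager.finishRejectedApp("app_1"); // never written to the store
        manager.finishStoredApp("app_2");
        try {
            manager.finishStoredApp("app_3"); // evicts app_1, which the store never saw
        } catch (IllegalStateException e) {
            System.out.println("store removal failed: " + e.getMessage());
        }
    }
}
```

In the toy model the exception is caught; in the report, the equivalent store failure is what takes the RM down.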
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130032#comment-14130032 ]

Hudson commented on YARN-2459:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1893 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1893/])
YARN-2459. RM crashes if App gets rejected for any reason and HA is enabled. Contributed by Jian He (xgong: rev 47bdfa044aa1d587b24edae8b1b0c796d829c960)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
Fix CHANGES.txt. Credit Mayank Bansal for his contributions on YARN-2459 (xgong: rev 7d38ffc8d3500d428bdad5640e9e70d66ed5ea60)
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130055#comment-14130055 ]

Hudson commented on YARN-2459:

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1868 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1868/])
YARN-2459. RM crashes if App gets rejected for any reason and HA is enabled. Contributed by Jian He (xgong: rev 47bdfa044aa1d587b24edae8b1b0c796d829c960)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
Fix CHANGES.txt. Credit Mayank Bansal for his contributions on YARN-2459 (xgong: rev 7d38ffc8d3500d428bdad5640e9e70d66ed5ea60)
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128927#comment-14128927 ]

Xuan Gong commented on YARN-2459:

Committed into trunk and branch-2. Thanks, Jian.
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128929#comment-14128929 ]

Xuan Gong commented on YARN-2459:

Also, thanks Mayank for the initial patch.
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127201#comment-14127201 ]

Xuan Gong commented on YARN-2459:

+1 LGTM
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125959#comment-14125959 ]

Jian He commented on YARN-2459:

bq. Add one in TestRMRestart to get an app rejected and make sure that the final-status gets recorded
Added.

bq. Another one in RMStateStoreTestBase to ensure it is okay to have an updateApp call without a storeApp call like in this case.
Turns out RMStateStoreTestBase already has this test.
{code}
// test updating the state of an app/attempt whose initial state was not
// saved.
{code}
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126036#comment-14126036 ]

Hadoop QA commented on YARN-2459:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12667215/YARN-2459.4.patch
  against trunk revision df8c84c.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4846//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4846//console

This message is automatically generated.
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126037#comment-14126037 ]

Jian He commented on YARN-2459:

The new patch adds some comments in the test case.
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126135#comment-14126135 ]

Hadoop QA commented on YARN-2459:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12667234/YARN-2459.5.patch
  against trunk revision d989ac0.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
                  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4847//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4847//console

This message is automatically generated.
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126259#comment-14126259 ]

Hadoop QA commented on YARN-2459:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12667254/YARN-2459.6.patch
  against trunk revision d989ac0.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4850//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4850//console

This message is automatically generated.
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115586#comment-14115586 ]

Hadoop QA commented on YARN-2459:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12665124/YARN-2459.3.patch
  against trunk revision 4bd0194.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
                  org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens
                  org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
                  org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4771//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4771//console

This message is automatically generated.
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115851#comment-14115851 ]

Vinod Kumar Vavilapalli commented on YARN-2459:

Can we please add two more tests for future proofing this?
 - Add one in TestRMRestart to get an app rejected and make sure that the final-status gets recorded
 - Another one in RMStateStoreTestBase to ensure it is okay to have an updateApp call without a storeApp call like in this case.
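The second request, an update call with no preceding store call, can be illustrated with a toy store. This is a sketch under assumptions, not the RMStateStoreTestBase test: `UpsertStore` and its methods are hypothetical, and it simply models a store whose update behaves as an upsert.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store whose update creates the entry when it is missing.
class UpsertStore {
    private final Map<String, String> states = new HashMap<>();

    // Records the initial state of an app.
    void storeApp(String appId) {
        states.put(appId, "NEW");
    }

    // Records the final state, whether or not storeApp was called first.
    void updateApp(String appId, String finalState) {
        states.put(appId, finalState);
    }

    String load(String appId) {
        return states.get(appId);
    }
}

class UpdateWithoutStoreCheck {
    public static void main(String[] args) {
        UpsertStore store = new UpsertStore();
        store.updateApp("app_1", "FAILED"); // no prior storeApp call
        if (!"FAILED".equals(store.load("app_1"))) {
            throw new AssertionError("final state was not recorded");
        }
        System.out.println("recorded final state: " + store.load("app_1"));
    }
}
```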
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114338#comment-14114338 ]

Jian He commented on YARN-2459:

Mayank, thanks for working on the issue. The current change saves the initial state, but doesn't store the final state and diagnostics of the app. Without the final state saved, the RM will retry this app.
I think we should do the following, the same way the NEW_SAVING state handles it.
{code}
.addTransition(RMAppState.NEW, RMAppState.FINAL_SAVING,
    RMAppEventType.APP_REJECTED,
    new FinalSavingTransition(new AppRejectedTransition(),
        RMAppState.FAILED))
{code}
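The idea behind the suggested transition, passing through a saving state so that the final state is persisted before the app is marked FAILED, can be sketched with a toy state machine. This is an assumption-laden illustration, not RMAppImpl: `ToyApp`, its methods, and the map used as a store are hypothetical, and the write happens synchronously here for brevity.

```java
import java.util.HashMap;
import java.util.Map;

class ToyApp {
    enum State { NEW, FINAL_SAVING, FAILED }

    private final String appId;
    private final Map<String, State> store; // hypothetical persistent store
    private State state = State.NEW;
    private State targetState;

    ToyApp(String appId, Map<String, State> store) {
        this.appId = appId;
        this.store = store;
    }

    // Rejection does not jump straight to FAILED: the app first enters
    // FINAL_SAVING and the final state is written to the store.
    void onRejected() {
        if (state != State.NEW) {
            throw new IllegalStateException("unexpected state: " + state);
        }
        targetState = State.FAILED;
        state = State.FINAL_SAVING;
        store.put(appId, targetState);
    }

    // Invoked once the write is confirmed; only now does the app reach FAILED.
    void onFinalStateSaved() {
        if (state != State.FINAL_SAVING) {
            throw new IllegalStateException("unexpected state: " + state);
        }
        state = targetState;
    }

    State getState() {
        return state;
    }
}

class ToyAppDemo {
    public static void main(String[] args) {
        Map<String, ToyApp.State> store = new HashMap<>();
        ToyApp app = new ToyApp("app_1", store);
        app.onRejected();
        app.onFinalStateSaved();
        System.out.println(app.getState() + " / stored: " + store.get("app_1"));
    }
}
```

Because the store holds an entry for the rejected app by the time it reaches FAILED, a later removal of that app from the store has something to delete.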
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114535#comment-14114535 ]

Hadoop QA commented on YARN-2459:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12665124/YARN-2459.3.patch
  against trunk revision d9a7404.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
                  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication
                  org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4757//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4757//console

This message is automatically generated.
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114534#comment-14114534 ]

Hadoop QA commented on YARN-2459:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12665099/YARN-2459-2.patch
  against trunk revision d1ae479.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
                  org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
                  org.apache.hadoop.yarn.server.resourcemanager.TestRM
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4756//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4756//console

This message is automatically generated.
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113067#comment-14113067 ]

Karthik Kambatla commented on YARN-2459:

Stack Trace from Mayank:
{noformat}
2014-08-24 18:43:04,603 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Skipping scheduling since node phxaishdc9dn0360.phx.ebay.com:58458 is reserved by application appattempt_1408727267637_12984_01
2014-08-24 18:43:04,613 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to fulfill reservation for application application_1408727267637_12984 on node: phxaishdc9dn0816.phx.ebay.com:50443
2014-08-24 18:43:04,613 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Application application_1408727267637_12984 reserved container container_1408727267637_12984_01_003215 on node host: phxaishdc9dn0816.phx.ebay.com:50443 #containers=17 available=4224 used=63360, currently has 310 at priority 10; currentReservation 2618880
2014-08-24 18:43:04,613 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Updated reserved container container_1408727267637_12984_01_003215 on node host: phxaishdc9dn0816.phx.ebay.com:50443 #containers=17 available=4224 used=63360 for application org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp@2da03710
2014-08-24 18:43:04,613 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container application=application_1408727267637_12984 resource=memory:8448, vCores:1 queue=hdmi-set: capacity=0.2, absoluteCapacity=0.2, usedResources=memory:34293248, vCores:7092usedCapacity=1.4031365, absoluteUsedCapacity=0.28062728, numApps=12, numContainers=7092 usedCapacity=1.4031365 absoluteUsedCapacity=0.28062728 used=memory:34293248, vCores:7092 cluster=memory:122202112, vCores:14584
2014-08-24 18:43:04,613 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Skipping scheduling since node phxaishdc9dn0816.phx.ebay.com:50443 is reserved by application appattempt_1408727267637_12984_01
2014-08-24 18:43:04,614 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:852)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:849)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:948)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:967)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:849)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.removeApplicationStateInternal(ZKRMStateStore.java:642)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:181)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:167)
    at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:766)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:837)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:832)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:745)
2014-08-24 18:43:04,647 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2014-08-24 18:43:04,732 INFO org.mortbay.log: Stopped sslsocketconnec...@apollo-phx-rm-1.vip.ebay.com:50030
2014-08-24
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113071#comment-14113071 ]

Hadoop QA commented on YARN-2459:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12664762/YARN-2459-1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4746//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4746//console

This message is automatically generated.

RM crashes if App gets rejected for any reason and HA is enabled
Key: YARN-2459
URL: https://issues.apache.org/jira/browse/YARN-2459
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Attachments: YARN-2459-1.patch

This happens when RM HA is enabled and the ZooKeeper store is used as the RM state store.
If an app is rejected for any reason and goes directly from NEW to FAILED, the final transition adds it to the RMApps and completed-apps in-memory structures, but the app is never written to the state store.
When the number of completed apps reaches the default RMApps limit, the RM starts deleting apps from memory and from the store. It then tries to delete this app from the store, the delete fails, and the RM crashes.

Thanks,
Mayank

--
This message was sent by Atlassian JIRA (v6.2#6252)
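
The failure sequence in the issue description can be reproduced with a toy model. The sketch below is an illustration only: `ToyStateStore`, `ToyAppManager` and the `persistRejectedApps` switch are hypothetical names, not YARN classes, and the switch models one conceivable remedy rather than what the attached patches do. An `IllegalStateException` stands in for the `NoNodeException` seen in the stack trace.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Toy model of the reported failure. None of these types are real YARN classes. */
public class RejectedAppSketch {

  /** Stand-in for the ZooKeeper-backed store: removing an unknown app fails. */
  static class ToyStateStore {
    private final Set<String> stored = new HashSet<>();

    void store(String appId) {
      stored.add(appId);
    }

    void remove(String appId) {
      if (!stored.remove(appId)) {
        // Plays the role of KeeperException$NoNodeException in the stack trace.
        throw new IllegalStateException("NoNode: " + appId);
      }
    }

    boolean contains(String appId) {
      return stored.contains(appId);
    }
  }

  /** Stand-in for the completed-app bookkeeping with a retention limit. */
  static class ToyAppManager {
    final ToyStateStore store = new ToyStateStore();
    private final Deque<String> completedApps = new ArrayDeque<>();
    private final int maxCompletedApps;
    private final boolean persistRejectedApps; // hypothetical remedy switch

    ToyAppManager(int maxCompletedApps, boolean persistRejectedApps) {
      this.maxCompletedApps = maxCompletedApps;
      this.persistRejectedApps = persistRejectedApps;
    }

    /** Records a finished app; rejected apps skip the store unless the switch is on. */
    void complete(String appId, boolean rejected) {
      if (!rejected || persistRejectedApps) {
        store.store(appId);
      }
      completedApps.addLast(appId);
      // Evict the oldest apps from memory and from the store once over the limit.
      while (completedApps.size() > maxCompletedApps) {
        store.remove(completedApps.removeFirst());
      }
    }
  }

  public static void main(String[] args) {
    ToyAppManager buggy = new ToyAppManager(1, false);
    buggy.complete("app_rejected", true);
    try {
      buggy.complete("app_ok", false); // evicts app_rejected, which was never stored
    } catch (IllegalStateException e) {
      System.out.println("eviction failed: " + e.getMessage());
    }

    ToyAppManager patched = new ToyAppManager(1, true);
    patched.complete("app_rejected", true);
    patched.complete("app_ok", false);
    System.out.println("eviction succeeded with the rejected app persisted");
  }
}
```

In the model, the crash comes from the mismatch between the in-memory list and the store, so any approach that keeps the two in sync (persisting the rejected app, or tolerating a missing entry on delete) avoids the failed removal.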