[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129910#comment-14129910
 ] 

Hudson commented on YARN-2459:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #677 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/677/])
YARN-2459. RM crashes if App gets rejected for any reason and HA is enabled. 
Contributed by Jian He (xgong: rev 47bdfa044aa1d587b24edae8b1b0c796d829c960)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
Fix CHANGES.txt. Credit Mayank Bansal for his contributions on YARN-2459 
(xgong: rev 7d38ffc8d3500d428bdad5640e9e70d66ed5ea60)
* hadoop-yarn-project/CHANGES.txt


 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: 2.6.0

 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
 YARN-2459.4.patch, YARN-2459.5.patch, YARN-2459.6.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130032#comment-14130032
 ] 

Hudson commented on YARN-2459:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1893 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1893/])
YARN-2459. RM crashes if App gets rejected for any reason and HA is enabled. 
Contributed by Jian He (xgong: rev 47bdfa044aa1d587b24edae8b1b0c796d829c960)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
Fix CHANGES.txt. Credit Mayank Bansal for his contributions on YARN-2459 
(xgong: rev 7d38ffc8d3500d428bdad5640e9e70d66ed5ea60)
* hadoop-yarn-project/CHANGES.txt


 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: 2.6.0

 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
 YARN-2459.4.patch, YARN-2459.5.patch, YARN-2459.6.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130055#comment-14130055
 ] 

Hudson commented on YARN-2459:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1868 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1868/])
YARN-2459. RM crashes if App gets rejected for any reason and HA is enabled. 
Contributed by Jian He (xgong: rev 47bdfa044aa1d587b24edae8b1b0c796d829c960)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
Fix CHANGES.txt. Credit Mayank Bansal for his contributions on YARN-2459 
(xgong: rev 7d38ffc8d3500d428bdad5640e9e70d66ed5ea60)
* hadoop-yarn-project/CHANGES.txt


 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: 2.6.0

 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
 YARN-2459.4.patch, YARN-2459.5.patch, YARN-2459.6.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-10 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128927#comment-14128927
 ] 

Xuan Gong commented on YARN-2459:
-

Committed into trunk and branch-2. Thanks, Jian.

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
 YARN-2459.4.patch, YARN-2459.5.patch, YARN-2459.6.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-10 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128929#comment-14128929
 ] 

Xuan Gong commented on YARN-2459:
-

Also, Thanks Mayank for the initial patch.

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
 YARN-2459.4.patch, YARN-2459.5.patch, YARN-2459.6.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-09 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127201#comment-14127201
 ] 

Xuan Gong commented on YARN-2459:
-

+1 LGTM

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
 YARN-2459.4.patch, YARN-2459.5.patch, YARN-2459.6.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125959#comment-14125959
 ] 

Jian He commented on YARN-2459:
---

bq. Add one in TestRMRestart to get an app rejected and make sure that the 
final-status gets recorded
Added.
bq. Another one in RMStateStoreTestBase to ensure it is okay to have an 
updateApp call without a storeApp call like in this case.
Turns out RMStateStoreTestBase already has this test.
{code}
// test updating the state of an app/attempt whose initial state was not
// saved.
{code}

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
 YARN-2459.4.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126036#comment-14126036
 ] 

Hadoop QA commented on YARN-2459:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667215/YARN-2459.4.patch
  against trunk revision df8c84c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4846//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4846//console

This message is automatically generated.

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
 YARN-2459.4.patch, YARN-2459.5.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126037#comment-14126037
 ] 

Jian He commented on YARN-2459:
---

New patch added some comments in the test case

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
 YARN-2459.4.patch, YARN-2459.5.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126135#comment-14126135
 ] 

Hadoop QA commented on YARN-2459:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667234/YARN-2459.5.patch
  against trunk revision d989ac0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4847//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4847//console

This message is automatically generated.

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
 YARN-2459.4.patch, YARN-2459.5.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126259#comment-14126259
 ] 

Hadoop QA commented on YARN-2459:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667254/YARN-2459.6.patch
  against trunk revision d989ac0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4850//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4850//console

This message is automatically generated.

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
 YARN-2459.4.patch, YARN-2459.5.patch, YARN-2459.6.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115586#comment-14115586
 ] 

Hadoop QA commented on YARN-2459:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665124/YARN-2459.3.patch
  against trunk revision 4bd0194.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4771//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4771//console

This message is automatically generated.

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-08-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115851#comment-14115851
 ] 

Vinod Kumar Vavilapalli commented on YARN-2459:
---

Can we please add two more tests for future proofing this?
 - Add one in TestRMRestart to get an app rejected and make sure that the 
final-status gets recorded
 - Another one in RMStateStoreTestBase to ensure it is okay to have an 
updateApp call without a storeApp call like in this case.

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-08-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114338#comment-14114338
 ] 

Jian He commented on YARN-2459:
---

 Mayank, thanks for working on the issue.  The current change saves the initial 
state, but doesn't store the final state and diagnostics of the app. And RM 
will retry this app if not saving the final state.  I think we should do the 
following as New_saving state is handling it.
{code}
.addTransition(RMAppState.NEW, RMAppState.FINAL_SAVING,
   RMAppEventType.APP_REJECTED,
   new FinalSavingTransition(new AppRejectedTransition(),
   RMAppState.FAILED))
{code}

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114535#comment-14114535
 ] 

Hadoop QA commented on YARN-2459:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665124/YARN-2459.3.patch
  against trunk revision d9a7404.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication
  
org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4757//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4757//console

This message is automatically generated.

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114534#comment-14114534
 ] 

Hadoop QA commented on YARN-2459:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665099/YARN-2459-2.patch
  against trunk revision d1ae479.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
  org.apache.hadoop.yarn.server.resourcemanager.TestRM

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4756//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4756//console

This message is automatically generated.

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-08-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14113067#comment-14113067
 ] 

Karthik Kambatla commented on YARN-2459:


Stack Trace from Mayank: 
{noformat}
2014-08-24 18:43:04,603 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Skipping scheduling since node phxaishdc9dn0360.phx.ebay.com:58458 is reserved 
by applica 
tion appattempt_1408727267637_12984_01 
2014-08-24 18:43:04,613 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Trying to fulfill reservation for application application_1408727267637_12984 
on node: ph 
xaishdc9dn0816.phx.ebay.com:50443 
2014-08-24 18:43:04,613 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
 Application application_1408727267637_12984 reserved container 
container_1408727267637_1 
2984_01_003215 on node host: phxaishdc9dn0816.phx.ebay.com:50443 #containers=17 
available=4224 used=63360, currently has 310 at priority 10; currentReservation 
2618880 
2014-08-24 18:43:04,613 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode:
 Updated reserved container container_1408727267637_12984_01_003215 on node 
host: phxai 
shdc9dn0816.phx.ebay.com:50443 #containers=17 available=4224 used=63360 for 
application 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp@2da03710
 
2014-08-24 18:43:04,613 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
Reserved container application=application_1408727267637_12984 
resource=memory:8448, vCores:1 
queue=hdmi-set: capacity=0.2, absoluteCapacity=0.2, 
usedResources=memory:34293248, vCores:7092usedCapacity=1.4031365, 
absoluteUsedCapacity=0.28062728, numApps=12, numContainers=7092 
usedCapacity=1.403 
1365 absoluteUsedCapacity=0.28062728 used=memory:34293248, vCores:7092 
cluster=memory:122202112, vCores:14584 
2014-08-24 18:43:04,613 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Skipping scheduling since node phxaishdc9dn0816.phx.ebay.com:50443 is reserved 
by applica 
tion appattempt_1408727267637_12984_01 
2014-08-24 18:43:04,614 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_OP_FAILED. Cause: 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) 
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945) 
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) 
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:852)
 
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:849)
 
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:948)
 
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:967)
 
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:849)
 
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.removeApplicationStateInternal(ZKRMStateStore.java:642)
 
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:181)
 
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:167)
 
at 
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:766)
 
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:837)
 
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:832)
 
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) 
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
at java.lang.Thread.run(Thread.java:745) 

2014-08-24 18:43:04,647 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
status 1 
2014-08-24 18:43:04,732 INFO org.mortbay.log: Stopped 
sslsocketconnec...@apollo-phx-rm-1.vip.ebay.com:50030 
2014-08-24 

[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14113071#comment-14113071
 ] 

Hadoop QA commented on YARN-2459:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12664762/YARN-2459-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4746//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4746//console

This message is automatically generated.

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.2#6252)