[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256895#comment-14256895
 ] 

Hudson commented on YARN-2340:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #50 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/50/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by 
Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.0

 Attachments: 0001-YARN-2340.patch


 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256907#comment-14256907
 ] 

Hudson commented on YARN-2340:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #784 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/784/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by 
Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.0

 Attachments: 0001-YARN-2340.patch


 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14257002#comment-14257002
 ] 

Hudson commented on YARN-2340:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #47 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/47/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by 
Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/CHANGES.txt


 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.0

 Attachments: 0001-YARN-2340.patch


 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14257007#comment-14257007
 ] 

Hudson commented on YARN-2340:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1982 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1982/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by 
Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt


 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.0

 Attachments: 0001-YARN-2340.patch


 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14257048#comment-14257048
 ] 

Hudson commented on YARN-2340:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #51 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/51/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by 
Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.0

 Attachments: 0001-YARN-2340.patch


 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14257091#comment-14257091
 ] 

Hudson commented on YARN-2340:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2001 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2001/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by 
Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/CHANGES.txt


 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.0

 Attachments: 0001-YARN-2340.patch


 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-22 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1429#comment-1429
 ] 

Rohith commented on YARN-2340:
--

[~jianhe] Kindly review the patch

 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2340.patch


 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256362#comment-14256362
 ] 

Jian He commented on YARN-2340:
---

looks good, +1

 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2340.patch


 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256584#comment-14256584
 ] 

Hadoop QA commented on YARN-2340:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687993/0001-YARN-2340.patch
  against trunk revision fdf042d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 15 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6173//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6173//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6173//console

This message is automatically generated.

 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2340.patch


 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-22 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256631#comment-14256631
 ] 

Rohith commented on YARN-2340:
--

There so many tests are failing randomly in trunk!!

 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2340.patch


 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256637#comment-14256637
 ] 

Jian He commented on YARN-2340:
---

right, we should spend time fixing these..

 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2340.patch


 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256641#comment-14256641
 ] 

Hudson commented on YARN-2340:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6777 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6777/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by 
Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.0

 Attachments: 0001-YARN-2340.patch


 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251451#comment-14251451
 ] 

Hadoop QA commented on YARN-2340:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687993/0001-YARN-2340.patch
  against trunk revision 1050d42.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRM

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6144//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6144//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6144//console

This message is automatically generated.

 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2340.patch


 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-18 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251455#comment-14251455
 ] 

Rohith commented on YARN-2340:
--

It looks failed tests is random. In my env, it is running successfully.

 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2340.patch


 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED

2014-12-15 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246426#comment-14246426
 ] 

Rohith commented on YARN-2340:
--

Thanks [~nishan] for reporting this issue. I too encountered with similar 
situation while testing on trunk code and later RM remain in stand by.



 NPE thrown when RM restart after queue is STOPPED
 -

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical

 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED

2014-12-15 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246442#comment-14246442
 ] 

Rohith commented on YARN-2340:
--

Scenario executed
# Start Yarn cluster, and submit long running application to Queue to 
default.Initially, RM1 is active
# *Stop the queue default* in both RM1 and RM2 using -refreshQueue. Queue can 
be stopped even when application is running, but wont accept new application 
submissions.
# Switch the RM, let RM2 transitionedToActive. But here application recovery 
fails since queue already stopped. Below logs shows the failure, but *RMAppImpl 
state is updated as FAILED RMAppAttempt remain as null*. RM remain in standby
{noformat}
2014-12-15 11:01:17,813 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: 
application_1418620667348_0001 with 1 attempts and final state = null
2014-12-15 11:01:17,814 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Recovering attempt: appattempt_1418620667348_0001_01 with final state: null
/.
/
2014-12-15 11:01:17,824 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
Queue root.default is STOPPED. Cannot accept submission of application: 
application_1418620667348_0001
2014-12-15 11:01:17,825 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Failed to submit application application_1418620667348_0001 to queue default 
from user rohith
org.apache.hadoop.security.AccessControlException: Queue root.default is 
STOPPED. Cannot accept submission of application: application_1418620667348_0001
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.submitApplication(LeafQueue.java:575)

2014-12-15 11:01:17,939 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
Registering app attempt : appattempt_1418620667348_0001_01
2014-12-15 11:01:17,941 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating 
application application_1418620667348_0001 with final state: FAILED
{noformat}
# After restart , Final state in RMApp=FAILED and RMAppImpl=null as shown 
below. RM can not recover the applications, and continuously fails. 
{noformat}
2014-12-15 11:01:41,493 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: 
application_1418620667348_0001 with 1 attempts and final state = FAILED
2014-12-15 11:01:41,494 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Recovering attempt: appattempt_1418620667348_0001_01 with final state: null
{noformat}

 NPE thrown when RM restart after queue is STOPPED
 -

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical

 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-15 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246555#comment-14246555
 ] 

Rohith commented on YARN-2340:
--

Some thoughts for fixing this issue either of below 2 are
1. Straight away invoke KILL event for application if application is submitting 
into STOPPED queue during recovering applications. KILL event smothly 
transition RMApp/RMAppAttempt to KILLED state. But throw exception while 
killing master container since Either NM's were not registered to RM OR 
Connection Refused when NM is down.
{{CS#addApplication}}
{code}
   // Submit to the queue
try {
  queue.submitApplication(applicationId, user, queueName);
} catch (AccessControlException ace) {
  LOG.info(Failed to submit application  + applicationId +  to queue 
  + queueName +  from user  + user, ace);
  if (isAppRecovering) {
LOG.info(Killing the application  + applicationId);
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId, RMAppEventType.KILL));
  } else {
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppRejectedEvent(applicationId, ace.toString()));
  }
  return;
}
{code}

{{CS#addApplicationAttempt}}
{code}
SchedulerApplicationFiCaSchedulerApp application =
applications.get(applicationAttemptId.getApplicationId());
if (application == null  isAttemptRecovering) {
  LOG.info(Attempt is recovering from an application where Queue is 
stopped.
  + applicationAttemptId);
  return;
}
{code}

2. Introduce new event type like APP_RECOVERY_FAILED  or 
APP_SCHEDULER_RECOVERY_FAILED and trigger from Scheduler if app is submitted to 
stopped queue while recovering. Transitions would be like below
AppAttempt : {{NEW to LAUNCHED}}
App : {{NEW to ACCEPTED}}
App : {{ACCEPTED to FINAL_SAVING}} on event APP_RECOVERY_FAILED  or 
APP_SCHEDULER_RECOVERY_FAILED 
AppAttempt : {{LAUNCHED to FINAL_SAVING}}
AppAttempt : {{FINAL_SAVING to FAILED}}
App : {{FINAL_SAVING to FAILED}}

Please give your suggestions/thoughts.


 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical

 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247415#comment-14247415
 ] 

Jian He commented on YARN-2340:
---

Today, the semantics to stop a queue is to let the existing applications run 
into completion. We should retain the same semantics for RM restart as well. In 
this case, I think we need to ignore this exception and continue because the 
application was accepted before the queue is changed to stopped. Similar 
problem could happen if we change the application acl and restart RM while 
application is running. 

 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical

 While job is in progress make Queue  state as STOPPED and then restart RM 
 Observe that standby RM fails to come up as acive throwing below NPE
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)