[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler

2013-08-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726323#comment-13726323
 ] 

Hudson commented on YARN-502:
-

SUCCESS: Integrated in Hadoop-Yarn-trunk #288 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/288/])
YARN-502. Fixed a state machine issue with RMNode inside ResourceManager which 
was crashing scheduler. Contributed by Mayank Bansal. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1509060)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java


 RM crash with NPE on NODE_REMOVED event with FairScheduler
 --

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal
 Fix For: 2.1.1-beta

 Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch, 
 YARN-502-trunk-3.patch


 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler

2013-08-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726421#comment-13726421
 ] 

Hudson commented on YARN-502:
-

FAILURE: Integrated in Hadoop-Hdfs-trunk #1478 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1478/])
YARN-502. Fixed a state machine issue with RMNode inside ResourceManager which 
was crashing scheduler. Contributed by Mayank Bansal. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1509060)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java


 RM crash with NPE on NODE_REMOVED event with FairScheduler
 --

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal
 Fix For: 2.1.1-beta

 Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch, 
 YARN-502-trunk-3.patch


 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler

2013-07-31 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725913#comment-13725913
 ] 

Vinod Kumar Vavilapalli commented on YARN-502:
--

Though the explicit state check is a little unmaintainable if we have new 
states in future, the current change is less intrusive. The better way could 
have been creating new transition class, but I'm okay.

+1, checking this in.

 RM crash with NPE on NODE_REMOVED event with FairScheduler
 --

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal
 Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch, 
 YARN-502-trunk-3.patch


 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler

2013-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718197#comment-13718197
 ] 

Hadoop QA commented on YARN-502:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12593897/YARN-502-trunk-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1571//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1571//console

This message is automatically generated.

 RM crash with NPE on NODE_REMOVED event with FairScheduler
 --

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal
 Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch, 
 YARN-502-trunk-3.patch


 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler

2013-07-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702177#comment-13702177
 ] 

Zhijie Shen commented on YARN-502:
--

+1, the patch looks good to me

 RM crash with NPE on NODE_REMOVED event with FairScheduler
 --

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal
 Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch


 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler

2013-07-02 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698418#comment-13698418
 ] 

Mayank Bansal commented on YARN-502:


Latest patch does not need rebasing

Thanks,
Mayank

 RM crash with NPE on NODE_REMOVED event with FairScheduler
 --

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal
 Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch


 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler

2013-07-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698522#comment-13698522
 ] 

Karthik Kambatla commented on YARN-502:
---

Looks good to me. +1

 RM crash with NPE on NODE_REMOVED event with FairScheduler
 --

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal
 Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch


 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event

2013-06-11 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680527#comment-13680527
 ] 

Sandy Ryza commented on YARN-502:
-

Thanks [~mayank_bansal], the patch looks good to me.  My only nit is that, 
grammatically, the message should be
rmNode.getNodeAddress() +  *has* already been removed.

 RM crash with NPE on NODE_REMOVED event
 ---

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal
 Attachments: YARN-502-trunk-1.patch


 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event

2013-06-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680545#comment-13680545
 ] 

Hadoop QA commented on YARN-502:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12587265/YARN-502-trunk-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1192//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1192//console

This message is automatically generated.

 RM crash with NPE on NODE_REMOVED event
 ---

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal
 Attachments: YARN-502-trunk-1.patch


 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event

2013-06-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680578#comment-13680578
 ] 

Hadoop QA commented on YARN-502:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12587280/YARN-502-trunk-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1193//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1193//console

This message is automatically generated.

 RM crash with NPE on NODE_REMOVED event
 ---

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal
 Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch


 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event

2013-06-10 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13679941#comment-13679941
 ] 

Mayank Bansal commented on YARN-502:


Thanks [~sandyr]

I did not reprduce it.

As no body is working , let me take a look

Thanks,
Mayank

 RM crash with NPE on NODE_REMOVED event
 ---

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu

 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event

2013-06-10 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13679982#comment-13679982
 ] 

Mayank Bansal commented on YARN-502:


By Looking at the code looks like if there is race condition between 
ReconnectNodeTransition and UnhealthyTrabsntion in event dispatcher 

This condition may arrise when Nodemanager tries to register itself and 
ResourceTrackerService puts this node in the Nodes list and schedule the event 
for recoonect however in the mean time there is an unhealthy event come first 
to RM and it deletes this Node from the Nodes map.

Thanks,
Mayank

 RM crash with NPE on NODE_REMOVED event
 ---

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal

 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event

2013-06-07 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13678300#comment-13678300
 ] 

Mayank Bansal commented on YARN-502:


Hi [~sandyr] 

Are you working on this?

Thanks,
Mayank

 RM crash with NPE on NODE_REMOVED event
 ---

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu

 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event

2013-06-07 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13678306#comment-13678306
 ] 

Sandy Ryza commented on YARN-502:
-

[~mayank_bansal], I am not. I didn't know how to reproduce it - have you 
experienced this as well?

 RM crash with NPE on NODE_REMOVED event
 ---

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu

 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira