[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726323#comment-13726323 ]

Hudson commented on YARN-502:

SUCCESS: Integrated in Hadoop-Yarn-trunk #288 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/288/])
YARN-502. Fixed a state machine issue with RMNode inside ResourceManager which was crashing scheduler. Contributed by Mayank Bansal. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1509060)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java

RM crash with NPE on NODE_REMOVED event with FairScheduler

Key: YARN-502
URL: https://issues.apache.org/jira/browse/YARN-502
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal
Fix For: 2.1.1-beta
Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch, YARN-502-trunk-3.patch

While running some tests and adding/removing nodes, we saw the RM crash with the exception below. We are testing with the fair scheduler and running hadoop-2.0.3-alpha.
{noformat}
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node :55680 as it is now LOST
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 Node Transitioned from UNHEALTHY to LOST
2013-03-22 18:54:27,015 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_REMOVED to the scheduler
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
        at java.lang.Thread.run(Thread.java:662)
2013-03-22 18:54:27,016 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped SelectChannelConnector@:50030
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
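The trace only shows that the NullPointerException is raised inside FairScheduler.removeNode (FairScheduler.java:619) while handling NODE_REMOVED, right after the node moved from UNHEALTHY to LOST. As a minimal sketch of that failure mode, assume (the trace does not confirm this) that removeNode looks the node up in the scheduler's own node map and dereferences the result without a null check, so a second NODE_REMOVED for a node the scheduler has already dropped fails. SchedulerSketch and SchedulerNodeSketch are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the real
// FairScheduler / FSSchedulerNode classes. Assumption: removeNode looks the
// node up in the scheduler's own map and dereferences the result.
class SchedulerNodeSketch {
    final String nodeId;
    final int memoryMb;

    SchedulerNodeSketch(String nodeId, int memoryMb) {
        this.nodeId = nodeId;
        this.memoryMb = memoryMb;
    }
}

class SchedulerSketch {
    private final Map<String, SchedulerNodeSketch> nodes = new HashMap<>();
    private int clusterMemoryMb = 0;

    void addNode(String nodeId, int memoryMb) {
        nodes.put(nodeId, new SchedulerNodeSketch(nodeId, memoryMb));
        clusterMemoryMb += memoryMb;
    }

    // Unguarded removal: a second NODE_REMOVED for the same node gets null
    // from the map and throws NullPointerException on the dereference.
    void removeNodeUnguarded(String nodeId) {
        SchedulerNodeSketch node = nodes.get(nodeId);
        clusterMemoryMb -= node.memoryMb; // NPE if the node was already removed
        nodes.remove(nodeId);
    }

    // Guarded removal: a duplicate removal is reported and ignored.
    boolean removeNodeGuarded(String nodeId) {
        SchedulerNodeSketch node = nodes.remove(nodeId);
        if (node == null) {
            return false; // already gone; nothing to subtract
        }
        clusterMemoryMb -= node.memoryMb;
        return true;
    }

    int clusterMemoryMb() {
        return clusterMemoryMb;
    }

    public static void main(String[] args) {
        SchedulerSketch scheduler = new SchedulerSketch();
        scheduler.addNode("node-1:1234", 8192);           // hypothetical node id
        scheduler.removeNodeUnguarded("node-1:1234");     // first NODE_REMOVED
        try {
            scheduler.removeNodeUnguarded("node-1:1234"); // duplicate NODE_REMOVED
        } catch (NullPointerException e) {
            // prints "duplicate NODE_REMOVED -> NullPointerException"
            System.out.println("duplicate NODE_REMOVED -> NullPointerException");
        }
    }
}
```

A scheduler-side guard like removeNodeGuarded only hides the duplicate event. Per the commit message, the fix that was integrated changed the RMNode state machine in RMNodeImpl instead.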
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726421#comment-13726421 ]

Hudson commented on YARN-502:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1478 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1478/])
YARN-502. Fixed a state machine issue with RMNode inside ResourceManager which was crashing scheduler. Contributed by Mayank Bansal. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1509060)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725913#comment-13725913 ]

Vinod Kumar Vavilapalli commented on YARN-502:

Though the explicit state check is a little unmaintainable if we add new states in the future, the current change is less intrusive. The better way could have been to create a new transition class, but I'm okay with it. +1, checking this in.
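The review describes the committed change as an explicit state check rather than a new transition class. The sketch below illustrates that idea under an assumption drawn from the reported log (UNHEALTHY to LOST): a node that turns UNHEALTHY has already been removed from the scheduler, so the path to LOST should emit NODE_REMOVED only when the prior state was not UNHEALTHY. NodeStateSketch, RMNodeSketch and the string event names are hypothetical; this is not the RMNodeImpl code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an explicit state check in a node state machine;
// names are illustrative and this is NOT the committed RMNodeImpl code.
// Assumption: a node that turns UNHEALTHY is removed from the scheduler at
// that point, so it must not be removed a second time when it is lost.
enum NodeStateSketch { RUNNING, UNHEALTHY, LOST }

class RMNodeSketch {
    private NodeStateSketch state = NodeStateSketch.RUNNING;
    // Events this node has sent to the scheduler, in order.
    final List<String> schedulerEvents = new ArrayList<>();

    NodeStateSketch state() {
        return state;
    }

    // RUNNING -> UNHEALTHY: take the node out of the scheduler.
    void markUnhealthy() {
        if (state == NodeStateSketch.RUNNING) {
            schedulerEvents.add("NODE_REMOVED");
            state = NodeStateSketch.UNHEALTHY;
        }
    }

    // RUNNING or UNHEALTHY -> LOST. The explicit check on the prior state
    // skips the duplicate NODE_REMOVED for a node that is already UNHEALTHY.
    void markLost() {
        if (state == NodeStateSketch.LOST) {
            return; // LOST is terminal in this sketch
        }
        if (state != NodeStateSketch.UNHEALTHY) {
            schedulerEvents.add("NODE_REMOVED");
        }
        state = NodeStateSketch.LOST;
    }

    public static void main(String[] args) {
        RMNodeSketch node = new RMNodeSketch();
        node.markUnhealthy(); // sends NODE_REMOVED once
        node.markLost();      // UNHEALTHY -> LOST, as in the reported log
        System.out.println(node.state() + " " + node.schedulerEvents); // prints "LOST [NODE_REMOVED]"
    }
}
```

The alternative mentioned in the review would register a dedicated transition for the UNHEALTHY to LOST arc that never sends NODE_REMOVED, so adding a state later would not mean revisiting an if-check shared by several arcs.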
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718197#comment-13718197 ]

Hadoop QA commented on YARN-502:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12593897/YARN-502-trunk-3.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1571//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1571//console

This message is automatically generated.
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702177#comment-13702177 ]

Zhijie Shen commented on YARN-502:

+1, the patch looks good to me.
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698418#comment-13698418 ]

Mayank Bansal commented on YARN-502:

The latest patch does not need rebasing.

Thanks,
Mayank
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698522#comment-13698522 ]

Karthik Kambatla commented on YARN-502:

Looks good to me. +1
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680527#comment-13680527 ]

Sandy Ryza commented on YARN-502:

Thanks [~mayank_bansal], the patch looks good to me. My only nit is that, grammatically, the message should be rmNode.getNodeAddress() + *has* already been removed.
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680545#comment-13680545 ]

Hadoop QA commented on YARN-502:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12587265/YARN-502-trunk-1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1192//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1192//console
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680578#comment-13680578 ]

Hadoop QA commented on YARN-502:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12587280/YARN-502-trunk-2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1193//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1193//console
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679941#comment-13679941 ]

Mayank Bansal commented on YARN-502:

Thanks [~sandyr]. I did not reproduce it. Since nobody is working on it, let me take a look.

Thanks,
Mayank
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679982#comment-13679982 ]

Mayank Bansal commented on YARN-502:

Looking at the code, it looks like there is a race condition in the event dispatcher between ReconnectNodeTransition and the unhealthy-node transition.

This condition may arise when a NodeManager tries to register itself: ResourceTrackerService puts the node in the nodes list and schedules the reconnect event, but in the meantime an unhealthy event reaches the RM first and deletes the node from the nodes map.

Thanks,
Mayank
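The analysis above is about ordering on an asynchronous event dispatcher. The sketch models that ordering single-threaded: events are handled strictly in the order they were enqueued, so if an UNHEALTHY event for a node is queued before the RECONNECT scheduled at re-registration, the reconnect handler finds the node already gone from the map. DispatcherRaceSketch, its event types and its methods are hypothetical simplifications, not the ResourceTrackerService, RMNodeImpl or dispatcher API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded model of the ordering described above; not
// the real ResourceTrackerService / RMNodeImpl / dispatcher code.
class DispatcherRaceSketch {
    enum Type { UNHEALTHY, RECONNECT }

    static final class Event {
        final Type type;
        final String nodeId;

        Event(Type type, String nodeId) {
            this.type = type;
            this.nodeId = nodeId;
        }
    }

    final Map<String, Integer> nodes = new HashMap<>(); // nodeId -> memory (MB)
    private final Deque<Event> queue = new ArrayDeque<>();
    int missingNodeLookups = 0; // handler ran for a node no longer in the map

    // A heartbeat reports the node as unhealthy; handling is deferred.
    void reportUnhealthy(String nodeId) {
        queue.addLast(new Event(Type.UNHEALTHY, nodeId));
    }

    // Re-registration: record the node now, schedule the reconnect for later.
    void register(String nodeId, int memoryMb) {
        nodes.put(nodeId, memoryMb);
        queue.addLast(new Event(Type.RECONNECT, nodeId));
    }

    // Dispatcher loop: handle queued events strictly in FIFO order.
    void drain() {
        while (!queue.isEmpty()) {
            Event e = queue.pollFirst();
            switch (e.type) {
                case UNHEALTHY:
                    nodes.remove(e.nodeId);
                    break;
                case RECONNECT:
                    if (!nodes.containsKey(e.nodeId)) {
                        // An unguarded handler would dereference null here.
                        missingNodeLookups++;
                    }
                    break;
            }
        }
    }

    public static void main(String[] args) {
        DispatcherRaceSketch rm = new DispatcherRaceSketch();
        rm.nodes.put("node-1:1234", 8192);  // hypothetical node already known to the RM
        rm.reportUnhealthy("node-1:1234");  // queued first
        rm.register("node-1:1234", 8192);   // reconnect queued second
        rm.drain();
        // prints "missing-node lookups: 1"
        System.out.println("missing-node lookups: " + rm.missingNodeLookups);
    }
}
```

Swapping the two calls in main (register first, then reportUnhealthy) makes the reconnect handler find the node, which is why the failure only shows up under particular interleavings.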
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13678300#comment-13678300 ] Mayank Bansal commented on YARN-502: Hi [~sandyr], are you working on this? Thanks, Mayank RM crash with NPE on NODE_REMOVED event --- Key: YARN-502 URL: https://issues.apache.org/jira/browse/YARN-502 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.3-alpha Reporter: Lohit Vijayarenu
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13678306#comment-13678306 ] Sandy Ryza commented on YARN-502: [~mayank_bansal], I am not. I didn't know how to reproduce it; have you experienced this as well? RM crash with NPE on NODE_REMOVED event --- Key: YARN-502 URL: https://issues.apache.org/jira/browse/YARN-502 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.3-alpha Reporter: Lohit Vijayarenu