[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE
[ https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700679#comment-13700679 ]

Hudson commented on YARN-845:

Integrated in Hadoop-Mapreduce-trunk #1478 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1478/])
YARN-845. RM crash with NPE on NODE_UPDATE (Mayank Bansal via bikas) (Revision 1499886)
Result = SUCCESS
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499886
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java

> RM crash with NPE on NODE_UPDATE
>
> Key: YARN-845
> URL: https://issues.apache.org/jira/browse/YARN-845
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 3.0.0, 2.1.0-beta
> Reporter: Arpit Gupta
> Assignee: Mayank Bansal
> Fix For: 2.1.0-beta
> Attachments: rm.log, YARN-845-trunk-1.patch, YARN-845-trunk-draft.patch
>
> the following stack trace is generated in rm
> {code}
> n, service: 68.142.246.147:45454 }, ] resource=
> queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=usedCapacity=0.90625, absoluteUsedCapacity=0.90625, numApps=1, numContainers=29
> usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster=
> 2013-06-17 12:43:53,655 INFO capacity.ParentQueue (ParentQueue.java:completedContainer(696)) - completedContainer queue=root usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster=
> 2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(832)) - Application appattempt_1371448527090_0844_01 released container container_1371448527090_0844_01_05 on node: host: hostXX:45454 #containers=4 available=2048 used=6144 with event: FINISHED
> 2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for application application_1371448527090_0844 on node: hostXX:45454
> 2013-06-17 12:43:53,656 INFO fica.FiCaSchedulerApp (FiCaSchedulerApp.java:unreserve(435)) - Application application_1371448527090_0844 unreserved on node host: hostXX:45454 #containers=4 available=2048 used=6144, currently has 4 at priority 20; currentReservation
> 2013-06-17 12:43:53,656 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for deactivate...
> 2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
> at java.lang.Thread.run(Thread.java:662)
> 2013-06-17 12:43:53,659 INFO resourcemanager.ResourceManager (ResourceManager.java:run(426)) - Exiting, bbye..
> 2013-06-17 12:43:53,665 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@hostXX:8088
> 2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
> 2013-06-17 12:43:53,766 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
> 2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> 2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
> 2013-06-17 12:43:53,768 WARN amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.j
[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE
[ https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700645#comment-13700645 ]

Hudson commented on YARN-845:

Integrated in Hadoop-Hdfs-trunk #1451 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1451/])
YARN-845. RM crash with NPE on NODE_UPDATE (Mayank Bansal via bikas) (Revision 1499886)
Result = FAILURE
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499886
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE
[ https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700599#comment-13700599 ]

Hudson commented on YARN-845:

Integrated in Hadoop-Yarn-trunk #261 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/261/])
YARN-845. RM crash with NPE on NODE_UPDATE (Mayank Bansal via bikas) (Revision 1499886)
Result = SUCCESS
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499886
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE
[ https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700380#comment-13700380 ]

Hudson commented on YARN-845:

Integrated in Hadoop-trunk-Commit #4043 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4043/])
YARN-845. RM crash with NPE on NODE_UPDATE (Mayank Bansal via bikas) (Revision 1499886)
Result = SUCCESS
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499886
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE
[ https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700372#comment-13700372 ]

Bikas Saha commented on YARN-845:

Looks good. +1.
[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE
[ https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698357#comment-13698357 ]

Hadoop QA commented on YARN-845:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12590532/YARN-845-trunk-1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1418//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1418//console

This message is automatically generated.
[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE
[ https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698326#comment-13698326 ] Mayank Bansal commented on YARN-845: I had an offline discussion with [~arpitgupta] and [~bikassaha] We are not able to reproduce the issue however we can synchronize the application object on assignreserved containers to make it consistent with another calls. I am adding more logs to find the issue if we can get this crash. I am also sending yean run time exceptions if we get this null again. Thanks, Mayank > RM crash with NPE on NODE_UPDATE > > > Key: YARN-845 > URL: https://issues.apache.org/jira/browse/YARN-845 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 3.0.0, 2.1.0-beta >Reporter: Arpit Gupta >Assignee: Mayank Bansal > Attachments: rm.log, YARN-845-trunk-draft.patch > > > the following stack trace is generated in rm > {code} > n, service: 68.142.246.147:45454 }, ] resource= > queue=default: capacity=1.0, absoluteCapacity=1.0, > usedResources=usedCapacity=0.90625, > absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 > usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster= > 2013-06-17 12:43:53,655 INFO capacity.ParentQueue > (ParentQueue.java:completedContainer(696)) - completedContainer queue=root > usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster= > 2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler > (CapacityScheduler.java:completedContainer(832)) - Application > appattempt_1371448527090_0844_01 released container > container_1371448527090_0844_01_05 on node: host: hostXX:45454 > #containers=4 available=2048 used=6144 with event: FINISHED > 2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler > (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for > application application_1371448527090_0844 on node: hostXX:45454 > 2013-06-17 12:43:53,656 INFO 
fica.FiCaSchedulerApp > (FiCaSchedulerApp.java:unreserve(435)) - Application > application_1371448527090_0844 unreserved on node host: hostXX:45454 > #containers=4 available=2048 used=6144, currently has 4 at priority 20; > currentReservation > 2013-06-17 12:43:53,656 INFO scheduler.AppSchedulingInfo > (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for > deactivate... > 2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager > (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to > the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413) > at 
java.lang.Thread.run(Thread.java:662) > 2013-06-17 12:43:53,659 INFO resourcemanager.ResourceManager > (ResourceManager.java:run(426)) - Exiting, bbye.. > 2013-06-17 12:43:53,665 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped > SelectChannelConnector@hostXX:8088 > 2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager > (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion > recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep > interrupted > 2013-06-17 12:43:53,766 INFO impl.MetricsSystemImpl > (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics > system... > 2013-06-17 12:43:53,767 INFO impl.Metri
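For readers following the thread, the change Mayank describes in the comment above (taking the application object's lock in the reserved-container path, and failing with a descriptive runtime exception instead of a bare NullPointerException) might look roughly like the sketch below. This is not the YARN-845 patch: SketchApp, SketchQueue, and all of their methods are hypothetical stand-ins for illustration, not the real FiCaSchedulerApp or LeafQueue.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a scheduler application (NOT the real FiCaSchedulerApp). */
class SketchApp {
    // Reserved memory (MB) per node; guarded by this object's monitor.
    private final Map<String, Integer> reservedMbByNode = new HashMap<>();

    synchronized void reserve(String node, int memoryMb) {
        reservedMbByNode.put(node, memoryMb);
    }

    /** Removes the reservation on the node and returns its size in MB. */
    synchronized int unreserve(String node) {
        Integer reservedMb = reservedMbByNode.remove(node);
        if (reservedMb == null) {
            // Assumption: report the missing reservation explicitly rather than
            // letting a later dereference fail with an uninformative NPE.
            throw new IllegalStateException("No reservation found on node " + node);
        }
        return reservedMb;
    }

    synchronized int reservationCount() {
        return reservedMbByNode.size();
    }
}

/** Hypothetical stand-in for a leaf queue (NOT the real LeafQueue). */
class SketchQueue {
    /**
     * Holds the application's lock for the whole operation so that another
     * thread cannot change its reservations halfway through. Java monitors
     * are reentrant, so calling the synchronized unreserve() inside is safe.
     */
    static int assignReservedContainer(SketchApp app, String node) {
        synchronized (app) {
            return app.unreserve(node);
        }
    }
}
```

Locking on the application object only helps if every other code path that touches its reservations takes the same lock, which is the consistency point raised in the comment.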
[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE
[ https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687420#comment-13687420 ]

Hadoop QA commented on YARN-845:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12588484/YARN-845-trunk-draft.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1338//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1338//console

This message is automatically generated.
> RM crash with NPE on NODE_UPDATE > > > Key: YARN-845 > URL: https://issues.apache.org/jira/browse/YARN-845 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 3.0.0, 2.1.0-beta >Reporter: Arpit Gupta >Assignee: Mayank Bansal > Attachments: rm.log, YARN-845-trunk-draft.patch > > > the following stack trace is generated in rm > {code} > n, service: 68.142.246.147:45454 }, ] resource= > queue=default: capacity=1.0, absoluteCapacity=1.0, > usedResources=usedCapacity=0.90625, > absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 > usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster= > 2013-06-17 12:43:53,655 INFO capacity.ParentQueue > (ParentQueue.java:completedContainer(696)) - completedContainer queue=root > usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster= > 2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler > (CapacityScheduler.java:completedContainer(832)) - Application > appattempt_1371448527090_0844_01 released container > container_1371448527090_0844_01_05 on node: host: hostXX:45454 > #containers=4 available=2048 used=6144 with event: FINISHED > 2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler > (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for > application application_1371448527090_0844 on node: hostXX:45454 > 2013-06-17 12:43:53,656 INFO fica.FiCaSchedulerApp > (FiCaSchedulerApp.java:unreserve(435)) - Application > application_1371448527090_0844 unreserved on node host: hostXX:45454 > #containers=4 available=2048 used=6144, currently has 4 at priority 20; > currentReservation > 2013-06-17 12:43:53,656 INFO scheduler.AppSchedulingInfo > (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for > deactivate... 
> 2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager > (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to > the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate
[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE
[ https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687382#comment-13687382 ]

Mayank Bansal commented on YARN-845:

[~arpitgupta] I couldn't reproduce this issue and couldn't find any scenario where it can happen, apart from one race scenario that could trigger it in NODE_UPDATE. I am attaching a draft patch; I'm not sure it solves this issue, but the improvement is worth adding. If you can give me more information on how you reproduced the issue, that would be a great help.

Thanks,
Mayank

> RM crash with NPE on NODE_UPDATE > > > Key: YARN-845 > URL: https://issues.apache.org/jira/browse/YARN-845 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 3.0.0, 2.1.0-beta >Reporter: Arpit Gupta > Attachments: rm.log > > > the following stack trace is generated in rm > {code} > n, service: 68.142.246.147:45454 }, ] resource= > queue=default: capacity=1.0, absoluteCapacity=1.0, > usedResources=usedCapacity=0.90625, > absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 > usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster= > 2013-06-17 12:43:53,655 INFO capacity.ParentQueue > (ParentQueue.java:completedContainer(696)) - completedContainer queue=root > usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster= > 2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler > (CapacityScheduler.java:completedContainer(832)) - Application > appattempt_1371448527090_0844_01 released container > container_1371448527090_0844_01_05 on node: host: hostXX:45454 > #containers=4 available=2048 used=6144 with event: FINISHED > 2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler > (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for > application application_1371448527090_0844 on node: hostXX:45454 > 2013-06-17 12:43:53,656 INFO fica.FiCaSchedulerApp > (FiCaSchedulerApp.java:unreserve(435)) - 
Application > application_1371448527090_0844 unreserved on node host: hostXX:45454 > #containers=4 available=2048 used=6144, currently has 4 at priority 20; > currentReservation > 2013-06-17 12:43:53,656 INFO scheduler.AppSchedulingInfo > (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for > deactivate... > 2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager > (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to > the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413) > at java.lang.Thread.run(Thread.java:662) > 2013-06-17 12:43:53,659 INFO resourcemanager.ResourceManager 
> (ResourceManager.java:run(426)) - Exiting, bbye.. > 2013-06-17 12:43:53,665 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped > SelectChannelConnector@hostXX:8088 > 2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager > (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion > recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep > interrupted > 2013-06-17 12:43:53,766 INFO impl.MetricsSystemImpl > (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics > system... > 2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl > (MetricsSystemImpl.java:stop(206)) - ResourceM
[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE
[ https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686108#comment-13686108 ]

Mayank Bansal commented on YARN-845:

Hi Arpit,
Can you please attach the RM logs?

Thanks,
Mayank

> RM crash with NPE on NODE_UPDATE > > > Key: YARN-845 > URL: https://issues.apache.org/jira/browse/YARN-845 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 3.0.0, 2.1.0-beta >Reporter: Arpit Gupta > > the following stack trace is generated in rm > {code} > n, service: 68.142.246.147:45454 }, ] resource= > queue=default: capacity=1.0, absoluteCapacity=1.0, > usedResources=usedCapacity=0.90625, > absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 > usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster= > 2013-06-17 12:43:53,655 INFO capacity.ParentQueue > (ParentQueue.java:completedContainer(696)) - completedContainer queue=root > usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster= > 2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler > (CapacityScheduler.java:completedContainer(832)) - Application > appattempt_1371448527090_0844_01 released container > container_1371448527090_0844_01_05 on node: host: > hor15n00.gq1.ygridcore.net:45454 #containers=4 available=2048 used=6144 with > event: FINISHED > 2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler > (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for > application application_1371448527090_0844 on node: > hor15n00.gq1.ygridcore.net:45454 > 2013-06-17 12:43:53,656 INFO fica.FiCaSchedulerApp > (FiCaSchedulerApp.java:unreserve(435)) - Application > application_1371448527090_0844 unreserved on node host: > hor15n00.gq1.ygridcore.net:45454 #containers=4 available=2048 used=6144, > currently has 4 at priority 20; currentReservation > 2013-06-17 12:43:53,656 INFO scheduler.AppSchedulingInfo > 
(AppSchedulingInfo.java:updateResourceRequests(168)) - checking for > deactivate... > 2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager > (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to > the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413) > at java.lang.Thread.run(Thread.java:662) > 2013-06-17 12:43:53,659 INFO resourcemanager.ResourceManager > (ResourceManager.java:run(426)) - Exiting, bbye.. 
> 2013-06-17 12:43:53,665 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped > selectchannelconnec...@hor14n33.gq1.ygridcore.net:8088 > 2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager > (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion > recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep > interrupted > 2013-06-17 12:43:53,766 INFO impl.MetricsSystemImpl > (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics > system... > 2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl > (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped. > 2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl > (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system > shutdown complete. > 2013-06-17 12:43:53,768 WARN amlauncher.ApplicationMasterLauncher > (ApplicationM
[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE
[ https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685988#comment-13685988 ]

Arpit Gupta commented on YARN-845:

Mayank,
Unfortunately, we cannot consistently reproduce this. We saw it on a cluster where Pig e2e tests were running; they started failing because the RM went down. On other nights and on other clusters, the same set of tests has gone through without any issues.

> RM crash with NPE on NODE_UPDATE > > > Key: YARN-845 > URL: https://issues.apache.org/jira/browse/YARN-845 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 3.0.0, 2.1.0-beta >Reporter: Arpit Gupta > > the following stack trace is generated in rm > {code} > n, service: 68.142.246.147:45454 }, ] resource= > queue=default: capacity=1.0, absoluteCapacity=1.0, > usedResources=usedCapacity=0.90625, > absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 > usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster= > 2013-06-17 12:43:53,655 INFO capacity.ParentQueue > (ParentQueue.java:completedContainer(696)) - completedContainer queue=root > usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster= > 2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler > (CapacityScheduler.java:completedContainer(832)) - Application > appattempt_1371448527090_0844_01 released container > container_1371448527090_0844_01_05 on node: host: > hor15n00.gq1.ygridcore.net:45454 #containers=4 available=2048 used=6144 with > event: FINISHED > 2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler > (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for > application application_1371448527090_0844 on node: > hor15n00.gq1.ygridcore.net:45454 > 2013-06-17 12:43:53,656 INFO fica.FiCaSchedulerApp > (FiCaSchedulerApp.java:unreserve(435)) - Application > application_1371448527090_0844 unreserved on node host: > hor15n00.gq1.ygridcore.net:45454 
#containers=4 available=2048 used=6144, > currently has 4 at priority 20; currentReservation > 2013-06-17 12:43:53,656 INFO scheduler.AppSchedulingInfo > (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for > deactivate... > 2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager > (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to > the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413) > at java.lang.Thread.run(Thread.java:662) > 2013-06-17 12:43:53,659 INFO resourcemanager.ResourceManager > (ResourceManager.java:run(426)) - Exiting, bbye.. 
> 2013-06-17 12:43:53,665 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped > selectchannelconnec...@hor14n33.gq1.ygridcore.net:8088 > 2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager > (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion > recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep > interrupted > 2013-06-17 12:43:53,766 INFO impl.MetricsSystemImpl > (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics > system... > 2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl > (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped. > 2013-06-17 12:43:53,767 INFO impl.Metrics
[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE
[ https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685984#comment-13685984 ]

Mayank Bansal commented on YARN-845:

Arpit,
Can you please update the issue with reproducible steps?

Thanks,
Mayank

> RM crash with NPE on NODE_UPDATE > > > Key: YARN-845 > URL: https://issues.apache.org/jira/browse/YARN-845 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 3.0.0, 2.1.0-beta >Reporter: Arpit Gupta > > the following stack trace is generated in rm > {code} > n, service: 68.142.246.147:45454 }, ] resource= > queue=default: capacity=1.0, absoluteCapacity=1.0, > usedResources=usedCapacity=0.90625, > absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 > usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster= > 2013-06-17 12:43:53,655 INFO capacity.ParentQueue > (ParentQueue.java:completedContainer(696)) - completedContainer queue=root > usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster= > 2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler > (CapacityScheduler.java:completedContainer(832)) - Application > appattempt_1371448527090_0844_01 released container > container_1371448527090_0844_01_05 on node: host: > hor15n00.gq1.ygridcore.net:45454 #containers=4 available=2048 used=6144 with > event: FINISHED > 2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler > (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for > application application_1371448527090_0844 on node: > hor15n00.gq1.ygridcore.net:45454 > 2013-06-17 12:43:53,656 INFO fica.FiCaSchedulerApp > (FiCaSchedulerApp.java:unreserve(435)) - Application > application_1371448527090_0844 unreserved on node host: > hor15n00.gq1.ygridcore.net:45454 #containers=4 available=2048 used=6144, > currently has 4 at priority 20; currentReservation > 2013-06-17 12:43:53,656 INFO scheduler.AppSchedulingInfo > 
(AppSchedulingInfo.java:updateResourceRequests(168)) - checking for > deactivate... > 2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager > (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to > the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413) > at java.lang.Thread.run(Thread.java:662) > 2013-06-17 12:43:53,659 INFO resourcemanager.ResourceManager > (ResourceManager.java:run(426)) - Exiting, bbye.. 
> 2013-06-17 12:43:53,665 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped > selectchannelconnec...@hor14n33.gq1.ygridcore.net:8088 > 2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager > (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion > recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep > interrupted > 2013-06-17 12:43:53,766 INFO impl.MetricsSystemImpl > (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics > system... > 2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl > (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped. > 2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl > (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system > shutdown complete. > 2013-06-17 12:43:53,768 WARN amlauncher.ApplicationMasterLauncher > (Appl