[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700679#comment-13700679
 ] 

Hudson commented on YARN-845:
-

Integrated in Hadoop-Mapreduce-trunk #1478 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1478/])
YARN-845. RM crash with NPE on NODE_UPDATE (Mayank Bansal via bikas) 
(Revision 1499886)

 Result = SUCCESS
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499886
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java


> RM crash with NPE on NODE_UPDATE
> 
>
> Key: YARN-845
> URL: https://issues.apache.org/jira/browse/YARN-845
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 3.0.0, 2.1.0-beta
> Reporter: Arpit Gupta
> Assignee: Mayank Bansal
> Fix For: 2.1.0-beta
>
> Attachments: rm.log, YARN-845-trunk-1.patch, YARN-845-trunk-draft.patch
>
>
> the following stack trace is generated in rm
> {code}
> n, service: 68.142.246.147:45454 }, ] resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=usedCapacity=0.90625, 
> absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 
> usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster=
> 2013-06-17 12:43:53,655 INFO  capacity.ParentQueue 
> (ParentQueue.java:completedContainer(696)) - completedContainer queue=root 
> usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster=
> 2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:completedContainer(832)) - Application 
> appattempt_1371448527090_0844_01 released container 
> container_1371448527090_0844_01_05 on node: host: hostXX:45454 
> #containers=4 available=2048 used=6144 with event: FINISHED
> 2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for 
> application application_1371448527090_0844 on node: hostXX:45454
> 2013-06-17 12:43:53,656 INFO  fica.FiCaSchedulerApp 
> (FiCaSchedulerApp.java:unreserve(435)) - Application 
> application_1371448527090_0844 unreserved  on node host: hostXX:45454 
> #containers=4 available=2048 used=6144, currently has 4 at priority 20; 
> currentReservation 
> 2013-06-17 12:43:53,656 INFO  scheduler.AppSchedulingInfo 
> (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for 
> deactivate...
> 2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager 
> (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to 
> the scheduler
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
>     at java.lang.Thread.run(Thread.java:662)
> 2013-06-17 12:43:53,659 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:run(426)) - Exiting, bbye..
> 2013-06-17 12:43:53,665 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
> SelectChannelCon

[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700645#comment-13700645
 ] 

Hudson commented on YARN-845:
-

Integrated in Hadoop-Hdfs-trunk #1451 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1451/])
YARN-845. RM crash with NPE on NODE_UPDATE (Mayank Bansal via bikas) 
(Revision 1499886)

 Result = FAILURE
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499886
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java



[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700599#comment-13700599
 ] 

Hudson commented on YARN-845:
-

Integrated in Hadoop-Yarn-trunk #261 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/261/])
YARN-845. RM crash with NPE on NODE_UPDATE (Mayank Bansal via bikas) 
(Revision 1499886)

 Result = SUCCESS
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499886
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java



[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-07-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700380#comment-13700380
 ] 

Hudson commented on YARN-845:
-

Integrated in Hadoop-trunk-Commit #4043 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4043/])
YARN-845. RM crash with NPE on NODE_UPDATE (Mayank Bansal via bikas) 
(Revision 1499886)

 Result = SUCCESS
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499886
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java



[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-07-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700372#comment-13700372
 ] 

Bikas Saha commented on YARN-845:
-

Looks good. +1.

> RM crash with NPE on NODE_UPDATE
> 
>
> Key: YARN-845
> URL: https://issues.apache.org/jira/browse/YARN-845
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 3.0.0, 2.1.0-beta
> Reporter: Arpit Gupta
> Assignee: Mayank Bansal
> Attachments: rm.log, YARN-845-trunk-1.patch, YARN-845-trunk-draft.patch
>
>
> the following stack trace is generated in rm
> {code}
> n, service: 68.142.246.147:45454 }, ] resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=usedCapacity=0.90625, 
> absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 
> usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster=
> 2013-06-17 12:43:53,655 INFO  capacity.ParentQueue 
> (ParentQueue.java:completedContainer(696)) - completedContainer queue=root 
> usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster=
> 2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:completedContainer(832)) - Application 
> appattempt_1371448527090_0844_01 released container 
> container_1371448527090_0844_01_05 on node: host: hostXX:45454 
> #containers=4 available=2048 used=6144 with event: FINISHED
> 2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for 
> application application_1371448527090_0844 on node: hostXX:45454
> 2013-06-17 12:43:53,656 INFO  fica.FiCaSchedulerApp 
> (FiCaSchedulerApp.java:unreserve(435)) - Application 
> application_1371448527090_0844 unreserved  on node host: hostXX:45454 
> #containers=4 available=2048 used=6144, currently has 4 at priority 20; 
> currentReservation 
> 2013-06-17 12:43:53,656 INFO  scheduler.AppSchedulingInfo 
> (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for 
> deactivate...
> 2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager 
> (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to 
> the scheduler
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
>     at java.lang.Thread.run(Thread.java:662)
> 2013-06-17 12:43:53,659 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:run(426)) - Exiting, bbye..
> 2013-06-17 12:43:53,665 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
> SelectChannelConnector@hostXX:8088
> 2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager 
> (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion 
> recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep 
> interrupted
> 2013-06-17 12:43:53,766 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics 
> system...
> 2013-06-17 12:43:53,767 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> 2013-06-17 12:43:53,767 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system 
> shutdown complete.
> 2013-06-17 12:43:53,768 WARN  amlauncher.ApplicationMasterLauncher 
> (ApplicationMasterLauncher.j

[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698357#comment-13698357
 ] 

Hadoop QA commented on YARN-845:


{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12590532/YARN-845-trunk-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1418//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1418//console

This message is automatically generated.


[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-07-02 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698326#comment-13698326
 ] 

Mayank Bansal commented on YARN-845:


I had an offline discussion with [~arpitgupta] and [~bikassaha].

We are not able to reproduce the issue; however, we can synchronize on the 
application object in assignReservedContainer to make it consistent with the 
other calls.
I am adding more logging to help find the issue if we hit this crash again.

I am also throwing a YARN runtime exception if we get this null again.
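
The approach described above (hold the application object's lock around the reserved-container path, and fail with a descriptive exception instead of a bare null dereference) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the committed patch: `ReservationSketch`, `App`, the map layout, and the use of `IllegalStateException` are all hypothetical names and choices made for the example, and they do not mirror the real `LeafQueue` or `FiCaSchedulerApp` code.

```java
import java.util.HashMap;
import java.util.Map;

public class ReservationSketch {

    /** Hypothetical stand-in for a scheduler application (not the real FiCaSchedulerApp). */
    static class App {
        // priority -> (node id -> reserved container id); layout is an assumption
        private final Map<Integer, Map<String, String>> reserved = new HashMap<>();

        synchronized void reserve(int priority, String nodeId, String containerId) {
            reserved.computeIfAbsent(priority, p -> new HashMap<>()).put(nodeId, containerId);
        }

        synchronized boolean unreserve(int priority, String nodeId) {
            Map<String, String> byNode = reserved.get(priority);
            // Guard against the missing entry instead of dereferencing null.
            if (byNode == null || byNode.remove(nodeId) == null) {
                throw new IllegalStateException(
                    "No reservation at priority " + priority + " on node " + nodeId);
            }
            if (byNode.isEmpty()) {
                reserved.remove(priority);
            }
            return true;
        }
    }

    /** Hypothetical caller: takes the application's lock, like the other call paths would. */
    static boolean assignReservedContainer(App app, int priority, String nodeId) {
        synchronized (app) {
            return app.unreserve(priority, nodeId);
        }
    }

    public static void main(String[] args) {
        App app = new App();
        app.reserve(20, "hostXX:45454", "container_05");
        System.out.println(assignReservedContainer(app, 20, "hostXX:45454"));
        try {
            // Second attempt finds no reservation and reports it explicitly.
            assignReservedContainer(app, 20, "hostXX:45454");
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Java monitors are reentrant, so the outer `synchronized (app)` block and the synchronized methods on `App` can be held by the same thread without deadlocking.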

Thanks,
Mayank

> RM crash with NPE on NODE_UPDATE
> 
>
> Key: YARN-845
> URL: https://issues.apache.org/jira/browse/YARN-845
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 3.0.0, 2.1.0-beta
> Reporter: Arpit Gupta
> Assignee: Mayank Bansal
> Attachments: rm.log, YARN-845-trunk-draft.patch
>
>
> the following stack trace is generated in rm
> {code}
> n, service: 68.142.246.147:45454 }, ] resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=usedCapacity=0.90625, 
> absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 
> usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster=
> 2013-06-17 12:43:53,655 INFO  capacity.ParentQueue 
> (ParentQueue.java:completedContainer(696)) - completedContainer queue=root 
> usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster=
> 2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:completedContainer(832)) - Application 
> appattempt_1371448527090_0844_01 released container 
> container_1371448527090_0844_01_05 on node: host: hostXX:45454 
> #containers=4 available=2048 used=6144 with event: FINISHED
> 2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for 
> application application_1371448527090_0844 on node: hostXX:45454
> 2013-06-17 12:43:53,656 INFO  fica.FiCaSchedulerApp 
> (FiCaSchedulerApp.java:unreserve(435)) - Application 
> application_1371448527090_0844 unreserved  on node host: hostXX:45454 
> #containers=4 available=2048 used=6144, currently has 4 at priority 20; 
> currentReservation 
> 2013-06-17 12:43:53,656 INFO  scheduler.AppSchedulingInfo 
> (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for 
> deactivate...
> 2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager 
> (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to 
> the scheduler
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
>     at java.lang.Thread.run(Thread.java:662)
> 2013-06-17 12:43:53,659 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:run(426)) - Exiting, bbye..
> 2013-06-17 12:43:53,665 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
> SelectChannelConnector@hostXX:8088
> 2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager 
> (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion 
> recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep 
> interrupted
> 2013-06-17 12:43:53,766 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics 
> system...
> 2013-06-17 12:43:53,767 INFO  impl.Metri

[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-06-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687420#comment-13687420
 ] 

Hadoop QA commented on YARN-845:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12588484/YARN-845-trunk-draft.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1338//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1338//console

This message is automatically generated.

> RM crash with NPE on NODE_UPDATE
> 
>
> Key: YARN-845
> URL: https://issues.apache.org/jira/browse/YARN-845
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
> Affects Versions: 3.0.0, 2.1.0-beta
> Reporter: Arpit Gupta
> Assignee: Mayank Bansal
> Attachments: rm.log, YARN-845-trunk-draft.patch

[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-06-18 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687382#comment-13687382
 ] 

Mayank Bansal commented on YARN-845:


[~arpitgupta]

I couldn't reproduce this issue and couldn't find a scenario in which it can 
happen.

I did find one race scenario that could trigger this in NODE_UPDATE.

I'm attaching a draft patch. I'm not sure it solves this issue, but the 
improvement is worth adding.

If you can give me more information on how you reproduced this issue, that 
would be a great help.

Thanks,
Mayank
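
For illustration only, here is a minimal sketch of the kind of defensive check 
a race like this calls for: an unreserve that tolerates a reservation that has 
already been cleared instead of dereferencing a missing entry. This is not the 
attached patch and not the real FiCaSchedulerApp code. The class 
ReservationTracker, its methods, and the priority -> node -> container map 
layout are all assumptions made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for per-application reservation bookkeeping.
 * Not the real FiCaSchedulerApp; names and structure are assumptions.
 */
public class ReservationTracker {
    // priority -> (node id -> reserved container id)
    private final Map<Integer, Map<String, String>> reserved = new HashMap<>();

    /** Records a reservation for the given priority on the given node. */
    public synchronized void reserve(int priority, String nodeId, String containerId) {
        reserved.computeIfAbsent(priority, p -> new HashMap<>()).put(nodeId, containerId);
    }

    /**
     * Removes the reservation if it is still present.
     * Returns false, rather than throwing, when it was already cleared.
     */
    public synchronized boolean unreserve(int priority, String nodeId) {
        Map<String, String> byNode = reserved.get(priority);
        if (byNode == null) {
            return false; // nothing reserved at this priority any more
        }
        String removed = byNode.remove(nodeId);
        if (removed == null) {
            return false; // nothing reserved on this node any more
        }
        if (byNode.isEmpty()) {
            reserved.remove(priority); // drop the empty inner map
        }
        return true;
    }

    public static void main(String[] args) {
        ReservationTracker tracker = new ReservationTracker();
        tracker.reserve(20, "hostXX:45454", "container_05");
        System.out.println(tracker.unreserve(20, "hostXX:45454")); // true: reservation removed
        System.out.println(tracker.unreserve(20, "hostXX:45454")); // false: already gone, no NPE
    }
}
```

With a check like this the caller has to handle a false return, for example by 
skipping the rest of the unreserve bookkeeping for that node.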

> RM crash with NPE on NODE_UPDATE
> 
>
> Key: YARN-845
> URL: https://issues.apache.org/jira/browse/YARN-845
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
> Affects Versions: 3.0.0, 2.1.0-beta
> Reporter: Arpit Gupta
> Attachments: rm.log

[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-06-17 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686108#comment-13686108
 ] 

Mayank Bansal commented on YARN-845:


Hi Arpit,

Can you please attach the RM logs?

Thanks,
Mayank

> RM crash with NPE on NODE_UPDATE
> 
>
> Key: YARN-845
> URL: https://issues.apache.org/jira/browse/YARN-845
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
> Affects Versions: 3.0.0, 2.1.0-beta
> Reporter: Arpit Gupta
>
> the following stack trace is generated in rm
> {code}
> n, service: 68.142.246.147:45454 }, ] resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=usedCapacity=0.90625, 
> absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 
> usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster=
> 2013-06-17 12:43:53,655 INFO  capacity.ParentQueue 
> (ParentQueue.java:completedContainer(696)) - completedContainer queue=root 
> usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used= vCores:29> cluster=
> 2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:completedContainer(832)) - Application 
> appattempt_1371448527090_0844_01 released container 
> container_1371448527090_0844_01_05 on node: host: 
> hor15n00.gq1.ygridcore.net:45454 #containers=4 available=2048 used=6144 with 
> event: FINISHED
> 2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for 
> application application_1371448527090_0844 on node: 
> hor15n00.gq1.ygridcore.net:45454
> 2013-06-17 12:43:53,656 INFO  fica.FiCaSchedulerApp 
> (FiCaSchedulerApp.java:unreserve(435)) - Application 
> application_1371448527090_0844 unreserved  on node host: 
> hor15n00.gq1.ygridcore.net:45454 #containers=4 available=2048 used=6144, 
> currently has 4 at priority 20; currentReservation 
> 2013-06-17 12:43:53,656 INFO  scheduler.AppSchedulingInfo 
> (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for 
> deactivate...
> 2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager 
> (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to 
> the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
> at java.lang.Thread.run(Thread.java:662)
> 2013-06-17 12:43:53,659 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:run(426)) - Exiting, bbye..
> 2013-06-17 12:43:53,665 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
> selectchannelconnec...@hor14n33.gq1.ygridcore.net:8088
> 2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager 
> (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion 
> recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep 
> interrupted
> 2013-06-17 12:43:53,766 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics 
> system...
> 2013-06-17 12:43:53,767 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> 2013-06-17 12:43:53,767 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system 
> shutdown complete.
> 2013-06-17 12:43:53,768 WARN  amlauncher.ApplicationMasterLauncher 
> (ApplicationM

[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-06-17 Thread Arpit Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685988#comment-13685988
 ] 

Arpit Gupta commented on YARN-845:
--

Mayank

Unfortunately, I cannot consistently reproduce this. We saw it on a cluster 
where the Pig e2e tests were running; they started failing because the RM went 
down. On other nights and on other clusters, the same set of tests has gone 
through without any issues.

> RM crash with NPE on NODE_UPDATE
> 
>
> Key: YARN-845
> URL: https://issues.apache.org/jira/browse/YARN-845
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
> Affects Versions: 3.0.0, 2.1.0-beta
> Reporter: Arpit Gupta

[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-06-17 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685984#comment-13685984
 ] 

Mayank Bansal commented on YARN-845:


Arpit,

Can you please update the issue with reproducible steps?

Thanks,
Mayank

> RM crash with NPE on NODE_UPDATE
> 
>
> Key: YARN-845
> URL: https://issues.apache.org/jira/browse/YARN-845
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
> Affects Versions: 3.0.0, 2.1.0-beta
> Reporter: Arpit Gupta