[jira] [Commented] (YARN-3001) RM dies because of divide by zero

2020-07-10 Thread xin qian (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155285#comment-17155285
 ] 

xin qian commented on YARN-3001:


Hi, Zhihai Xu , do you know how to reproduce this problem?

> RM dies because of divide by zero
> -
>
> Key: YARN-3001
> URL: https://issues.apache.org/jira/browse/YARN-3001
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.5.1
>Reporter: hoelog
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-3001.barnch-2.7.patch
>
>
> RM dies because of divide by zero exception.
> {code}
> 2014-12-31 21:27:05,022 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.ArithmeticException: / by zero
> at 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
> at java.lang.Thread.run(Thread.java:745)
> 2014-12-31 21:27:05,023 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3001) RM dies because of divide by zero

2017-04-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961908#comment-15961908
 ] 

zhihai xu commented on YARN-3001:
-

We also see this issue at CDH5.7.2 which is based on hadoop 2.6 release + 
patches from hadoop 2.7 release, I studied most of the code paths I found two 
potential corner cases which may cause this issue:
1. maximum allocation can be changed based on node added and removed and total 
resource changed on the node. if maximum allocation is changed to 0 
transiently, this issue may happen. since the following code at 
CapacityScheduler.allocate will change ResourceRequest in ask to 0 if 
getMaximumResourceCapability is 0.
{code}
SchedulerUtils.normalizeRequests(
ask, getResourceCalculator(), getClusterResource(),
getMinimumResourceCapability(), getMaximumResourceCapability());
{code}
2. capability from resource request in application returned without cloning in 
LeafQueue.assignContainer and AppSchedulingInfo.cloneResourceRequest and 
AppSchedulingInfo.getResource, Potentially the capability in resource request 
returned can be changed outside.
I implemented a patch which fixed the first potential corner case based on 
branch-2.7. We already deployed this patch for more than one month, so far we 
didn't see this issue happen with the attached patch.
The stack trace for the exception is 
{code}
2017-02-09 15:36:43,062 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type NODE_UPDATE to the scheduler
java.lang.ArithmeticException: / by zero
at 
org.apache.hadoop.yarn.util.resource.DominantResourceCalculator.computeAvailableContainers(DominantResourceCalculator.java:115)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1536)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1392)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1271)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersInternal(LeafQueue.java:830)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:734)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:586)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:447)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:586)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:447)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1027)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1069)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:691)
at java.lang.Thread.run(Thread.java:745)
{code}

> RM dies because of divide by zero
> -
>
> Key: YARN-3001
> URL: https://issues.apache.org/jira/browse/YARN-3001
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.5.1
>Reporter: hoelog
>Assignee: Rohith Sharma K S
>
> RM dies because of divide by zero exception.
> {code}
> 2014-12-31 21:27:05,022 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.ArithmeticException: / by zero
> at 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877)
> at 
> 

[jira] [Commented] (YARN-3001) RM dies because of divide by zero

2016-08-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418495#comment-15418495
 ] 

Heng Chen commented on YARN-3001:
-

We encounter this issue too,  our cluster is 2.5.0,  the resource manager log 
shows:
{code}
2016-08-12 02:06:51,204 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1470798127749_8644_01_000820 Container Transitioned from NEW to 
RESERVED
2016-08-12 02:06:51,204 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
Reserved container  application attempt=appattempt_1470798127749_8644_01 
resource= queue=default: capacity=0.65, 
absoluteCapacity=0.65, usedResources=, 
usedCapacity=1.1451658, absoluteUsedCapacity=0.74435765, numApps=19, 
numContainers=1715 node=host: dx-pipe-sata114-pm:38694 #containers=36 
available=19456 used=82944 clusterResource=
2016-08-12 02:06:51,204 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
Re-sorting assigned queue: root.default stats: default: capacity=0.65, 
absoluteCapacity=0.65, usedResources=, 
usedCapacity=1.1458336, absoluteUsedCapacity=0.7447917, numApps=19, 
numContainers=1716
2016-08-12 02:06:51,204 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
assignedContainer queue=root usedCapacity=0.86067706 
absoluteUsedCapacity=0.86067706 used= 
cluster=
2016-08-12 02:06:51,222 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1470798127749_8660_01_41 Container Transitioned from ALLOCATED to 
ACQUIRED
2016-08-12 02:06:51,222 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1470798127749_8660_01_42 Container Transitioned from ALLOCATED to 
ACQUIRED
2016-08-12 02:06:51,222 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1470798127749_8660_01_43 Container Transitioned from ALLOCATED to 
ACQUIRED
2016-08-12 02:06:51,222 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1470798127749_8660_01_44 Container Transitioned from ALLOCATED to 
ACQUIRED
2016-08-12 02:06:51,222 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1470798127749_8660_01_45 Container Transitioned from ALLOCATED to 
ACQUIRED
2016-08-12 02:06:51,222 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1470798127749_8660_01_46 Container Transitioned from ALLOCATED to 
ACQUIRED
2016-08-12 02:06:51,222 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1470798127749_8660_01_47 Container Transitioned from ALLOCATED to 
ACQUIRED
2016-08-12 02:06:51,223 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1470798127749_8660_01_48 Container Transitioned from ALLOCATED to 
ACQUIRED
2016-08-12 02:06:51,223 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1470798127749_8660_01_49 Container Transitioned from ALLOCATED to 
ACQUIRED
2016-08-12 02:06:51,223 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1470798127749_8660_01_50 Container Transitioned from ALLOCATED to 
ACQUIRED
2016-08-12 02:06:51,223 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1470798127749_8660_01_51 Container Transitioned from ALLOCATED to 
ACQUIRED
2016-08-12 02:06:51,223 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1470798127749_8660_01_52 Container Transitioned from ALLOCATED to 
ACQUIRED
2016-08-12 02:06:51,223 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1470798127749_8660_01_53 Container Transitioned from ALLOCATED to 
ACQUIRED
2016-08-12 02:06:51,223 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type NODE_UPDATE to the scheduler
java.lang.ArithmeticException: / by zero
at 
org.apache.hadoop.yarn.util.resource.DominantResourceCalculator.computeAvailableContainers(DominantResourceCalculator.java:101)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177)
at 

[jira] [Commented] (YARN-3001) RM dies because of divide by zero

2015-06-22 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595423#comment-14595423
 ] 

Hui Zheng commented on YARN-3001:
-

The only non-INFO log is following(it is so sudden there is not any other WARN 
or ERROR ).
There are several tens thousands of jobs per day.
{code}
2015-06-21 09:53:44,696 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type NODE_UPDATE to the scheduler
java.lang.ArithmeticException: / by zero
at 
org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1335)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignNodeLocalContainers(LeafQueue.java:1185)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(
ResourceManager.java:557)at java.lang.Thread.run(Thread.java:724)
2015-06-21 09:53:44,696 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}

 RM dies because of divide by zero
 -

 Key: YARN-3001
 URL: https://issues.apache.org/jira/browse/YARN-3001
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: hoelog
Assignee: Rohith Sharma K S

 RM dies because of divide by zero exception.
 {code}
 2014-12-31 21:27:05,022 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.ArithmeticException: / by zero
 at 
 org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
 at java.lang.Thread.run(Thread.java:745)
 2014-12-31 21:27:05,023 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3001) RM dies because of divide by zero

2015-06-21 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595293#comment-14595293
 ] 

Hui Zheng commented on YARN-3001:
-

This problem also happened twice(31/Jan/15 and 20/Jun/15) in our cluster.
We use hadoop-2.2.0 and also set yarn.scheduler.minimum-allocation-mb=3072.

 RM dies because of divide by zero
 -

 Key: YARN-3001
 URL: https://issues.apache.org/jira/browse/YARN-3001
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: hoelog
Assignee: Rohith Sharma K S

 RM dies because of divide by zero exception.
 {code}
 2014-12-31 21:27:05,022 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.ArithmeticException: / by zero
 at 
 org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
 at java.lang.Thread.run(Thread.java:745)
 2014-12-31 21:27:05,023 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3001) RM dies because of divide by zero

2015-06-21 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595325#comment-14595325
 ] 

Rohith Sharma K S commented on YARN-3001:
-

Hi [~huizane], thanks for reply. 
 Would you please attach the RM logs if you have?

 RM dies because of divide by zero
 -

 Key: YARN-3001
 URL: https://issues.apache.org/jira/browse/YARN-3001
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: hoelog
Assignee: Rohith Sharma K S

 RM dies because of divide by zero exception.
 {code}
 2014-12-31 21:27:05,022 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.ArithmeticException: / by zero
 at 
 org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
 at java.lang.Thread.run(Thread.java:745)
 2014-12-31 21:27:05,023 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3001) RM dies because of divide by zero

2015-02-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336083#comment-14336083
 ] 

Rohith commented on YARN-3001:
--

bq. RM does not dies unless yarn.dispatcher.exit-on-error is set to true
Ignore this. RM sets this configuration to true neverthless of configured value.

In YARN-382 ResourceRequest is validated via normalization process. 
Normalization of resources make sure always minimum-allocation-mb for 
containers even if users send 0 as container memory.
I verified in real cluster by sending 0 as container memory and am memory. 
Schedulers normalize the requests and allocates configured 
yarn.scheduler.minimum-allocation-mb.

[~hoelog] Could you give scenario when it happened? 

 RM dies because of divide by zero
 -

 Key: YARN-3001
 URL: https://issues.apache.org/jira/browse/YARN-3001
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: hoelog
Assignee: Rohith

 RM dies because of divide by zero exception.
 {code}
 2014-12-31 21:27:05,022 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.ArithmeticException: / by zero
 at 
 org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
 at java.lang.Thread.run(Thread.java:745)
 2014-12-31 21:27:05,023 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3001) RM dies because of divide by zero

2015-01-01 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14262737#comment-14262737
 ] 

Rohith commented on YARN-3001:
--

In trunk, RM does not dies unless *yarn.dispatcher.exit-on-error* is set to 
true.

 RM dies because of divide by zero
 -

 Key: YARN-3001
 URL: https://issues.apache.org/jira/browse/YARN-3001
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: hoelog
Assignee: Rohith

 RM dies because of divide by zero exception.
 {code}
 2014-12-31 21:27:05,022 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.ArithmeticException: / by zero
 at 
 org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
 at java.lang.Thread.run(Thread.java:745)
 2014-12-31 21:27:05,023 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3001) RM dies because of divide by zero

2015-01-01 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14262732#comment-14262732
 ] 

Rohith commented on YARN-3001:
--

This looks client is sending ResourceRequest with capability 0 memory.

 RM dies because of divide by zero
 -

 Key: YARN-3001
 URL: https://issues.apache.org/jira/browse/YARN-3001
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: hoelog

 RM dies because of divide by zero exception.
 {code}
 2014-12-31 21:27:05,022 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.ArithmeticException: / by zero
 at 
 org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
 at java.lang.Thread.run(Thread.java:745)
 2014-12-31 21:27:05,023 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)