[ 
https://issues.apache.org/jira/browse/YARN-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327129#comment-16327129
 ] 

Szilard Nemeth commented on YARN-7528:
--------------------------------------

I tried several approaches to reproduce this issue, without success; please see the summary of my findings below.
 *In fact, the only way I could reproduce it was to put a custom resource type with a value equal to {{Long.MAX_VALUE}} into node-resources.xml. That does not look like a valid configuration to me, and it also differs from what this issue describes, namely that the default configuration causes the overflow.*

Based on the exception message and the method signature of

{{org.apache.hadoop.yarn.util.UnitsConversionUtil.convert}}, {{fromUnit}} 
should be empty, {{toUnit}} should be "m" and {{fromValue}} should be equal to 
{{Long.MAX_VALUE}}.
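
To illustrate why this combination must fail: converting a value from the base unit ("") to "m" (milli) multiplies it by 1000, so {{Long.MAX_VALUE}} cannot be represented in a {{long}}. A minimal sketch of the same arithmetic (this is only an illustration of the overflow condition, not the actual {{UnitsConversionUtil}} code):
{code:java}
public class OverflowSketch {
  // Converting from the base unit ("") to milli ("m") multiplies by 1000.
  // Math.multiplyExact throws ArithmeticException on long overflow, which is
  // the same condition a units converter has to guard against.
  static long baseToMilli(long fromValue) {
    return Math.multiplyExact(fromValue, 1000L);
  }

  public static void main(String[] args) {
    System.out.println(baseToMilli(5000L)); // fits in a long: 5000000
    try {
      baseToMilli(Long.MAX_VALUE);          // cannot fit: 2^63 - 1 times 1000
    } catch (ArithmeticException e) {
      System.out.println("overflow detected");
    }
  }
}
{code}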

Based on the stacktrace, the call to {{UnitsConversionUtil.convert}} comes from 
{{DominantResourceCalculator.normalize:444}}; this is the call:
{code:java}
long maximumValue = UnitsConversionUtil.convert(
          maximumResourceInformation.getUnits(),
          rResourceInformation.getUnits(),
          maximumResourceInformation.getValue());
{code}
From these, I tried to track down how the call to 
{{maximumResourceInformation.getValue()}} could return {{Long.MAX_VALUE}}.

Please see the steps I took, following the original stack trace and the call 
hierarchy:
 1. {{FairScheduler.getNormalizedResource()}}: the 3rd parameter here is the 
result of the {{getMaximumResourceCapability()}} call.

2. 
{{org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler#getMaximumResourceCapability()}},
 which calls 
{{org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker#getMaxAllowedAllocation}}.

3. In the {{ClusterNodeTracker#getMaxAllowedAllocation}} method, with the 
default configuration, the returned {{Resource}} contains 
{{ResourceInformation}} entries whose values are taken from the 
{{maxAllocation}} field, so in theory one element of this array (the value of 
one resource) should be {{Long.MAX_VALUE}}.
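
A hedged sketch of how an unconfigured resource can end up at {{Long.MAX_VALUE}} in such a maximum-allocation array (illustrative only, not the actual {{ClusterNodeTracker}} code; the {{UNSET}} sentinel and the method shape are my assumptions):
{code:java}
public class MaxAllocationSketch {
  static final long UNSET = -1L;

  // Per-resource maximum allocation: a cap that was never configured or
  // reported falls back to Long.MAX_VALUE, which is the value that would
  // later reach UnitsConversionUtil.convert and overflow.
  static long[] maxAllowedAllocation(long[] configuredMaxAllocation) {
    long[] result = new long[configuredMaxAllocation.length];
    for (int i = 0; i < result.length; i++) {
      result[i] = configuredMaxAllocation[i] == UNSET
          ? Long.MAX_VALUE
          : configuredMaxAllocation[i];
    }
    return result;
  }

  public static void main(String[] args) {
    // memory and vcores configured, a custom resource (e.g. gpu) never set
    long[] caps = {8192L, 8L, UNSET};
    System.out.println(java.util.Arrays.toString(maxAllowedAllocation(caps)));
  }
}
{code}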

4. The {{ClusterNodeTracker.maxAllocation}} field is updated in 
{{org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker#updateMaxResources}}.
 Looking at the source of this method, the {{maxAllocation}} array takes its 
values from {{node.getTotalResource()}}, so I tried to track down how the 
{{SchedulerNode}}'s {{totalResource}} could take such a high value.
 Since all the implementation classes just call the constructor of the 
abstract class, I continued investigating in that direction.

5. {{SchedulerNode}}'s constructor, relevant line:
{code:java}
this.totalResource = Resources.clone(node.getTotalCapability());
{code}
{{node}} here is an instance of {{RMNode}}; after looking at how 
{{RMNodeImpl.getTotalCapability()}} works, I checked where {{RMNodeImpl}} is 
created.

6. {{RMNodeImpl}} is created in 
{{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService#registerNodeManager}}.
 Since the {{totalCapability}} field of {{RMNodeImpl}} is updated in many 
places, it would have been very hard to check every scenario where this field 
could change, so I focused on the most relevant place and checked only the 
constructor.
 Still in {{registerNodeManager}}, I saw that {{totalCapability}} is set 
from the {{RegisterNodeManagerRequest}} at the beginning of the method:
{code:java}
Resource capability = request.getResource();
{code}
This is the boundary of the {{ResourceManager}}, because the 
{{RegisterNodeManagerRequest}} is sent from the NM to the RM.

7. I checked where the {{RegisterNodeManagerRequest}} is created and found only 
one occurrence: 
{{org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl#registerWithRM}}
 creates the request.
 This is the call:
{code:java}
RegisterNodeManagerRequest.newInstance(nodeId, httpPort, totalResource,
              nodeManagerVersionId, containerReports, getRunningApplications(),
              nodeLabels, physicalResource);
{code}
The relevant field is {{totalResource}}, so I checked where this field is 
updated.

8. In 
{{org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl#serviceInit}},
 {{totalResource}} is updated with:
{code:java}
this.totalResource = NodeManagerHardwareUtils.getNodeResources(conf);
{code}
9. Looking at the implementation of 
{{NodeManagerHardwareUtils.getNodeResources()}}: 
 this is the method that reads node-resources.xml and sets the node's 
total resources.
 I could not find a scenario where the value of any custom resource is set to 
{{Long.MAX_VALUE}}.
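
Resource values in these files carry an optional unit suffix (e.g. {{5000m}}). A minimal sketch of splitting such a value into its numeric part and unit, just to make the format concrete (the helper names are mine, not the {{NodeManagerHardwareUtils}} API):
{code:java}
public class ResourceValueSketch {
  // Hypothetical helpers for illustration: split a value like "5000m" into
  // its numeric prefix and unit suffix. YARN's own parsing lives elsewhere.
  static long parseValue(String raw) {
    int i = 0;
    while (i < raw.length() && Character.isDigit(raw.charAt(i))) i++;
    return Long.parseLong(raw.substring(0, i));
  }

  static String parseUnit(String raw) {
    int i = 0;
    while (i < raw.length() && Character.isDigit(raw.charAt(i))) i++;
    return raw.substring(i);
  }

  public static void main(String[] args) {
    System.out.println(parseValue("5000m"));                // 5000
    System.out.println(parseUnit("5000m"));                 // m
    System.out.println(parseUnit("9223372036854775807"));   // empty: base unit
  }
}
{code}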
----
To reproduce, I tried several combinations of resource settings in 
node-resources.xml; I always started the pi job from the console.
 Example parameters:
{code}
 "pi -Dmapreduce.framework.name=yarn -Dmapreduce.map.resource.gpu=5000m 10 100"
{code}
Please note that I sometimes used different values for the 
{{-Dmapreduce.map.resource}} parameter.
 In all cases I used this resource-types.xml file:
{code:xml}
<configuration>
        <property>
           <name>yarn.resource-types</name>
           <value>gpu,fpga</value>
        </property>
</configuration>
{code}
The scenarios I tried:

*1. node-resources.xml: gpu defined as Long.MAX_VALUE --> same exception as in 
the issue, hangs*
{code:xml}
<property>
   <name>yarn.nodemanager.resource-type.gpu</name>
   <value>9223372036854775807</value>
</property>
{code}
Job parameters:
 A.) -Dmapreduce.map.resource.gpu=5000m --> hangs
 B.) -Dmapreduce.map.resource.fpga=5000m --> does not hang

2. node-resources.xml: no custom types defined --> no exception, does not hang

3. node-resources.xml: fpga defined with value 1
{code:xml}
<property>
   <name>yarn.nodemanager.resource-type.fpga</name>
   <value>1</value>
</property>
{code}
Job parameters:
 A.) -Dmapreduce.map.resource.gpu=5000m --> does not hang
 B.) -Dmapreduce.map.resource.fpga=5000m --> does not hang

4. node-resources.xml: gpu defined with value 1
{code:xml}
<property>
   <name>yarn.nodemanager.resource-type.gpu</name>
   <value>1</value>
</property>
{code}
Job parameters:
 A.) -Dmapreduce.map.resource.gpu=5000m --> does not hang
 B.) -Dmapreduce.map.resource.fpga=5000m --> does not hang

5. node-resources.xml: gpu defined without value
{code:xml}
<property>
   <name>yarn.nodemanager.resource-type.gpu</name>
</property>
{code}
Job parameters:
 A.) -Dmapreduce.map.resource.gpu=5000m --> does not hang
 B.) -Dmapreduce.map.resource.fpga=5000m --> does not hang

> Resource types that use units need to be defined at RM level and NM level or 
> when using small units you will overflow max_allocation calculation
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-7528
>                 URL: https://issues.apache.org/jira/browse/YARN-7528
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: documentation, resourcemanager
>    Affects Versions: 3.0.0
>            Reporter: Grant Sohn
>            Assignee: Szilard Nemeth
>            Priority: Major
>
> When the unit is not defined in the RM, the LONG_MAX default will overflow in 
> the conversion step.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
