[ 
https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216012#comment-16216012
 ] 

Jonathan Hung commented on YARN-6620:
-------------------------------------

I'm not sure I follow the naming convention for the GPU resource. I ran into 
some issues when trying to initialize GPU capability for a NodeManager. It seems 
NodeManagerHardwareUtils#getNodeResources is responsible for getting a node's 
total resources. But when it parses the configuration, 
ResourceUtils#addResourceInformation uses
{noformat}
    String[] parts = prop.split("\\.");
    LOG.info("Found resource entry " + prop);
    if (parts.length == 4) {
      String resourceType = parts[3];
      if (!nodeResources.containsKey(resourceType)) {
        nodeResources
            .put(resourceType, ResourceInformation.newInstance(resourceType));
      }
      String units = getUnits(value);
      Long resourceValue =
          Long.valueOf(value.substring(0, value.length() - units.length()));
      nodeResources.get(resourceType).setValue(resourceValue);
      nodeResources.get(resourceType).setUnits(units);
      if (LOG.isDebugEnabled()) {
        LOG.debug("Setting value for resource type " + resourceType + " to "
            + resourceValue + " with units " + units);
      }
    }
{noformat}
for this. But since the resource name for GPU ({{yarn.io/gpu}}) has a "." in 
it, the property name does not split into exactly four parts, so the entry is 
skipped.

The configuration set in {{node-resources.xml}} was 
{{yarn.nodemanager.resource-type.yarn.io/gpu}}.
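
To make the failure concrete, here is a minimal standalone sketch (not Hadoop 
code; the class and method names are made up for illustration) that applies the 
same {{split("\\.")}} call to the property name above:

```java
import java.util.Arrays;

// Standalone illustration only; this class is hypothetical and not part of Hadoop.
class ResourceKeySplitDemo {
  // Mirrors the split call used in the quoted addResourceInformation snippet.
  static String[] splitKey(String prop) {
    return prop.split("\\.");
  }

  public static void main(String[] args) {
    String prop = "yarn.nodemanager.resource-type.yarn.io/gpu";
    String[] parts = splitKey(prop);
    // Prints: 5 parts: [yarn, nodemanager, resource-type, yarn, io/gpu]
    System.out.println(parts.length + " parts: " + Arrays.toString(parts));
  }
}
```

Because the array has five elements, the {{parts.length == 4}} branch is never 
entered for this key.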

Perhaps GPU_URI should be renamed, or the parsing logic should be changed. 
(Or I have something misconfigured.)
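
If the parsing logic is what changes, one possible direction is to strip a 
fixed prefix instead of counting dot-separated parts. The sketch below is only 
an illustration under that assumption: the helper class is hypothetical, the 
prefix is taken from the property name quoted above, and this is not the 
actual patch.

```java
// Hypothetical helper for illustration; not the actual Hadoop implementation.
class ResourceTypeKeyParser {
  // Assumed prefix, taken from the property name quoted in this comment.
  static final String PREFIX = "yarn.nodemanager.resource-type.";

  // Returns everything after PREFIX as the resource type,
  // or null if the key does not start with PREFIX or has nothing after it.
  static String resourceTypeOf(String prop) {
    if (!prop.startsWith(PREFIX) || prop.length() == PREFIX.length()) {
      return null;
    }
    return prop.substring(PREFIX.length());
  }

  public static void main(String[] args) {
    // Prints: yarn.io/gpu
    System.out.println(resourceTypeOf("yarn.nodemanager.resource-type.yarn.io/gpu"));
    // "resource1" is a made-up resource name. Prints: resource1
    System.out.println(resourceTypeOf("yarn.nodemanager.resource-type.resource1"));
  }
}
```

With this approach a "." inside the resource name no longer matters, since only 
the prefix is matched.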

> Add support in NodeManager to isolate GPU devices by using CGroups
> ------------------------------------------------------------------
>
>                 Key: YARN-6620
>                 URL: https://issues.apache.org/jira/browse/YARN-6620
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>             Fix For: 3.1.0
>
>         Attachments: YARN-6620.001.patch, YARN-6620.002.patch, 
> YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, 
> YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch, 
> YARN-6620.009.patch, YARN-6620.010.patch, YARN-6620.011.patch, 
> YARN-6620.012.patch, YARN-6620.013.patch, YARN-6620.014.patch, 
> YARN-6620.015.patch, YARN-6620.016.patch, YARN-6620.017.patch
>
>
> This JIRA plans to add support for:
> 1) GPU configuration for NodeManagers
> 2) Isolation in CGroups (Java side).
> 3) NM restart and recovery of allocated GPU devices



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
