[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208590#comment-16208590 ]

Wangda Tan commented on YARN-6620:
----------------------------------

[~tangzhankun], 

I may not have made it clear: what I meant is that GPU should be a first-class
resource instead of a mandatory resource. To me, the only mandatory resources
for now are memory and vcores; in the future we might add network/disk as
mandatory resources.

The definition of a mandatory resource: a resource that is strictly required
in order to run a process.
The definition of a first-class resource: a resource officially supported by
YARN.
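
To make the two definitions concrete, here is a minimal Java sketch. It is not
YARN code: ResourceCategories and Kind are hypothetical names, "memory-mb" and
"vcores" are assumed as the names of memory and vcores, and the "yarn.io/"
namespace convention from point 2 is used as a stand-in for "officially
supported by YARN".

```java
import java.util.Set;

// Hypothetical sketch of the two resource categories (not a YARN class).
final class ResourceCategories {
  enum Kind { MANDATORY, FIRST_CLASS, OTHER }

  // Mandatory: a process cannot run without them (assumed names).
  private static final Set<String> MANDATORY = Set.of("memory-mb", "vcores");

  // First-class: officially supported by YARN; approximated here by namespace.
  private static final String FIRST_CLASS_PREFIX = "yarn.io/";

  // Returns the most specific category; mandatory resources are checked first,
  // so e.g. memory is reported as MANDATORY even though it is also supported.
  static Kind classify(String resourceName) {
    if (MANDATORY.contains(resourceName)) {
      return Kind.MANDATORY;
    }
    if (resourceName.startsWith(FIRST_CLASS_PREFIX)) {
      return Kind.FIRST_CLASS;
    }
    return Kind.OTHER;
  }
}
```

Under these assumptions, "yarn.io/gpu" classifies as FIRST_CLASS rather than
MANDATORY, which is the distinction being drawn for GPU.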

Regarding your questions:

bq. 1. First-class resource should be parsed from resource-types.xml and 
node-resources.xml(or auto discover) instead of yarn-site.xml?
To me, all resources beyond memory/vcores (which are handled separately for
historical reasons) should be defined in resource-types.xml and
node-resources.xml, regardless of whether they are mandatory or first-class.
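
A minimal sketch of how the two files could be read, assuming they have
already been loaded into plain key/value maps (a simplification for
illustration). The property names yarn.resource-types and
yarn.nodemanager.resource-type.<name> follow the Hadoop 3.x resource model
documentation; the class name ResourceTypeConfig and the default of 0 for an
unconfigured resource are assumptions, and auto-discovery is not covered.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; config files are modeled as key/value maps.
final class ResourceTypeConfig {
  static final String RESOURCE_TYPES_KEY = "yarn.resource-types";
  static final String NODE_RESOURCE_PREFIX = "yarn.nodemanager.resource-type.";

  // resource-types.xml: comma-separated names of resources beyond memory/vcores.
  static List<String> resourceTypes(Map<String, String> resourceTypesXml) {
    List<String> names = new ArrayList<>();
    String raw = resourceTypesXml.getOrDefault(RESOURCE_TYPES_KEY, "");
    for (String name : raw.split(",")) {
      String trimmed = name.trim();
      if (!trimmed.isEmpty()) {
        names.add(trimmed);
      }
    }
    return names;
  }

  // node-resources.xml: how much of each declared type this node offers.
  // Assumption: a type with no entry for this node counts as 0.
  static Map<String, Long> nodeCapacity(List<String> types, Map<String, String> nodeResourcesXml) {
    Map<String, Long> capacity = new LinkedHashMap<>();
    for (String type : types) {
      String value = nodeResourcesXml.get(NODE_RESOURCE_PREFIX + type);
      capacity.put(type, value == null ? 0L : Long.parseLong(value.trim()));
    }
    return capacity;
  }
}
```

The same path is used whether a resource is mandatory or first-class; only
memory/vcores stay outside it.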

bq. 2. First-class resource handler should register itself with the same 
resource name defined in xml files?
To me this is true when resource isolation on the NM side is required. All
first-class resources should start with the "yarn.io/" namespace.
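
A hypothetical sketch of that rule on the NM side: an isolation handler
registers under exactly the resource name declared in resource-types.xml, and
that name must be in the "yarn.io/" namespace. ResourceIsolationHandler and
IsolationHandlerRegistry are illustrative names, not YARN classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical per-resource isolation hook (not a YARN interface).
interface ResourceIsolationHandler {
  void isolate(String containerId);
}

// Hypothetical registry enforcing the naming rules described above.
final class IsolationHandlerRegistry {
  private static final String FIRST_CLASS_PREFIX = "yarn.io/";

  // Resource names declared in resource-types.xml (passed in for self-containment).
  private final Set<String> declaredResourceTypes;
  private final Map<String, ResourceIsolationHandler> handlers = new HashMap<>();

  IsolationHandlerRegistry(Set<String> declaredResourceTypes) {
    this.declaredResourceTypes = Set.copyOf(declaredResourceTypes);
  }

  void register(String resourceName, ResourceIsolationHandler handler) {
    // First-class resources must live in the "yarn.io/" namespace.
    if (!resourceName.startsWith(FIRST_CLASS_PREFIX)) {
      throw new IllegalArgumentException(
          "first-class resource must start with " + FIRST_CLASS_PREFIX + ": " + resourceName);
    }
    // The handler must use the exact name declared in resource-types.xml.
    if (!declaredResourceTypes.contains(resourceName)) {
      throw new IllegalArgumentException("resource not declared: " + resourceName);
    }
    // At most one handler per resource type.
    if (handlers.putIfAbsent(resourceName, handler) != null) {
      throw new IllegalStateException("handler already registered for " + resourceName);
    }
  }

  boolean hasHandler(String resourceName) {
    return handlers.containsKey(resourceName);
  }
}
```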

bq. 3. First-class resource should be shown in a separate user-defined column 
in web pages?
I'm not sure about this. In the future we may add more and more first-class /
mandatory resources, and adding a column for every new resource could become
too much. To me the ideal solution is to let users select and filter columns
in the web UI (supporting this in the new UI should be a trivial task).
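
A hypothetical sketch of the select-and-filter idea: the UI keeps the full
list of resource types but renders only the columns the user picked, falling
back to memory and vcores when nothing is selected. The class name, the
default columns and the resource names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical column selector for a resource table (not YARN UI code).
final class ResourceColumnFilter {
  // Assumed default columns when the user has made no selection.
  private static final List<String> DEFAULT_COLUMNS = List.of("memory-mb", "vcores");

  static List<String> visibleColumns(List<String> allResourceTypes, Set<String> selected) {
    List<String> visible = new ArrayList<>();
    if (selected.isEmpty()) {
      // No selection: show the defaults that actually exist on this cluster.
      for (String column : DEFAULT_COLUMNS) {
        if (allResourceTypes.contains(column)) {
          visible.add(column);
        }
      }
      return visible;
    }
    // Otherwise keep the cluster's ordering and drop unselected resources.
    for (String type : allResourceTypes) {
      if (selected.contains(type)) {
        visible.add(type);
      }
    }
    return visible;
  }
}
```

Keeping the cluster's ordering rather than the selection order means columns
do not jump around as the user toggles them, and new resource types need no
UI change beyond appearing in the selectable list.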

> Add support in NodeManager to isolate GPU devices by using CGroups
> ------------------------------------------------------------------
>
>                 Key: YARN-6620
>                 URL: https://issues.apache.org/jira/browse/YARN-6620
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>             Fix For: 3.1.0
>
>         Attachments: YARN-6620.001.patch, YARN-6620.002.patch, 
> YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, 
> YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch, 
> YARN-6620.009.patch, YARN-6620.010.patch, YARN-6620.011.patch, 
> YARN-6620.012.patch, YARN-6620.013.patch, YARN-6620.014.patch, 
> YARN-6620.015.patch, YARN-6620.016.patch, YARN-6620.017.patch
>
>
> This JIRA plans to add support for:
> 1) GPU configuration for NodeManagers
> 2) Isolation in CGroups (Java side).
> 3) NM restart and recovery of allocated GPU devices



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
