[
https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208590#comment-16208590
]
Wangda Tan commented on YARN-6620:
----------------------------------
[~tangzhankun],
I may not have made this clear: what I meant is that GPU should be a
first-class resource rather than a mandatory resource. To me, the only
mandatory resources for now are memory and vcores; in the future we might add
network/disk as mandatory resources.
The definition of a mandatory resource: a resource that is required in order
to run any process.
The definition of a first-class resource: a resource that is officially
supported by YARN.
As for your questions:
bq. 1. First-class resource should be parsed from resource-types.xml and
node-resources.xml (or auto-discovered) instead of yarn-site.xml?
To me, all resources beyond memory/vcores (which are exceptions for historical
reasons) should be defined in resource-types.xml and node-resources.xml,
regardless of whether they are mandatory or first-class.
bq. 2. First-class resource handler should register itself with the same
resource name defined in xml files?
To me, this is true when resource isolation on the NM side is required. All
first-class resources should start with the "yarn.io/" namespace.
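To illustrate the naming rule, here is a minimal sketch of a prefix check. It
is not the actual YARN code: the class name, the method name, and the sample
resource names are all hypothetical and only serve as an example.

```java
// Hypothetical sketch, not part of YARN: checks whether a resource name
// falls under the "yarn.io/" namespace mentioned in the comment above.
class ResourceNamespaceSketch {
    // Namespace prefix quoted from the comment.
    static final String YARN_IO_PREFIX = "yarn.io/";

    // Returns true only if the name starts with "yarn.io/" and has a
    // non-empty part after the prefix.
    static boolean inYarnIoNamespace(String resourceName) {
        return resourceName != null
            && resourceName.startsWith(YARN_IO_PREFIX)
            && resourceName.length() > YARN_IO_PREFIX.length();
    }

    public static void main(String[] args) {
        // Sample names are assumptions chosen for illustration.
        String[] names = {"yarn.io/gpu", "example.com/fpga", "yarn.io/"};
        for (String name : names) {
            System.out.println(name + " -> " + inYarnIoNamespace(name));
        }
    }
}
```

A handler on the NM side could use a check like this before registering
itself under the name defined in the xml files, but that wiring is left out
here.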
bq. 3. First-class resource should be shown in a separate user-defined column
in web pages?
I'm not sure about this. In the future we may add more and more first-class /
mandatory resources, and it might be too much to add a column for every new
resource. To me, the ideal solution is to let users select and filter columns
in the web UI (supporting this in the new UI should be a trivial task).
> Add support in NodeManager to isolate GPU devices by using CGroups
> ------------------------------------------------------------------
>
> Key: YARN-6620
> URL: https://issues.apache.org/jira/browse/YARN-6620
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Fix For: 3.1.0
>
> Attachments: YARN-6620.001.patch, YARN-6620.002.patch,
> YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch,
> YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch,
> YARN-6620.009.patch, YARN-6620.010.patch, YARN-6620.011.patch,
> YARN-6620.012.patch, YARN-6620.013.patch, YARN-6620.014.patch,
> YARN-6620.015.patch, YARN-6620.016.patch, YARN-6620.017.patch
>
>
> This JIRA plans to add support for:
> 1) GPU configuration for NodeManagers
> 2) Isolation in CGroups. (Java side).
> 3) NM restart and recovery of allocated GPU devices
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)