[
https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163906#comment-16163906
]
Wangda Tan commented on YARN-6620:
----------------------------------
[~devaraj.k],
Thanks for reviewing the patch.
Just a few questions/comments:
bq. 1. XML file reading in GpuDeviceInformationParser.java, can we use the
existing libraries like javax.xml.bind.JAXBContext to unmarshall the XML
document to a Java Object instead of reading tag by tag?
My understanding is that JAXBContext is mostly used when we need to convert
between objects and XML/JSON. The output of nvidia-smi is a custom XML format
that doesn't follow JAXB conventions. Is it still best practice to use
JAXBContext in such a use case? For example, FairScheduler parses its XML file
directly: {{AllocationFileLoaderService#reloadAllocations}}.
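To make the "read tag by tag" approach concrete, here is a minimal sketch using the JDK's built-in DOM parser. It is not the code from the patch: the class {{GpuXmlSketch}}, the {{GpuInfo}} holder and the inline sample XML are hypothetical, and the sample is a heavily trimmed stand-in for real nvidia-smi output (it assumes {{gpu}}, {{product_name}} and {{minor_number}} elements and omits the DOCTYPE line that real output carries).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not the parser from the YARN-6620 patch.
class GpuXmlSketch {

  /** Minimal holder for the two fields this sketch reads. */
  static final class GpuInfo {
    final String productName;
    final int minorNumber;

    GpuInfo(String productName, int minorNumber) {
      this.productName = productName;
      this.minorNumber = minorNumber;
    }
  }

  /** Walks every <gpu> element and picks out the child tags by name. */
  static List<GpuInfo> parse(String xml) throws Exception {
    DocumentBuilder builder =
        DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(
        new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

    NodeList gpus = doc.getElementsByTagName("gpu");
    List<GpuInfo> result = new ArrayList<>();
    for (int i = 0; i < gpus.getLength(); i++) {
      Element gpu = (Element) gpus.item(i);
      String name = text(gpu, "product_name");
      int minor = Integer.parseInt(text(gpu, "minor_number"));
      result.add(new GpuInfo(name, minor));
    }
    return result;
  }

  /** Returns the trimmed text of the first descendant with the given tag. */
  private static String text(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    if (nodes.getLength() == 0) {
      throw new IllegalArgumentException("Missing <" + tag + "> element");
    }
    return nodes.item(0).getTextContent().trim();
  }

  public static void main(String[] args) throws Exception {
    // Simplified sample; real nvidia-smi XML has many more elements.
    String xml =
        "<nvidia_smi_log>"
        + "<gpu id=\"0000:04:00.0\">"
        + "<product_name>Tesla K80</product_name>"
        + "<minor_number>0</minor_number>"
        + "</gpu>"
        + "</nvidia_smi_log>";
    for (GpuInfo g : parse(xml)) {
      System.out.println(g.minorNumber + " " + g.productName);
    }
  }
}
```

With this style, only the tags the NodeManager cares about need to be named in code; a JAXB mapping would instead need annotated classes that mirror the document structure.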
bq. 3. Instead of hardcoding the BINARY_NAME, can it be included as part of
DEFAULT_NM_GPU_PATH_TO_EXEC as a default value, so that it also becomes
configurable in case users want to change it?
I considered this option before. Unless there is a strong need to run a
different command or to call Nvidia native APIs directly, I would prefer to
hard-code nvidia-smi instead of introducing another abstraction layer. I'm
open to refactoring to support this case once we have such requirements.
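As a rough sketch of that trade-off (the directory is configurable, the binary name is fixed), something like the following would do. The class {{GpuBinaryPathSketch}} and the {{resolve}} helper are hypothetical names for illustration and are not taken from the patch; falling back to a bare binary name when no directory is configured is also an assumption.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; not the code from the YARN-6620 patch.
class GpuBinaryPathSketch {

  // The binary name stays fixed; only the directory is configurable.
  static final String BINARY_NAME = "nvidia-smi";

  /**
   * Builds the executable path from a configured directory.
   * Assumption: an empty or missing directory means "use the bare
   * binary name" and let the OS locate it.
   */
  static Path resolve(String configuredDir) {
    if (configuredDir == null || configuredDir.trim().isEmpty()) {
      return Paths.get(BINARY_NAME);
    }
    return Paths.get(configuredDir.trim()).resolve(BINARY_NAME);
  }

  public static void main(String[] args) {
    System.out.println(resolve("/usr/bin"));
    System.out.println(resolve(""));
  }
}
```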
bq. 5. Can we use spaces instead of tab characters for indentation in
nvidia-smi-sample-output.xml?
This file is copied directly from nvidia-smi output. The main reason is to make
sure we can properly parse real command-line output, so I prefer to keep it
as-is.
bq. 6. Are we going to support multiple containers/processes(limited number)
sharing the same GPU device?
No, since no proper isolation can be done for this, I don't plan to support
this as of now.
Since YARN-3926 was just merged to trunk, I will add code to support
reading/specifying the GPU value in the resource object. I will address all
comments in the next update.
> [YARN-6223] NM Java side code changes to support isolate GPU devices by using
> CGroups
> -------------------------------------------------------------------------------------
>
> Key: YARN-6620
> URL: https://issues.apache.org/jira/browse/YARN-6620
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-6620.001.patch, YARN-6620.002.patch,
> YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch
>
>
> This JIRA plans to add support for:
> 1) GPU configuration for NodeManagers
> 2) Isolation in CGroups (Java side).
> 3) NM restart and recovery of allocated GPU devices