[ 
https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163906#comment-16163906
 ] 

Wangda Tan commented on YARN-6620:
----------------------------------

[~devaraj.k],

Thanks for reviewing the patch. 

I have only a few questions/comments:
bq. 1. XML file reading in GpuDeviceInformationParser.java, can we use the 
existing libraries like javax.xml.bind.JAXBContext to unmarshall the XML 
document to a Java Object instead of reading tag by tag?
My understanding is that JAXBContext is mostly used when we need to convert 
between Java objects and XML/JSON. The output of nvidia-smi is a custom XML 
format that doesn't follow the JAXB conventions. Is it still best practice to 
use JAXBContext in such a use case? For example, FairScheduler parses its XML 
file directly: {{AllocationFileLoaderService#reloadAllocations}}. 
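For comparison with the JAXB suggestion, here is a minimal sketch of the direct, tag-by-tag approach using the JDK's built-in DOM parser. The class name, the trimmed sample XML, and the tag names (`nvidia_smi_log`, `gpu`, `minor_number`, `product_name`) are assumptions for illustration, modeled on `nvidia-smi -q -x` output; this is not the GpuDeviceInformationParser code from the patch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration class; not the parser from the patch.
class NvidiaSmiXmlSketch {
  // Hand-written, trimmed sample; tag names are assumed to mirror `nvidia-smi -q -x`.
  static final String SAMPLE =
      "<nvidia_smi_log>\n"
    + "\t<gpu id=\"00000000:04:00.0\">\n"
    + "\t\t<product_name>Tesla P100-PCIE-12GB</product_name>\n"
    + "\t\t<minor_number>0</minor_number>\n"
    + "\t</gpu>\n"
    + "\t<gpu id=\"00000000:82:00.0\">\n"
    + "\t\t<product_name>Tesla P100-PCIE-12GB</product_name>\n"
    + "\t\t<minor_number>1</minor_number>\n"
    + "\t</gpu>\n"
    + "</nvidia_smi_log>\n";

  /** Returns the minor number of every <gpu> element, in document order. */
  static List<Integer> parseMinorNumbers(String xml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // Real nvidia-smi output typically declares a DOCTYPE with an external DTD;
    // do not try to load it, so parsing works without that file on disk.
    factory.setFeature(
        "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(
        new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

    List<Integer> minors = new ArrayList<>();
    NodeList gpus = doc.getElementsByTagName("gpu");
    for (int i = 0; i < gpus.getLength(); i++) {
      Element gpu = (Element) gpus.item(i);
      NodeList minor = gpu.getElementsByTagName("minor_number");
      if (minor.getLength() > 0) {
        // trim() makes tab vs. space indentation irrelevant to the parsed value.
        minors.add(Integer.parseInt(minor.item(0).getTextContent().trim()));
      }
    }
    return minors;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(parseMinorNumbers(SAMPLE)); // prints [0, 1]
  }
}
```

Reading only the handful of elements YARN needs keeps the parser tolerant of extra or reordered fields across nvidia-smi versions, which is one argument for the direct approach over binding the whole document.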

bq. 3. Instead of hardcoding the BINARY_NAME, can it be included as part of 
DEFAULT_NM_GPU_PATH_TO_EXEC as a default value, so that it also becomes 
configurable in case users want to change it?
I considered this option before. Unless there's a strong need to run a 
different command or to call NVIDIA native APIs directly, I would prefer to 
hard-code nvidia-smi rather than introduce another abstraction layer. I'm open 
to refactoring to support this case once we have such a requirement.
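A minimal sketch of the trade-off being discussed, assuming the configurable value is a directory and the binary name stays a hard-coded constant. The class and method names are hypothetical, and the directory semantics are an assumption rather than what DEFAULT_NM_GPU_PATH_TO_EXEC does in the patch.

```java
import java.nio.file.Paths;

// Hypothetical sketch: the directory is configurable, the binary name is not.
class GpuExecutableResolverSketch {
  // Hard-coded constant, never read from configuration.
  static final String BINARY_NAME = "nvidia-smi";

  /**
   * Returns the command to execute. With no directory configured, returns the
   * bare binary name so the OS resolves it through PATH.
   */
  static String resolve(String configuredDir) {
    if (configuredDir == null || configuredDir.trim().isEmpty()) {
      return BINARY_NAME;
    }
    return Paths.get(configuredDir.trim(), BINARY_NAME).toString();
  }

  public static void main(String[] args) {
    System.out.println(resolve("/usr/bin")); // /usr/bin/nvidia-smi on Linux
    System.out.println(resolve(null));       // nvidia-smi
  }
}
```

Users can still point YARN at a non-standard install location, while the set of commands the NM may run stays fixed to nvidia-smi.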

bq. 5. Can we use spaces instead of tab characters for indentation in 
nvidia-smi-sample-output.xml?
This file is copied directly from nvidia-smi output. The main reason is to make 
sure we can properly parse real command-line output, so I prefer to keep it as-is.

bq. 6. Are we going to support multiple containers/processes(limited number) 
sharing the same GPU device?
No. Since proper isolation cannot be done for this case, I don't plan to 
support it for now.
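Not supporting sharing amounts to an exclusive-ownership invariant: each GPU device is assigned to at most one container at a time. A minimal sketch of that bookkeeping, assuming devices are identified by their minor numbers; the class and method names are hypothetical, not from the patch, and CGroups enforcement and NM recovery are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: each GPU minor number is owned by at most one container.
class ExclusiveGpuAllocatorSketch {
  // minor number -> owning container id (null = free); TreeMap keeps devices ordered.
  private final Map<Integer, String> owner = new TreeMap<>();

  ExclusiveGpuAllocatorSketch(List<Integer> minorNumbers) {
    for (int minor : minorNumbers) {
      owner.put(minor, null);
    }
  }

  /** Assigns `count` free devices to the container, or throws without changing state. */
  synchronized List<Integer> assign(String containerId, int count) {
    List<Integer> free = new ArrayList<>();
    for (Map.Entry<Integer, String> e : owner.entrySet()) {
      if (e.getValue() == null) {
        free.add(e.getKey());
      }
    }
    if (free.size() < count) {
      throw new IllegalStateException(
          "Requested " + count + " GPUs, only " + free.size() + " free");
    }
    List<Integer> granted = new ArrayList<>(free.subList(0, count));
    for (int minor : granted) {
      owner.put(minor, containerId);
    }
    return Collections.unmodifiableList(granted);
  }

  /** Frees every device held by the container. */
  synchronized void release(String containerId) {
    for (Map.Entry<Integer, String> e : owner.entrySet()) {
      if (containerId.equals(e.getValue())) {
        e.setValue(null);
      }
    }
  }
}
```

For example, with devices 0 and 1, two single-GPU containers each get one device and a third request fails until one of them is released.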

Since YARN-3926 was just merged to trunk, I will add code to support reading 
and specifying the GPU value in the resource object. I will address all 
comments in the next update. 

> [YARN-6223] NM Java side code changes to support isolate GPU devices by using 
> CGroups
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-6620
>                 URL: https://issues.apache.org/jira/browse/YARN-6620
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-6620.001.patch, YARN-6620.002.patch, 
> YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch
>
>
> This JIRA plans to add support for:
> 1) GPU configuration for NodeManagers
> 2) Isolation in CGroups (Java side).
> 3) NM restart and recovery of allocated GPU devices



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
