[ 
https://issues.apache.org/jira/browse/YARN-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721194#comment-16721194
 ] 

Szilard Nemeth commented on YARN-9120:
--------------------------------------

[~tangzhankun]: I agree that the ability to replace configs at runtime would 
introduce more and more complexity into the NM, as we would need to gracefully 
handle the case where GPUs are disabled while a container already has some 
mapping for those GPUs.
I'm becoming more convinced that this jira should not happen in its described 
form, so I'm fine with not doing it now, given that we already have YARN-8851 
as an ongoing change.

1. The property value would have been "off"
2. Fair enough
3. I think it's perfectly valid for users to want heterogeneous clusters, 
where nodes without GPUs do not carry the GPU-related configuration in 
yarn-site.xml and in container-executor.cfg.
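
To make point 1 concrete, here is a minimal sketch of how an "off" value could 
have been interpreted next to "auto" and a user-defined device list. This is 
only an illustration under assumptions: the class, enum and method names are 
hypothetical and not part of GpuDiscoverer, treating an empty value as "auto" 
is an assumption, and the user-defined list is simply split on commas.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration; not the actual GpuDiscoverer code. */
final class AllowedGpuDevicesSketch {

    /** Hypothetical interpretation of the allowed-gpu-devices property value. */
    enum Mode { AUTO, OFF, MANUAL }

    /** Maps the raw property value to a mode; null/blank is assumed to mean AUTO. */
    static Mode modeOf(String value) {
        String v = value == null ? "" : value.trim();
        if (v.isEmpty() || v.equalsIgnoreCase("auto")) {
            return Mode.AUTO;
        }
        if (v.equalsIgnoreCase("off")) {
            return Mode.OFF;
        }
        return Mode.MANUAL;
    }

    /** Splits a user-defined, comma-separated device list into trimmed entries. */
    static List<String> manualEntries(String value) {
        List<String> entries = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }

    /**
     * Returns the devices usable by YARN: the discovered ones for AUTO,
     * none for OFF, and the user-defined entries for MANUAL.
     */
    static List<String> usableDevices(String value, List<String> discovered) {
        switch (modeOf(value)) {
            case OFF:
                return Collections.emptyList();
            case AUTO:
                return new ArrayList<>(discovered);
            default:
                return manualEntries(value);
        }
    }
}
```

With such a value, a node would report no usable GPUs while the plugin itself 
stays configured; it does not address the runtime-switching concern above.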


Two questions:
1. Could you please confirm that removing only the resource plugin from 
yarn-site.xml, while keeping the GPU-related config in container-executor.cfg, 
would not cause a regression on nodes that don't have any GPU?
2. Do you think it's still worth creating a jira under YARN-8851 that handles 
switching the GPU plugin on/off dynamically (at runtime)? If yes, I'm fine 
with either creating it or handing this task to you.

Thanks! 

> Need to have a way to turn off GPU auto-discovery in GpuDiscoverer
> ------------------------------------------------------------------
>
>                 Key: YARN-9120
>                 URL: https://issues.apache.org/jira/browse/YARN-9120
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>
> GpuDiscoverer.getGpusUsableByYarn either parses the user-defined GPU devices 
> or should have the value 'auto' (from property: 
> yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices)
> In some circumstances, users would want to exclude a node from scheduling, so 
> they should have an option to turn off auto-discovery.
> It's straightforward that this is possible by removing the GPU 
> resource-plugin from YARN's config along with GPU-related config in 
> container-executor.cfg, but doing that with a dedicated value for 
> yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices is a more 
> lightweight approach.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
