[ https://issues.apache.org/jira/browse/YARN-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721194#comment-16721194 ]
Szilard Nemeth commented on YARN-9120:
--------------------------------------
[~tangzhankun]: I agree that the ability to replace configs at runtime would
introduce more and more complexity into the NM, as we would need to gracefully
handle the case where GPUs are disabled while a container already has some
mapping for those GPUs.
I'm becoming more convinced that this jira should not happen in its described
form, so I'm fine with not doing it now, given that we already have YARN-8851
as an ongoing change.
1. The property value would have been "off"
2. Fair enough
3. I think it's perfectly valid for users to want heterogeneous clusters in
which the nodes that don't have GPUs also don't have the GPU-related
configuration in yarn-site.xml and in container-executor.cfg.
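
To make point 1 concrete, here is a minimal sketch of how a dedicated "off"
value could sit next to "auto" when interpreting
yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices. This is not the
GpuDiscoverer code: the class AllowedGpuDevices, the Mode enum and both methods
are hypothetical names, "off" is only the value proposed in this jira, and
treating an unset value as "auto" is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Hadoop.
class AllowedGpuDevices {
    // "auto" is the existing value mentioned in the issue description.
    static final String AUTO = "auto";
    // "off" is only the value proposed in this jira; it is an assumption here.
    static final String OFF = "off";

    enum Mode { AUTO_DISCOVER, DISABLED, MANUAL }

    // Classifies the raw property value into one of the three modes.
    static Mode modeOf(String value) {
        String v = (value == null) ? "" : value.trim();
        if (v.equalsIgnoreCase(OFF)) {
            return Mode.DISABLED;
        }
        // Assumption of this sketch: an unset or empty value behaves like "auto".
        if (v.isEmpty() || v.equalsIgnoreCase(AUTO)) {
            return Mode.AUTO_DISCOVER;
        }
        return Mode.MANUAL;
    }

    // Returns the user-defined device entries as raw strings, or an empty
    // list when the value does not describe a manual device list.
    static List<String> manualDevices(String value) {
        if (modeOf(value) != Mode.MANUAL) {
            return Collections.emptyList();
        }
        List<String> devices = new ArrayList<>();
        for (String part : value.split(",")) {
            String entry = part.trim();
            if (!entry.isEmpty()) {
                devices.add(entry);
            }
        }
        return devices;
    }

    public static void main(String[] args) {
        // "0:0,1:1" is just an example device list for the demo.
        for (String value : new String[] {"auto", "off", "0:0,1:1"}) {
            System.out.println(value + " -> " + modeOf(value)
                + " " + manualDevices(value));
        }
    }
}
```

With something like this, a node configured with "off" would skip discovery
entirely and report no usable GPUs, without touching the plugin list.
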
Two questions:
1. Could you please confirm that removing only the resource plugin from
yarn-site.xml, while keeping the GPU-related config in container-executor.cfg,
would not cause a regression on nodes that don't have any GPU?
2. Do you think it's still worth creating a jira under YARN-8851 that handles
switching the GPU plugin on/off dynamically (at runtime)? If yes, I'm fine
with either creating it or giving this task to you.
Thanks!
> Need to have a way to turn off GPU auto-discovery in GpuDiscoverer
> ------------------------------------------------------------------
>
> Key: YARN-9120
> URL: https://issues.apache.org/jira/browse/YARN-9120
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Szilard Nemeth
> Assignee: Szilard Nemeth
> Priority: Major
>
> GpuDiscoverer.getGpusUsableByYarn either parses the user-defined GPU devices
> or should have the value 'auto' (from property:
> yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices)
> In some circumstances, users would want to exclude a node from scheduling, so
> they should have an option to turn off auto-discovery.
> It's straightforward that this is possible by removing the GPU
> resource-plugin from YARN's config along with GPU-related config in
> container-executor.cfg, but doing that with a dedicated value for
> yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices is a more
> lightweight approach.