[
https://issues.apache.org/jira/browse/YARN-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721237#comment-16721237
]
Zhankun Tang commented on YARN-9120:
------------------------------------
{quote}1. Could you please confirm whether only removing the resource plugin
from yarn-site.xml and keeping GPU-related config in container-executor.cfg
would not break anything as a regression on nodes that don't have any GPU?
{quote}
Zhankun => Yeah. The plugin instance won't be added to the ResourceHandlerChain,
so no GPU-related code should be executed. I'll double-check next week.
{quote}2. Do you think it's still worth creating a jira under YARN-8851 that
handles switching the GPU plugin on/off dynamically (at runtime)? If yes, I'm
fine with either creating it myself or handing this task to you.
{quote}
Zhankun => YARN-8851 is not mature yet. Once it's mature, we can port the
existing GPU plugin to that new framework. In that case, the new GPU plugin
could be more flexible. For instance, it could report 0 devices based on its
own policy/configuration, and YARN wouldn't need to manage vendor-specific
device plugin configurations.
As for supporting the disabling of GPU devices on a node at runtime, I'm not
quite sure what scenario it fits. Any ideas? [~snemeth], [~leftnoteasy]
> Need to have a way to turn off GPU auto-discovery in GpuDiscoverer
> ------------------------------------------------------------------
>
> Key: YARN-9120
> URL: https://issues.apache.org/jira/browse/YARN-9120
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Szilard Nemeth
> Assignee: Szilard Nemeth
> Priority: Major
>
> GpuDiscoverer.getGpusUsableByYarn either parses the user-defined GPU devices
> or, when the property
> yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices has the value
> 'auto', relies on auto-discovery.
> In some circumstances, users would want to exclude a node from scheduling, so
> they should have an option to turn off auto-discovery.
> This is already possible by removing the GPU resource-plugin from YARN's
> config along with the GPU-related config in container-executor.cfg, but
> doing that with a dedicated value for
> yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices is a more
> lightweight approach.
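The "dedicated value" idea in the description above could look roughly like
the sketch below. It is an illustration only, not the actual GpuDiscoverer
code: the class name, the Mode enum and the value 'none' are hypothetical (the
issue does not name the dedicated value), and the explicit device list is
assumed to be a comma-separated list of index:minor_number pairs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; names here are not taken from the YARN code base. */
final class AllowedGpuDevicesSketch {
    static final String AUTO = "auto";
    // Assumption: a made-up dedicated value that switches GPU discovery off.
    static final String NONE = "none";

    enum Mode { AUTO_DISCOVER, EXPLICIT, DISABLED }

    /** Parsed form of the allowed-gpu-devices property value. */
    static final class Parsed {
        final Mode mode;
        final List<int[]> devices; // each entry is {index, minorNumber}

        Parsed(Mode mode, List<int[]> devices) {
            this.mode = mode;
            this.devices = Collections.unmodifiableList(devices);
        }
    }

    /** Interprets the property value; throws IllegalArgumentException if malformed. */
    static Parsed parse(String value) {
        String v = value == null ? "" : value.trim();
        if (v.equalsIgnoreCase(AUTO)) {
            return new Parsed(Mode.AUTO_DISCOVER, new ArrayList<int[]>());
        }
        if (v.equalsIgnoreCase(NONE)) {
            // No devices are reported, so the node offers no GPUs.
            return new Parsed(Mode.DISABLED, new ArrayList<int[]>());
        }
        List<int[]> devices = new ArrayList<int[]>();
        for (String entry : v.split(",")) {
            String[] parts = entry.trim().split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad device entry: '" + entry + "'");
            }
            // NumberFormatException is a subclass of IllegalArgumentException.
            devices.add(new int[] {
                Integer.parseInt(parts[0].trim()),
                Integer.parseInt(parts[1].trim())
            });
        }
        return new Parsed(Mode.EXPLICIT, devices);
    }

    public static void main(String[] args) {
        for (String value : new String[] {"auto", "none", "0:0,1:1"}) {
            Parsed p = parse(value);
            System.out.println(value + " -> " + p.mode + ", devices=" + p.devices.size());
        }
    }
}
```

With something like this, a node could be excluded by changing a single
property value while the resource plugin and the container-executor.cfg
settings stay in place.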