[ https://issues.apache.org/jira/browse/YARN-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722443#comment-16722443 ]
Zhankun Tang commented on YARN-9120:
------------------------------------
[~snemeth], I double-checked: if we remove "yarn.io/gpu" from the
"nm.resource-plugins" property while leaving the other GPU-related
configuration in place, the server's GPU resources are not discovered or used,
which means GPU is disabled. I also verified that an application requesting
GPU fails, while an application that does not request GPU resources still runs.
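To make the behaviour described above concrete, here is a minimal Java model. The class and method names are hypothetical (this is not the NodeManager's code), and it assumes "nm.resource-plugins" (yarn.nodemanager.resource-plugins in full) is a comma-separated list of plugin names. GPU handling is gated solely by whether "yarn.io/gpu" is listed, so the remaining GPU properties have no effect once it is removed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model (not YARN's actual code) of how the NodeManager's
 * resource-plugin list gates GPU handling. Assumes the property value is a
 * comma-separated list of plugin names.
 */
public class GpuPluginGate {
    static final String RESOURCE_PLUGINS = "yarn.nodemanager.resource-plugins";
    static final String GPU_PLUGIN = "yarn.io/gpu";

    /** Returns true only if "yarn.io/gpu" appears in the plugin list. */
    public static boolean isGpuPluginEnabled(Map<String, String> conf) {
        String plugins = conf.getOrDefault(RESOURCE_PLUGINS, "");
        return Arrays.stream(plugins.split(","))
                .map(String::trim)
                .anyMatch(GPU_PLUGIN::equals);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Other GPU-related settings stay in place...
        conf.put("yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices", "auto");
        // ...but the GPU plugin is not listed, so discovery never runs.
        conf.put(RESOURCE_PLUGINS, "");
        System.out.println(isGpuPluginEnabled(conf)); // false

        conf.put(RESOURCE_PLUGINS, "yarn.io/gpu");
        System.out.println(isGpuPluginEnabled(conf)); // true
    }
}
```

The point the model captures is that the plugin list acts as the single switch, which is why the other GPU properties can stay in yarn-site.xml untouched.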
[~pbacsko], perhaps adding a new "off" value to
"yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices" has no obvious
benefit compared to removing "yarn.io/gpu" from
"yarn.nodemanager.resource-plugins"? As I see it, both approaches require the
admin to configure a different yarn-site.xml on the affected servers.
I guess your point is about how YARN can manage configuration on a
heterogeneous cluster?
I'm not sure whether Ambari or any other tool can apply a different
configuration to each node. This doesn't seem to be YARN's responsibility.
[~rohithsharma], any ideas?
> Need to have a way to turn off GPU auto-discovery in GpuDiscoverer
> ------------------------------------------------------------------
>
> Key: YARN-9120
> URL: https://issues.apache.org/jira/browse/YARN-9120
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Szilard Nemeth
> Assignee: Szilard Nemeth
> Priority: Major
>
> GpuDiscoverer.getGpusUsableByYarn either parses the user-defined GPU devices
> or expects the value 'auto' (from the property
> yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices).
> In some circumstances, users want to exclude a node from scheduling, so
> they should have an option to turn off auto-discovery.
> This is already possible by removing the GPU resource-plugin from YARN's
> config along with the GPU-related config in container-executor.cfg, but
> doing it with a dedicated value for
> yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices is a more
> lightweight approach.
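The proposal can be illustrated with a small Java sketch. Everything in it is hypothetical: the class, the `getUsableGpus` method (loosely modelled on what the description says GpuDiscoverer.getGpusUsableByYarn does), and the "off" value itself, which is only being proposed in this issue. The explicit-list syntax of comma-separated `index:minor` pairs (e.g. `0:0,1:1`) is an assumption about the format, and the `Supplier` stands in for the node's real auto-discovery.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not the real GpuDiscoverer: how a value of
 * yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices could be
 * interpreted if a dedicated "off" value were added.
 */
public class AllowedGpuDevicesParser {

    /** A GPU identified by index and device minor number (assumed "index:minor" syntax). */
    public static final class GpuDevice {
        public final int index;
        public final int minorNumber;

        public GpuDevice(int index, int minorNumber) {
            this.index = index;
            this.minorNumber = minorNumber;
        }

        @Override
        public String toString() {
            return index + ":" + minorNumber;
        }
    }

    /**
     * "auto" delegates to discovery, "off" (the proposed value) yields no GPUs,
     * anything else is parsed as a comma-separated list of index:minor pairs.
     */
    public static List<GpuDevice> getUsableGpus(String value, Supplier<List<GpuDevice>> discover) {
        String v = value == null ? "" : value.trim();
        if (v.equalsIgnoreCase("auto")) {
            return discover.get();
        }
        if (v.equalsIgnoreCase("off")) {
            // Hypothetical: the node reports zero usable GPUs.
            return Collections.emptyList();
        }
        List<GpuDevice> devices = new ArrayList<>();
        for (String entry : v.split(",")) {
            String[] parts = entry.trim().split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected index:minor, got '" + entry + "'");
            }
            devices.add(new GpuDevice(Integer.parseInt(parts[0].trim()),
                                      Integer.parseInt(parts[1].trim())));
        }
        return devices;
    }

    public static void main(String[] args) {
        // Stand-in for real auto-discovery on the node.
        Supplier<List<GpuDevice>> fakeDiscovery = () -> List.of(new GpuDevice(0, 0));
        System.out.println(getUsableGpus("auto", fakeDiscovery));    // [0:0]
        System.out.println(getUsableGpus("off", fakeDiscovery));     // []
        System.out.println(getUsableGpus("0:0,1:1", fakeDiscovery)); // [0:0, 1:1]
    }
}
```

With this shape, turning a node's GPUs off is a one-property change in yarn-site.xml, whereas removing "yarn.io/gpu" from the plugin list also involves the container-executor.cfg changes mentioned above.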
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)