[
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025285#comment-17025285
]
Thomas Graves commented on YARN-8200:
-------------------------------------
After experimenting with this a bit more, I removed the maximum allocation
configuration after seeing that the 2.10 release documentation does not
mention it. Specifically, I removed this setting:
<property>
<name>yarn.resource-types.yarn.io/gpu.maximum-allocation</name>
<value>4</value>
</property>
It now appears that YARN doesn't allocate me a container unless it can
fulfill all of the GPUs I requested. In this case my NodeManager has 4
GPUs, so if I request 5 it just hangs waiting to fulfill the request. This
behavior is much better than giving me a container with fewer GPUs than I
requested.
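
As a rough illustration of the all-or-nothing behavior described above, here is
a toy model in Python. It is not YARN scheduler code; the function name
try_allocate and the list of per-node free GPU counts are hypothetical and
exist only to show that a request larger than any single node's GPU count is
never placed.

```python
from typing import List, Optional


def try_allocate(requested_gpus: int, free_gpus_per_node: List[int]) -> Optional[int]:
    """Toy model: return the index of the first node that can satisfy the
    whole GPU request, or None if no node can (the request stays pending)."""
    for index, free in enumerate(free_gpus_per_node):
        if free >= requested_gpus:
            return index
    # No partial allocation: a smaller container is never handed back.
    return None


# One node with 4 GPUs, as in the comment above.
nodes = [4]
print(try_allocate(4, nodes))  # request fits on node 0
print(try_allocate(5, nodes))  # request cannot be placed and stays pending
```

In this sketch a pending request is represented by None; in the real cluster
the application simply keeps waiting for the container.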
> Backport resource types/GPU features to branch-3.0/branch-2
> -----------------------------------------------------------
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
> Issue Type: Task
> Reporter: Jonathan Hung
> Assignee: Jonathan Hung
> Priority: Major
> Labels: release-blocker
> Fix For: 2.10.0
>
> Attachments: YARN-8200-branch-2.001.patch,
> YARN-8200-branch-2.002.patch, YARN-8200-branch-2.003.patch,
> YARN-8200-branch-3.0.001.patch,
> counter.scheduler.operation.allocate.csv.defaultResources,
> counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json
>
>
> Currently we have a need for GPU scheduling on our YARN clusters to support
> deep learning workloads. However, our main production clusters are running
> older versions of branch-2 (2.7 in our case). To avoid supporting too many
> very different Hadoop versions across multiple clusters, we would like to
> backport the resource types/resource profiles feature to branch-2, as well as
> the GPU-specific support.
>
> We have done a trial backport of YARN-3926 and some miscellaneous patches in
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth.
> We also did a trial backport of most of YARN-6223 (sans Docker support).
>
> Regarding the backports, perhaps we can do the development in a feature
> branch and then merge to branch-2 when ready.