[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554949#comment-16554949 ]
Jonathan Hung commented on YARN-8200:
-------------------------------------
Uploaded scheduler allocation counters for default resources (memory/CPU) and
GPU resources. Also uploaded the synth_sls.json configuration used to generate
the synthetic trace (4k nodes, 20k jobs).
The SLS simulation using default resources took 2 hr 10 min; the one using GPU
resources took 2 hr 25 min. In the GPU SLS simulation, we hardcoded each mapper
and reducer to request 1 GPU.
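To put the two run times side by side, a quick back-of-the-envelope calculation of the relative slowdown follows. It uses only the two wall-clock durations reported above (one run each), so it describes end-to-end simulation time, not per-allocation scheduler latency; the attached allocation counters are the better source for that. The helper name `to_minutes` is purely illustrative.

```python
# Wall-clock SLS durations reported in the comment, converted to minutes.
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an 'X hr Y min' duration into total minutes."""
    return hours * 60 + minutes

default_run = to_minutes(2, 10)  # default resources (memory/CPU): 130 min
gpu_run = to_minutes(2, 25)      # 1 GPU per mapper and reducer: 145 min

# Absolute and relative increase of the GPU run over the default run.
extra_minutes = gpu_run - default_run
relative_overhead_pct = 100.0 * extra_minutes / default_run

print(f"GPU run took {extra_minutes} min longer ({relative_overhead_pct:.1f}% slower)")
```

This prints `GPU run took 15 min longer (11.5% slower)`. With a single run per configuration, the figure is indicative only and should not be read as a measured overhead bound.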
> Backport resource types/GPU features to branch-2
> ------------------------------------------------
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
> Issue Type: Task
> Reporter: Jonathan Hung
> Assignee: Jonathan Hung
> Priority: Major
> Attachments:
> counter.scheduler.operation.allocate.csv.defaultResources,
> counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json
>
>
> We currently need GPU scheduling on our YARN clusters to support deep
> learning workloads. However, our main production clusters are running older
> versions of branch-2 (2.7 in our case). To avoid supporting too many
> divergent Hadoop versions across multiple clusters, we would like to backport
> the resource types/resource profiles feature, as well as the GPU-specific
> support, to branch-2.
>
> We have done a trial backport of YARN-3926, plus some miscellaneous patches
> in YARN-7069 based on issues we uncovered, and the backport was fairly
> smooth. We also did a trial backport of most of YARN-6223 (without Docker
> support).
>
> Regarding the backports, perhaps we can do the development in a feature
> branch and then merge it into branch-2 when ready.