[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2

Thomas Graves (Jira) Tue, 28 Jan 2020 06:29:24 -0800


    [ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025156#comment-17025156 ]


Thomas Graves commented on YARN-8200:
-------------------------------------

Hey [~jhung],

I am trying out GPU scheduling in Hadoop 2.10, and the first thing I noticed 
is that it doesn't error properly if you ask for too many GPUs. It seems to 
happily say it gave them to me, although I think it's really giving me the max 
configured.  Is this a known issue already, or did the configuration change?

I have the GPU max configured at 4 and I try to allocate 8. On Hadoop 3 I get:

 

Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
 Invalid resource request, requested resource type=[yarn.io/gpu] < 0 or greater 
than maximum allowed allocation. Requested resource=<memory:1408, vCores:1, 
yarn.io/gpu: 8>, maximum allowed allocation=<memory:8192, vCores:4, 
yarn.io/gpu: 4>, please note that maximum allowed allocation is calculated by 
scheduler based on maximum resource of registered NodeManagers, which might be 
less than configured maximum allocation=<memory:8192, vCores:4, yarn.io/gpu: 10>

 

On Hadoop 2.10 I get a container allocated, but the logs and UI say it only 
has 4 GPUs.
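
To make the difference concrete, here is a minimal, self-contained sketch of 
the two behaviors described above. It is not the actual YARN code: the class 
GpuRequestCheck, its methods, and InvalidRequestException are hypothetical 
stand-ins, and the clamping path only models what I suspect 2.10 is doing, 
which I have not confirmed. The numbers are taken from the error message above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Hadoop.
public class GpuRequestCheck {

    // Stand-in for YARN's InvalidResourceRequestException.
    static class InvalidRequestException extends RuntimeException {
        InvalidRequestException(String message) {
            super(message);
        }
    }

    // Strict behavior (what Hadoop 3 reports): reject any resource whose
    // requested value is negative or above the maximum allowed allocation.
    static void validate(Map<String, Long> requested, Map<String, Long> maximum) {
        for (Map.Entry<String, Long> entry : requested.entrySet()) {
            long max = maximum.getOrDefault(entry.getKey(), 0L);
            long value = entry.getValue();
            if (value < 0 || value > max) {
                throw new InvalidRequestException(
                    "requested resource type=[" + entry.getKey()
                        + "] < 0 or greater than maximum allowed allocation: "
                        + value + " > " + max);
            }
        }
    }

    // Suspected lenient behavior (assumption): silently cap each requested
    // value at the maximum instead of failing. Negative values are not handled.
    static Map<String, Long> clamp(Map<String, Long> requested, Map<String, Long> maximum) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : requested.entrySet()) {
            long max = maximum.getOrDefault(entry.getKey(), 0L);
            result.put(entry.getKey(), Math.min(entry.getValue(), max));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> maximum = new LinkedHashMap<>();
        maximum.put("memory", 8192L);
        maximum.put("vCores", 4L);
        maximum.put("yarn.io/gpu", 4L);

        Map<String, Long> requested = new LinkedHashMap<>();
        requested.put("memory", 1408L);
        requested.put("vCores", 1L);
        requested.put("yarn.io/gpu", 8L);

        // Lenient path: the request goes through with the GPU count capped.
        System.out.println("clamped: " + clamp(requested, maximum));

        // Strict path: the same request is rejected.
        try {
            validate(requested, maximum);
        } catch (InvalidRequestException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a request of 8 GPUs against a maximum of 4, the strict path throws while 
the lenient path returns a request for 4 GPUs, which matches what the 2.10 
logs and UI show.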

> Backport resource types/GPU features to branch-3.0/branch-2
> -----------------------------------------------------------
>
>                 Key: YARN-8200
>                 URL: https://issues.apache.org/jira/browse/YARN-8200
>             Project: Hadoop YARN
>          Issue Type: Task
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>            Priority: Major
>              Labels: release-blocker
>             Fix For: 2.10.0
>
>         Attachments: YARN-8200-branch-2.001.patch, 
> YARN-8200-branch-2.002.patch, YARN-8200-branch-2.003.patch, 
> YARN-8200-branch-3.0.001.patch, 
> counter.scheduler.operation.allocate.csv.defaultResources, 
> counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json
>
>
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To prevent supporting too many 
> very different hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth. 
> We also did a trial backport of most of YARN-6223 (sans docker support).
>  
> Regarding the backports, perhaps we can do the development in a feature 
> branch and then merge to branch-2 when ready.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

