[ 
https://issues.apache.org/jira/browse/YARN-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323332#comment-16323332
 ] 

Wangda Tan edited comment on YARN-7739 at 1/12/18 1:05 AM:
-----------------------------------------------------------

I personally prefer to not update global's maximum allocation by node's 
availabilities by default and reject requests if it exceeds maximum allocation.

Thoughts? [~jlowe] / [~asuresh] / [~sunilg] / [~templedf] / [~yufeigu].


was (Author: leftnoteasy):
I personally prefer to not update global's maximum allocation by node's 
availabilities by default and reject requests if it exceeds maximum allocation.

Thoughts? [~jlowe] / [~asuresh] / [~sunilg].

> Revisit scheduler resource normalization behavior for max allocation
> --------------------------------------------------------------------
>
>                 Key: YARN-7739
>                 URL: https://issues.apache.org/jira/browse/YARN-7739
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Priority: Critical
>
> Currently, YARN Scheduler normalizes requested resource based on the maximum 
> allocation derived from configured maximum allocation and maximum registered 
> node resources. Basically, the scheduler will silently cap asked resource by 
> maximum allocation.
> This could cause issues for applications, for example, a Spark job which 
> needs 12 GB memory to run, however in the cluster, registered NMs have at 
> most 8 GB mem on each node. So scheduler allocates 8GB memory container to 
> the requested application.
> Once app receives containers from RM, if it doesn't double check allocated 
> resources, it will lead to OOM and hard to debug because scheduler silently 
> caps maximum allocation.
> When non-mandatory resources introduced, this becomes worse. For resources 
> like GPU, we typically set minimum allocation to 0 since not all nodes have 
> GPU devices. So it is possible that application asks 4 GPUs but get 0 GPU, it 
> gonna be a big problem.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to