[ https://issues.apache.org/jira/browse/YARN-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323332#comment-16323332 ]
Wangda Tan edited comment on YARN-7739 at 1/12/18 1:05 AM:
-----------------------------------------------------------

I personally prefer, by default, not to update the global maximum allocation based on the resources available on registered nodes, and to reject any request that exceeds the maximum allocation.

Thoughts? [~jlowe] / [~asuresh] / [~sunilg] / [~templedf] / [~yufeigu].

was (Author: leftnoteasy):
I personally prefer, by default, not to update the global maximum allocation based on the resources available on registered nodes, and to reject any request that exceeds the maximum allocation.

Thoughts? [~jlowe] / [~asuresh] / [~sunilg].

> Revisit scheduler resource normalization behavior for max allocation
> --------------------------------------------------------------------
>
>                 Key: YARN-7739
>                 URL: https://issues.apache.org/jira/browse/YARN-7739
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Priority: Critical
>
> Currently, the YARN scheduler normalizes a requested resource against the
> maximum allocation, which is derived from the configured maximum allocation
> and the largest registered node's resources. In effect, the scheduler
> silently caps the requested resource at the maximum allocation.
> This can cause problems for applications. For example, a Spark job needs
> 12 GB of memory to run, but the registered NMs in the cluster have at most
> 8 GB of memory each, so the scheduler allocates an 8 GB container to the
> requesting application.
> Once the app receives containers from the RM, if it doesn't double-check the
> allocated resources, it will hit an OOM that is hard to debug, because the
> scheduler capped the request silently.
> This becomes worse once non-mandatory resources are introduced. For resources
> like GPU, we typically set the minimum allocation to 0, since not all nodes
> have GPU devices. So an application can ask for 4 GPUs and get 0 GPUs, which
> would be a big problem.
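
To make the two behaviors discussed above concrete, here is a minimal sketch in Python. It is not YARN code: the Resource class, the function names, and the 16 GB / 4 GPU configured maximum are all hypothetical, chosen only to mirror the 12 GB vs. 8 GB and 4 GPU vs. 0 GPU examples from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource vector; NOT the YARN Resource class."""
    memory_mb: int
    gpus: int = 0


class RequestExceedsMaxAllocation(Exception):
    """Raised when a request asks for more than the maximum allocation."""


def effective_max(configured_max, largest_node):
    # Behavior described in the issue: the maximum allocation is derived from
    # the configured maximum and the largest registered node.
    return Resource(
        min(configured_max.memory_mb, largest_node.memory_mb),
        min(configured_max.gpus, largest_node.gpus),
    )


def normalize_with_cap(request, max_alloc):
    # Silent capping: the caller gets less than it asked for, with no error.
    return Resource(
        min(request.memory_mb, max_alloc.memory_mb),
        min(request.gpus, max_alloc.gpus),
    )


def normalize_or_reject(request, max_alloc):
    # Alternative from the comment: fail loudly instead of capping.
    if request.memory_mb > max_alloc.memory_mb or request.gpus > max_alloc.gpus:
        raise RequestExceedsMaxAllocation(
            f"requested {request}, maximum allocation is {max_alloc}"
        )
    return request


# Assumed values, for illustration only.
configured_max = Resource(memory_mb=16384, gpus=4)
largest_node = Resource(memory_mb=8192, gpus=0)
request = Resource(memory_mb=12288, gpus=4)

# Capping against the node-derived maximum shrinks the request silently.
capped = normalize_with_cap(request, effective_max(configured_max, largest_node))

# Checking against the configured maximum only leaves the request unchanged.
kept = normalize_or_reject(request, configured_max)
```

In this sketch, capped comes out as 8192 MB and 0 GPUs without any error, while kept is the original request. A request above the configured maximum, for example Resource(20480), makes normalize_or_reject raise instead of shrinking it.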