[ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558371#comment-16558371
 ] 

Szilard Nemeth commented on YARN-8566:
--------------------------------------

Uploaded a new patch that fixes the unit test (UT) failures.

> Add diagnostic message for unschedulable containers
> ---------------------------------------------------
>
>                 Key: YARN-8566
>                 URL: https://issues.apache.org/jira/browse/YARN-8566
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: YARN-8566.001.patch, YARN-8566.002.patch, 
> YARN-8566.003.patch, YARN-8566.004.patch, YARN-8566.005.patch, 
> YARN-8566.006.patch
>
>
> If a queue is configured with maxResources set to 0 for a resource, and an 
> application is submitted to that queue that requests that resource, that 
> application will remain pending until it is removed or moved to a different 
> queue. This behavior can be realized without extended resources, but it’s 
> unlikely a user will create a queue that allows 0 memory or CPU. As the 
> number of resources in the system increases, this scenario will become more 
> common, and it will become harder to recognize these cases. Therefore, the 
> scheduler should indicate in the diagnostic string for an application if it 
> was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml):
> {code:xml}
> <allocations>
>   <queueMaxAppsDefault>100000</queueMaxAppsDefault>
>   <queue name="sample_queue">
>     <minResources>10000 mb,2vcores</minResources>
>     <maxResources>90000 mb,4vcores, 0gpu</maxResources>
>     <maxRunningApps>50</maxRunningApps>
>     <maxAMShare>-1.0f</maxAMShare>
>     <weight>2.0</weight>
>     <schedulingPolicy>fair</schedulingPolicy>
>   </queue>
> </allocations>
> {code}
> Command:
> {code:bash}
> yarn jar "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi \
>   -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is 
> created, the diagnostic message of the AM should be updated.
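
The check requested in the description can be illustrated with a minimal, standalone sketch. It is not the code from the attached patches and does not use the FairScheduler API: the class name, the method name, and the plain-map representation of resources are all hypothetical, and it assumes that a resource missing from the queue's maxResources is unlimited.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of YARN: resources are modeled as plain
// name -> amount maps instead of YARN's Resource class.
final class ZeroMaxResourceDiagnostics {

    /**
     * Returns a diagnostic message when the request asks for a resource
     * whose queue maximum is 0, or an empty string otherwise.
     * Assumption: a resource absent from queueMaxResources is unlimited.
     */
    static String diagnose(String queueName,
                           Map<String, Long> queueMaxResources,
                           Map<String, Long> requested) {
        List<String> blocked = new ArrayList<>();
        for (Map.Entry<String, Long> entry : requested.entrySet()) {
            Long max = queueMaxResources.get(entry.getKey());
            // Only a positive request against an explicit 0 maximum is flagged.
            if (entry.getValue() > 0 && max != null && max == 0L) {
                blocked.add(entry.getKey());
            }
        }
        if (blocked.isEmpty()) {
            return "";
        }
        return "Request cannot be scheduled in queue '" + queueName
            + "': maxResources is 0 for " + String.join(", ", blocked);
    }
}
```

For the sample_queue configuration above (gpu maximum of 0) and a request for 1 gpu, diagnose returns "Request cannot be scheduled in queue 'sample_queue': maxResources is 0 for gpu". A request that only asks for memory and vcores yields an empty string.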



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

