[ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558163#comment-16558163
 ] 

Szilard Nemeth commented on YARN-8566:
--------------------------------------

Hi [~rkanter]!
Thanks for the quick review! Please see my new patch with the fixes.
1. Fixed
2. I would leave this as it is: the exception is passed to {{LOG.warn}}, so its 
message will be printed anyway. Do you agree?
3. Good point. I reused the exception message in 
{{DefaultAMSProcessor.handleInvalidResourceException}}, but I would like to 
keep {{InvalidResourceType}} so that it can be used to decide whether or not to 
update the diagnostic message.
I would only like to update the message if the {{InvalidResourceException}} was 
created because the requested resource was less than zero or greater than the 
maximum allocation. As this exception is also created in other parts of the code 
for other reasons, I would not touch the diagnostic message in those cases.
About {{SchedulerUtils.throwInvalidResourceException}}: I wanted to keep the 
details of how the {{InvalidResourceException}} is created in that method 
instead of having the callers provide the message, which is why the exception 
message is formatted there.
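
A minimal, hypothetical Java sketch of the split described in point 3 follows; it is not the code in the attached patches. It assumes a simplified stand-in {{InvalidResourceException}} that carries an {{InvalidResourceType}} (the constants {{LESS_THAN_ZERO}}, {{GREATER_THAN_MAX_ALLOCATION}} and {{UNKNOWN}} are illustrative), one helper modelled on {{SchedulerUtils.throwInvalidResourceException}} that owns the message formatting, and a handler modelled on {{DefaultAMSProcessor.handleInvalidResourceException}} that appends to a plain list of diagnostics. Resources are reduced to single {{long}} values for brevity.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified sketch: these types are stand-ins, not the YARN classes.
public class InvalidResourceSketch {

  // Why a request was rejected (constant names are illustrative).
  enum InvalidResourceType { LESS_THAN_ZERO, GREATER_THAN_MAX_ALLOCATION, UNKNOWN }

  // Stand-in exception that carries the rejection reason next to the message.
  static class InvalidResourceException extends Exception {
    private final InvalidResourceType type;

    InvalidResourceException(String message, InvalidResourceType type) {
      super(message);
      this.type = type;
    }

    InvalidResourceType getType() {
      return type;
    }
  }

  // The only place that formats the message; callers pass values and a reason.
  static void throwInvalidResourceException(long requested, long maxAllocation,
      InvalidResourceType type) throws InvalidResourceException {
    String reason;
    switch (type) {
      case LESS_THAN_ZERO:
        reason = "requested resource is less than 0";
        break;
      case GREATER_THAN_MAX_ALLOCATION:
        reason = "requested resource is greater than the maximum allocation";
        break;
      default:
        reason = "requested resource is invalid";
        break;
    }
    throw new InvalidResourceException(String.format(
        "Invalid resource request: %s (requested=%d, max=%d)",
        reason, requested, maxAllocation), type);
  }

  // Validates a single resource value against its maximum allocation.
  static void validate(long requested, long maxAllocation)
      throws InvalidResourceException {
    if (requested < 0) {
      throwInvalidResourceException(requested, maxAllocation,
          InvalidResourceType.LESS_THAN_ZERO);
    }
    if (requested > maxAllocation) {
      throwInvalidResourceException(requested, maxAllocation,
          InvalidResourceType.GREATER_THAN_MAX_ALLOCATION);
    }
  }

  // Updates diagnostics only for the two validation failures; exceptions
  // created elsewhere (UNKNOWN) leave the diagnostics untouched.
  static void handleInvalidResourceException(InvalidResourceException e,
      List<String> diagnostics) {
    InvalidResourceType type = e.getType();
    if (type == InvalidResourceType.LESS_THAN_ZERO
        || type == InvalidResourceType.GREATER_THAN_MAX_ALLOCATION) {
      diagnostics.add(e.getMessage());
    }
  }

  public static void main(String[] args) {
    List<String> diagnostics = new ArrayList<>();
    try {
      validate(1, 0); // e.g. one unit requested where the maximum allocation is 0
    } catch (InvalidResourceException e) {
      handleInvalidResourceException(e, diagnostics);
    }
    System.out.println(diagnostics);
  }
}
{code}

With this split, callers never build the message themselves, and any code path that creates the exception with {{UNKNOWN}} leaves the diagnostics as they were.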


> Add diagnostic message for unschedulable containers
> ---------------------------------------------------
>
>                 Key: YARN-8566
>                 URL: https://issues.apache.org/jira/browse/YARN-8566
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: YARN-8566.001.patch, YARN-8566.002.patch, 
> YARN-8566.003.patch, YARN-8566.004.patch
>
>
> If a queue is configured with maxResources set to 0 for a resource, and an 
> application is submitted to that queue that requests that resource, that 
> application will remain pending until it is removed or moved to a different 
> queue. This behavior can also occur without extended resources, but it’s 
> unlikely a user will create a queue that allows 0 memory or CPU. As the 
> number of resources in the system increases, this scenario will become more 
> common, and it will become harder to recognize these cases. Therefore, the 
> scheduler should indicate in the diagnostic string for an application if it 
> was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml):
> {code:xml}
> <allocations>
>   <queueMaxAppsDefault>100000</queueMaxAppsDefault>
>   <queue name="sample_queue">
>     <minResources>10000 mb,2vcores</minResources>
>     <maxResources>90000 mb,4vcores, 0gpu</maxResources>
>     <maxRunningApps>50</maxRunningApps>
>     <maxAMShare>-1.0f</maxAMShare>
>     <weight>2.0</weight>
>     <schedulingPolicy>fair</schedulingPolicy>
>   </queue>
> </allocations>
> {code}
> Command: 
> {code:bash}
> yarn jar "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi \
>   -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is 
> created, the diagnostic message of the AM should be updated.
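
The check the description asks for, spotting a request for a resource whose queue maxResources is 0, can be sketched as a per-resource comparison. This is a hypothetical illustration, not FairScheduler code: queue limits and requests are modelled as plain maps from resource name to amount, the resource names ({{memory-mb}}, {{vcores}}, {{gpu}}) are assumptions chosen to mirror the example configuration, and the diagnostic wording is made up.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: resource names, maps and message wording are illustrative.
public class ZeroMaxResourceCheck {

  // Names of resources the request asks for (> 0) that the queue caps at 0,
  // sorted by name so the result is deterministic.
  static List<String> unschedulableResources(Map<String, Long> request,
      Map<String, Long> queueMax) {
    List<String> blocked = new ArrayList<>();
    for (Map.Entry<String, Long> entry : new TreeMap<>(request).entrySet()) {
      Long max = queueMax.get(entry.getKey());
      if (entry.getValue() > 0 && max != null && max == 0L) {
        blocked.add(entry.getKey());
      }
    }
    return blocked;
  }

  // Diagnostic text for the application; empty when no 0 maxResources setting blocks it.
  static String diagnostic(String queue, List<String> blocked) {
    if (blocked.isEmpty()) {
      return "";
    }
    return "Application cannot be scheduled in queue " + queue
        + ": maxResources is 0 for requested resource(s) "
        + String.join(", ", blocked);
  }

  public static void main(String[] args) {
    // Queue limits mirror sample_queue; request values are illustrative apart from gpu=1.
    Map<String, Long> queueMax = Map.of("memory-mb", 90000L, "vcores", 4L, "gpu", 0L);
    Map<String, Long> request = Map.of("memory-mb", 1024L, "vcores", 1L, "gpu", 1L);
    System.out.println(diagnostic("sample_queue",
        unschedulableResources(request, queueMax)));
  }
}
{code}

Resources that the request does not ask for (amount 0) are ignored, so a queue with a 0 limit only produces a diagnostic for applications that actually need that resource.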



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
