[ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3946:
------------------------------------
    Attachment: YARN-3946.v1.001.patch
                YARN3946_attemptDiagnistic message.png

Hi [~wangda],[~rohithsharma],[~sunilg], [~sumit.nigam] & [~nijel].

As mentioned by Wangda in his 
[comment|https://issues.apache.org/jira/browse/YARN-4091?focusedCommentId=14735266&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14735266]
 in YARN-4091, it is very difficult to capture the status when the *App's 
leaf queue or parent queue is beyond its limit*: it would not be good to loop 
through all the apps in the hierarchy and update the status on every node 
update, and doing so would also lose important information from previous updates.

So I think the valid cases where we can update AMLaunchDiagnostics in 
SchedulerApplicationAttempt (for CapacityScheduler) are:
 
* App is in Pending state because of the AMLimit/UserLimit of the queue
* App is waiting for resources of the partition for the AM to be launched (once 
moved out of pending state)
* App is waiting for resources of the partition for the AM to be launched and 
some nodes are blacklisted (if it fails to launch because of blacklisted nodes)
* AMLimit of the queue doesn't allow the AM to launch
* UserLimit of the queue doesn't allow the AM to launch
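
To make the cases above concrete, here is a rough sketch of how they could be modelled. It is illustrative only and not taken from the attached patch: the names {{AmLaunchState}} and {{AmLaunchDiagnostics}}, the message texts, and the {{update}}/{{describe}} methods are all hypothetical. The idea is that each attempt holds only its latest state, set when the scheduler evaluates that particular attempt, so nothing has to loop over every app on each node update.

```java
// Hypothetical sketch; none of these names come from YARN-3946.v1.001.patch.
enum AmLaunchState {
    PENDING_QUEUE_LIMITS(
        "Application is pending: AM limit or user limit of the queue reached"),
    WAITING_FOR_PARTITION_RESOURCES(
        "Application has left pending state; waiting for partition resources to launch the AM"),
    WAITING_SOME_NODES_BLACKLISTED(
        "Waiting for partition resources to launch the AM; some nodes are blacklisted"),
    AM_LIMIT_EXCEEDED(
        "AM limit of the queue does not allow the AM to launch"),
    USER_LIMIT_EXCEEDED(
        "User limit of the queue does not allow the AM to launch");

    private final String message;

    AmLaunchState(String message) {
        this.message = message;
    }

    String message() {
        return message;
    }
}

// Holds the most recent diagnostic for a single application attempt.
final class AmLaunchDiagnostics {
    private AmLaunchState state;   // null until the first update
    private String details = "";

    // Called only for the attempt currently being evaluated,
    // so no loop over all apps in the queue hierarchy is needed.
    synchronized void update(AmLaunchState newState, String newDetails) {
        this.state = newState;
        this.details = (newDetails == null) ? "" : newDetails;
    }

    // Returns the text that could be surfaced as the attempt's diagnostics.
    synchronized String describe() {
        if (state == null) {
            return "";
        }
        return details.isEmpty()
            ? state.message()
            : state.message() + " (" + details + ")";
    }
}
```

For example, calling {{update(AmLaunchState.AM_LIMIT_EXCEEDED, "queue=default")}} and then {{describe()}} would yield the AM-limit message followed by the queue detail in parentheses. Keeping one state per attempt is just one possible design; it trades history for cheap updates.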

Please check whether the approach is proper. If it is useful and required, then I 
can get a similar thing done for FairScheduler also. cc/ [~ka...@cloudera.com]

I have also taken the liberty of fixing some small issues in 
{{SchedulerApplicationAttempt.isWaitingForAMContainer}} in the same patch; if 
required, I can raise another JIRA and move these small changes there.


> Allow fetching exact reason as to why a submitted app is in ACCEPTED state.
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3946
>                 URL: https://issues.apache.org/jira/browse/YARN-3946
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Sumit Nigam
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3946.v1.001.patch, YARN3946_attemptDiagnistic 
> message.png
>
>
> Currently there is no direct way to get the exact reason as to why a 
> submitted app is still in ACCEPTED state. It should be possible to know 
> through RM REST API as to what aspect is not being met - say, queue limits 
> being reached, or core/ memory requirement not being met, or AM limit being 
> reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
