Naganarasimha G R updated YARN-3946:
    Attachment: 3946WebImages.zip

Thanks for the quick feedback, [~wangda].
bq. AM launch diagnostics should have an initial value after added to scheduler: 
I initially thought of adding this message, but the problem is that 
{{LeafQueue.activateApplications}} is called immediately by the 
CapacityScheduler in {{addApplicationAttempt}}, so the initial message would be 
replaced almost at once and would not be helpful. I have, however, ensured 
that the related details are captured. Thoughts?
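
To illustrate the ordering described above, here is a minimal, self-contained sketch. It is a toy model only: the {{Attempt}} and {{Queue}} classes, the {{setDiagnostics}} method and the message strings are hypothetical stand-ins and are not taken from the patch or from the real CapacityScheduler code. It only shows that a value set in {{addApplicationAttempt}} is overwritten by the activation step before anyone could read it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical stand-ins, not the real CapacityScheduler,
// LeafQueue or RMAppAttemptImpl classes.
public class DiagnosticsOverwriteSketch {

    /** Hypothetical attempt that records every diagnostics value it receives. */
    static class Attempt {
        private final List<String> history = new ArrayList<>();
        private String diagnostics = "";

        void setDiagnostics(String message) {
            diagnostics = message;   // the previous value is lost to readers
            history.add(message);    // kept here only to make the overwrite visible
        }

        String getDiagnostics() { return diagnostics; }

        List<String> getHistory() { return history; }
    }

    /** Hypothetical queue with a limit on concurrently active applications. */
    static class Queue {
        private final int maxActiveApps;
        private int activeApps;

        Queue(int maxActiveApps) { this.maxActiveApps = maxActiveApps; }

        // Stand-in for the activation step: it always writes a new message,
        // whether or not the application could be activated.
        void activateApplications(Attempt attempt) {
            if (activeApps < maxActiveApps) {
                activeApps++;
                attempt.setDiagnostics("activated, waiting for AM container");
            } else {
                attempt.setDiagnostics("not activated, active application limit reached");
            }
        }
    }

    // Stand-in for addApplicationAttempt: the initial message is set and then
    // replaced within the same call.
    static Attempt addApplicationAttempt(Queue queue) {
        Attempt attempt = new Attempt();
        attempt.setDiagnostics("added to scheduler");
        queue.activateApplications(attempt);
        return attempt;
    }

    public static void main(String[] args) {
        Queue queue = new Queue(1);
        Attempt first = addApplicationAttempt(queue);
        Attempt second = addApplicationAttempt(queue);
        System.out.println(first.getDiagnostics());
        System.out.println(second.getDiagnostics());
    }
}
```

In this model the "added to scheduler" text survives only in the history list, never in {{getDiagnostics()}}, which is why a separate initial message adds little.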

bq. Not caused by your patch, isWaitingForAMContainer checks if master container 
created, you may also need to check if application is in recover state or not. 
Because AM could contact to RM before AM container recovered by RM.
I am not sure I understood this correctly:
# ??AM could contact to RM before AM container recovered by RM?? I failed to 
understand the impact of this. All the required information is restored from 
the RMState store ({{RMAppAttemptImpl.recover(RMState)}} sets the 
master container from the store), so after the services are started it is 
possible for the AM heartbeat to arrive earlier than the NM heartbeat, but 
what impact could that have? Correct me if my understanding is wrong!
# ??check if application is in recover state or not?? I am not sure how to do 
this, if it is required. I went through {{RMAppAttemptImpl}} and {{RMAppImpl}} 
and found no method or internal state that exposes this. Maybe I am missing 
something.

bq. Suggest to add to REST API / web UI together with this patch if changes are 
not complex.
The earlier implementation also captured it as part of 
{{attempt.getDiagnostics}}, so it will be available in all the interfaces.
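
As a rough sketch of how a client might read this over REST, the snippet below queries the RM Cluster Application resource ({{/ws/v1/cluster/apps/{appid}}}) and pulls out the {{diagnostics}} field. It assumes that the attempt diagnostics surface in that field, which this comment does not confirm, and that the RM web address is reachable over plain HTTP. The class and method names are hypothetical, and the regex extraction is a dependency-free shortcut that does not unescape JSON; a real client should use a JSON parser. It needs Java 11 or later for {{java.net.http}}.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop.
public class AcceptedReasonClient {

    // Matches a JSON string field named "diagnostics" (escapes are not decoded).
    private static final Pattern DIAGNOSTICS =
        Pattern.compile("\"diagnostics\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Builds the application resource URI, e.g. for "rmhost:8088". */
    static URI appUri(String rmWebAddress, String appId) {
        return URI.create("http://" + rmWebAddress + "/ws/v1/cluster/apps/" + appId);
    }

    /** Returns the raw diagnostics string from a response body, if present. */
    static Optional<String> extractDiagnostics(String json) {
        Matcher m = DIAGNOSTICS.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Fetches the application resource as JSON text. */
    static String fetchApp(String rmWebAddress, String appId) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(appUri(rmWebAddress, appId))
            .header("Accept", "application/json")
            .GET()
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    // Usage: java AcceptedReasonClient <rm-host:port> <application-id>
    public static void main(String[] args) throws Exception {
        String body = fetchApp(args[0], args[1]);
        System.out.println(extractDiagnostics(body).orElse("(no diagnostics field found)"));
    }
}
```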

The other comments have been handled, and I have attached the web UI images.

bq. I'd like to see this in application reports, so that client-side 
applications can display the details
This is taken care of in this patch.

> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in 
> CS
> --------------------------------------------------------------------------------
>                 Key: YARN-3946
>                 URL: https://issues.apache.org/jira/browse/YARN-3946
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Sumit Nigam
>            Assignee: Naganarasimha G R
>         Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN3946_attemptDiagnistic message.png
> Currently there is no direct way to get the exact reason as to why a 
> submitted app is still in ACCEPTED state. It should be possible to know 
> through RM REST API as to what aspect is not being met - say, queue limits 
> being reached, or core/memory requirement not being met, or AM limit being 
> reached, etc.
