Wangda Tan commented on YARN-3946:

1) Is it possible to merge amLaunchDiagnostics and the other diagnostics? That 
would simplify the RMAppAttemptImpl implementation.
2) Could you take a look at my previous comment?
bq. Since RMAppAttempt and SchedulerApplicationAttempt have a 1-to-1 relationship, 
we can save a reference to RMAppAttempt in SchedulerApplicationAttempt, which 
could avoid getting it from RMContext.getRMApps()...

3) I feel this may not be needed (no code change is needed for your latest patch):
bq. Since String is immutable, amLaunchDiagnostics could be volatile so we don't 
need to acquire locks.
Since createApplicationAttemptReport currently holds a big readLock, we don't 
need to pay the extra cost of the volatile.
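
To illustrate point 3, here is a minimal sketch, not the actual RMAppAttemptImpl code. The class {{AttemptDiagnosticsHolder}} and its methods are hypothetical names. It assumes the field is always written under the write lock of the same ReentrantReadWriteLock whose read lock the report method takes. Under that assumption a plain (non-volatile) String field is already safely visible to readers.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for an attempt object; not part of YARN. */
final class AttemptDiagnosticsHolder {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Plain field: visibility comes from the lock, so volatile is not needed.
    private String amLaunchDiagnostics = "";

    /** Assumption: every writer goes through the write lock. */
    void setAmLaunchDiagnostics(String diagnostics) {
        lock.writeLock().lock();
        try {
            amLaunchDiagnostics = diagnostics;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** The report is built under one big read lock, as in the comment above. */
    String createReport() {
        lock.readLock().lock();
        try {
            return "diagnostics=" + amLaunchDiagnostics;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

If some writer updated the field without taking the write lock, this reasoning would no longer hold and the volatile (or a lock) would be needed again.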

4) Suggestions about the diagnostic message:
- Have an internal field that records when the app's diagnostic message was last 
updated. We can print it with the message, e.g. {{\[23 sec before\] <message>}}. 
- We can also use that field to prevent excessive updating of the diagnostic 
message; currently it is updated on every heartbeat for every accessed 
application. I think we should limit the update frequency to avoid overhead. 
Hardcoding it to 1 sec seems fine to me for now; we can make it configurable if 
people start complaining about it :)
- Generally, I think the message format could be:
{{Last update from scheduler: <time> (such as 23 sec before); <message> (such 
as "Application is activated, waiting for allocating AM container"); Details: 
(instead of GenericInfo) Partition=x, queue's absolute capacity ... (and other 
fields in your patch)}}
- After the AM container is allocated and running, the above message is still 
useful because people can tell whether the application is actively allocating 
resources or staying in the queue waiting to be accessed.
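
The suggestions in point 4 can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the patch. The class name {{ThrottledDiagnostics}}, its methods, and passing the clock value in as a parameter are all hypothetical. The caller would pass 1000 ms as the minimum interval to get the hardcoded 1 sec limit.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical holder for a scheduler diagnostic message; not part of YARN. */
final class ThrottledDiagnostics {
    private final long minIntervalMillis;
    private long lastUpdateMillis;   // the "internal field" recording the latest update
    private boolean hasUpdate;
    private String message = "";
    private String details = "";

    ThrottledDiagnostics(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /**
     * Stores a new message unless the previous one was stored less than
     * minIntervalMillis ago. Returns true if the message was stored.
     */
    synchronized boolean update(String newMessage, String newDetails, long nowMillis) {
        if (hasUpdate && nowMillis - lastUpdateMillis < minIntervalMillis) {
            return false; // too soon since the last update: skip to avoid overhead
        }
        message = newMessage;
        details = newDetails;
        lastUpdateMillis = nowMillis;
        hasUpdate = true;
        return true;
    }

    /** Renders the message in the format proposed above. */
    synchronized String render(long nowMillis) {
        if (!hasUpdate) {
            return "No update from scheduler yet";
        }
        long ageSec = TimeUnit.MILLISECONDS.toSeconds(nowMillis - lastUpdateMillis);
        return "Last update from scheduler: " + ageSec + " sec before; "
            + message + "; Details: " + details;
    }
}
```

With a 1000 ms interval, a second update arriving 500 ms after the first is dropped, and rendering 23 seconds after the stored update prints "23 sec before" in front of the stored message.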

> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in 
> CS
> --------------------------------------------------------------------------------
>                 Key: YARN-3946
>                 URL: https://issues.apache.org/jira/browse/YARN-3946
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Sumit Nigam
>            Assignee: Naganarasimha G R
>         Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch
> Currently there is no direct way to get the exact reason as to why a 
> submitted app is still in ACCEPTED state. It should be possible to know 
> through RM REST API as to what aspect is not being met - say, queue limits 
> being reached, or core/ memory requirement not being met, or AM limit being 
> reached, etc.
