Wangda Tan commented on YARN-3946:

[~Naganarasimha], thanks for updating the patch. Some minor comments:
1) Not sure if this is needed:
      // reset AMLaunchDiagnostics once AM Registers with RM but in case of
      // Unmanaged AM we keep the diagnostic message till the attempt is
      // finished
      if (! appAttempt.submissionContext.getUnmanagedAM()) {
My feeling is that we should set AMLaunchDiagnostics regardless of whether the AM is managed or not.

2) RMAppImpl:
{{getCurrentAppAttempt().getDiagnostics()}} is called twice.

3) FiCaSchedulerApp:
I suggest renaming
bq. public void updateNodeInfoForAMDiagnostics(String message)
to "updateAppSkipNodeDiagnostics". IIUC, it will be called when app skips 

4) It would be better to update the patch to avoid hard-coded messages (especially when 
you need to verify them in tests). Does it make sense to create an 
AMContainerLaunchDiagnostics at yarn.scheduler (for general launch diagnostics, 
if you have any) and a CSAMContainerLaunchDiagnostics at yarn.scheduler.capacity? 
LeafQueue.USER_S_AM_RESOURCE_LIMIT_EXCEED can be removed as well.
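
A rough sketch of what point 4 could look like. Only the two holder names (AMContainerLaunchDiagnostics, CSAMContainerLaunchDiagnostics) come from the suggestion above; the constant names, the message strings, the wrapper class, and the buildDiagnostics helper are assumptions made up for illustration and are not taken from the patch.

```java
// Hypothetical sketch: constant names and message texts are illustrative
// assumptions, not the ones used in the YARN-3946 patch.
class DiagnosticsConstantsSketch {

  /** General AM launch diagnostics (suggested home: yarn.scheduler). */
  interface AMContainerLaunchDiagnostics {
    String NOT_YET_ASSIGNED = "AM container is not yet assigned. ";
  }

  /** Capacity Scheduler specific diagnostics (suggested home: yarn.scheduler.capacity). */
  interface CSAMContainerLaunchDiagnostics extends AMContainerLaunchDiagnostics {
    String QUEUE_AM_RESOURCE_LIMIT_EXCEED = "Queue's AM resource limit exceeded. ";
    String USER_AM_RESOURCE_LIMIT_EXCEED = "User's AM resource limit exceeded. ";
  }

  /** Hypothetical helper: prefixes free-form details with a shared constant. */
  static String buildDiagnostics(String reason, String details) {
    return reason + "Details : " + details;
  }

  public static void main(String[] args) {
    // Scheduler code and tests both reference the constant instead of a literal.
    String msg = buildDiagnostics(
        CSAMContainerLaunchDiagnostics.USER_AM_RESOURCE_LIMIT_EXCEED,
        "queue=default");
    System.out.println(msg);
  }
}
```

With something like this, a test can check that the diagnostics start with the shared constant rather than duplicating the string, so rewording a message does not break the test.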

> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in 
> CS
> --------------------------------------------------------------------------------
>                 Key: YARN-3946
>                 URL: https://issues.apache.org/jira/browse/YARN-3946
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Sumit Nigam
>            Assignee: Naganarasimha G R
>         Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, 
> YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch
> Currently there is no direct way to get the exact reason as to why a 
> submitted app is still in ACCEPTED state. It should be possible to know 
> through RM REST API as to what aspect is not being met - say, queue limits 
> being reached, or core/memory requirement not being met, or AM limit being 
> reached, etc.
