Naganarasimha G R commented on YARN-3946:

Hi [~wangda], 
Sorry for the delay. As per our offline discussion, we had concluded the following:
# We should only record AM-launch-related events with this patch, so we don't 
need to record the recover/running state. (I think you can clear the 
am-launch-diagnostic when the AM container is allocated.)
# Event time is good, but I think we should put it in a separate JIRA. Maybe 
we need to do some refactoring of the existing diagnostic part.

I have taken care of the first point: AM launch diagnostic messages are now 
recorded only until a container is assigned to the AM process. As for the 
second point, since it was a simple modification, I have included it in this 
JIRA itself. Please check it.
One more difference from the previous patch: as I mentioned earlier, in some 
cases the reason why the node was not assigned was getting overwritten by the 
following modification in LeafQueue:
@@ -904,7 +919,9 @@ public synchronized CSAssignment assignContainers(Resource 
         // Done
         return assignment;
-      } else if (!assignment.getSkipped()) {
+      } else if (assignment.getSkipped()) {
+        application.updateNodeDiagnostics(node);
+      } else {
Hence I have handled this in the patch by storing the diagnostic message 
temporarily and clearing it once the message is created.
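To illustrate the idea, here is a minimal sketch only. It is not the actual 
patch code: the class name, the method signatures and the message format are 
all hypothetical. The reason for a skipped node is held temporarily and cleared 
as soon as the diagnostic message has been built.

```java
// Hypothetical sketch, not the actual YARN-3946 patch code.
// All names and the message format here are assumptions for illustration.
class AmLaunchDiagnosticsSketch {

    // Reason recorded for the most recently skipped node; held only temporarily.
    private String pendingNodeReason;

    // Remember why a node was skipped. A later skip replaces an earlier one.
    synchronized void updateNodeDiagnostics(String nodeId, String reason) {
        pendingNodeReason = "node " + nodeId + ": " + reason;
    }

    // Build the diagnostic message, then clear the temporary reason so that
    // it is reported only once and cannot leak into a later message.
    synchronized String buildMessage(String state) {
        StringBuilder sb = new StringBuilder(state);
        if (pendingNodeReason != null) {
            sb.append("; last skipped ").append(pendingNodeReason);
            pendingNodeReason = null;
        }
        return sb.toString();
    }
}
```

In this sketch the first call to buildMessage after a skipped node includes the 
node reason, and the next call does not, because the stored reason has already 
been cleared.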
I have also attached some images related to the patch.

> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in 
> CS
> --------------------------------------------------------------------------------
>                 Key: YARN-3946
>                 URL: https://issues.apache.org/jira/browse/YARN-3946
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Sumit Nigam
>            Assignee: Naganarasimha G R
>         Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch
> Currently there is no direct way to get the exact reason as to why a 
> submitted app is still in ACCEPTED state. It should be possible to know 
> through RM REST API as to what aspect is not being met - say, queue limits 
> being reached, or core/memory requirement not being met, or AM limit being 
> reached, etc.

This message was sent by Atlassian JIRA
