[ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066484#comment-15066484
 ] 

Naganarasimha G R commented on YARN-4479:
-----------------------------------------

Thanks for the comments [~sunilg] & [~rohithsharma],
bq. This patch tries to activate all applications which were running before RM 
restart happened. 
IIUC, the patch goes through the existing flow, so not all applications will 
be activated by default; an app gets activated only if the queue's AM resource 
limit still has room for it.

bq. 2. All containers which were running earlier will still continue, 
To elaborate further on the scenario I mentioned: assume the queue capacity 
is 120GB (for simplicity), the AM resource limit is 10% (= 12GB), and the AM 
resources are A1 = 8GB, A2 = 2GB, A3 = 2GB, A4 = 2GB, A5 = 2GB. After recovery, 
assume not all nodes are up and only 100GB is available (so the AM limit is 
effectively 10GB). As per the code in the patch, A3, A2, A4 & A5 will get 
activated (8GB in total) and A1 will not get activated even though the app is 
running. Correct me if my understanding is wrong.

bq. Being said all this points, I also feel that we may need to add more 
complex code to keep the same order as you proposed. So if there are no major 
impacts, I think the approach taken in this patch looks fine. Thoughts?
IIUC, point 1 is the same with or without the patch, so no issues there. On 
point 2, IIUC your assumption is wrong: ??All containers which were running 
earlier will still continue??
But how to handle the scenario I mentioned is debatable. If it introduces too 
much complexity we can skip it; I just wanted to share the scenario. As I 
said, the current approach is fine except for the scenario mentioned.

A few nits/queries on the patch:
{code}
@@ -607,9 +612,24 @@ private synchronized void activateApplications() {
     Map<String, Resource> userAmPartitionLimit =
         new HashMap<String, Resource>();
 
-    for (Iterator<FiCaSchedulerApp> i = getPendingAppsOrderingPolicy()
-        .getAssignmentIterator(); i.hasNext();) {
-      FiCaSchedulerApp application = i.next();
+    for (Iterator<FiCaSchedulerApp> i =
+        getPendingAppsOrderingPolicyRecovery().getAssignmentIterator(); i
+        .hasNext();) {
+      activateApplications(i, amPartitionLimit, userAmPartitionLimit);
+    }
+
{code}

Is the for loop required here, given that we already loop over the iterator in 
the overloaded 
{{activateApplications(fsApp, amPartitionLimit, userAmPartitionLimit)}}?
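
To illustrate the concern, here is a minimal sketch that assumes the overloaded method consumes the whole iterator itself (its body is not shown in the hunk above, so this is an assumption). The names {{RedundantLoopSketch}}, {{drain}} and {{countOuterPasses}} are hypothetical stand-ins, not names from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RedundantLoopSketch {

    // Hypothetical stand-in for the overloaded helper: it loops over the
    // iterator on its own until the iterator is exhausted.
    static void drain(Iterator<String> it, List<String> out) {
        while (it.hasNext()) {
            out.add(it.next());
        }
    }

    // Mirrors the shape of the loop in the hunk and counts how many times
    // the outer loop body actually runs.
    static int countOuterPasses(List<String> apps, List<String> out) {
        int passes = 0;
        for (Iterator<String> i = apps.iterator(); i.hasNext();) {
            drain(i, out);
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        int passes = countOuterPasses(Arrays.asList("A1", "A2", "A3"), seen);
        System.out.println(passes + " " + seen); // 1 [A1, A2, A3]
    }
}
```

Under that assumption the outer loop body runs at most once, so a plain call with the iterator would behave the same. If the helper can return before the iterator is exhausted, the outer loop would resume it and the two forms would differ.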

> Retrospect app-priority in pendingOrderingPolicy during recovering 
> applications
> -------------------------------------------------------------------------------
>
>                 Key: YARN-4479
>                 URL: https://issues.apache.org/jira/browse/YARN-4479
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: 0001-YARN-4479.patch
>
>
> Currently, the same ordering policy is used for pending applications and 
> active applications. When priority is configured for applications, the 
> high-priority applications get activated first during recovery. It is 
> possible that a low-priority job was submitted earlier and was in running state. 
> This causes starvation of the low-priority job after recovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
