[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109942#comment-15109942 ] Wangda Tan commented on YARN-4479:
--

bq. Wangda Tan Would you mind if I take over new JIRA fixing all these issues?

+1 to fix this in a separate JIRA.

> Retrospect app-priority in pendingOrderingPolicy during recovering applications
> ---
> Key: YARN-4479
> URL: https://issues.apache.org/jira/browse/YARN-4479
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api, resourcemanager
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch, 0003-YARN-4479.patch, 0004-YARN-4479.patch, 0004-YARN-4479.patch, 0005-YARN-4479.patch, 0006-YARN-4479.patch
>
> Currently, the same ordering policy is used for pending applications and active applications. When priority is configured for applications, high-priority applications get activated first during recovery, even though a low-priority job may have been submitted earlier and already been in running state. This causes the low-priority job to starve after recovery.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
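The starvation described in the issue can be reproduced with a tiny standalone sketch (a hypothetical minimal {{App}} class, not the real FiCaSchedulerApp): ordering pending apps purely by priority during recovery places a high-priority app that never ran ahead of a low-priority app that was already running before the restart.

```java
import java.util.*;

// Hypothetical minimal app record for illustration only; the real
// scheduler entity is FiCaSchedulerApp.
class App {
    final String id;
    final int priority;        // higher value = higher priority
    final boolean wasRunning;  // running before the RM restart
    App(String id, int priority, boolean wasRunning) {
        this.id = id; this.priority = priority; this.wasRunning = wasRunning;
    }
}

class RecoveryStarvation {
    // Returns the id of the app that a priority-only ordering would
    // activate first after an RM restart.
    static String firstActivated() {
        List<App> pending = new ArrayList<>(Arrays.asList(
            new App("low-running", 1, true),     // was running before restart
            new App("high-waiting", 10, false)   // submitted, never activated
        ));
        // Priority-only ordering, as a shared active/pending policy does:
        pending.sort(Comparator.comparingInt((App a) -> -a.priority));
        return pending.get(0).id;  // "high-waiting" goes first
    }
}
```

The previously running "low-running" app loses its place to "high-waiting", which is the starvation the fix addresses.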
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109985#comment-15109985 ] Naganarasimha G R commented on YARN-4479:
-

Thanks for pointing out [~wangda],

bq. Instead of using queue's configured ordering-policy for pending apps, it should use FifoOrderingPolicyForPendingApps.

Yes, and having {{FairOrderingPolicy}} for the pending apps does not make sense, as {{getCachedUsed}} will always be zero. And as you said, we can keep it a fixed policy for now, as there is no real-world need to make pending queues configurable. So I am OK with this approach.
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110003#comment-15110003 ] Rohith Sharma K S commented on YARN-4479:
-

Filed YARN-4617 to handle the above-mentioned points.
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109057#comment-15109057 ] Jian He commented on YARN-4479:
---

[~rohithsharma], I think we had the same discussion offline about whether it's worth it for pendingApps to have their own ordering policy, but it was then crossed off. What was the reason? Is it because we thought pending apps and active apps should share the same policy? Now I agree this approach is better, if we agree that pending apps can just be treated separately.
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109636#comment-15109636 ] Rohith Sharma K S commented on YARN-4479:
-

bq. But it was then crossed off. What was the reason ?

Yes, we discussed it offline and crossed it off. The reasons were:
# After YARN-3873, it was assumed that active and pending apps should *always* share the same policy. In an earlier [comment|https://issues.apache.org/jira/browse/YARN-4479?focusedCommentId=15108236=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15108236] on this JIRA, [~leftnoteasy] pointed out that they should not be the same.
# Another reason was that we were thinking of introducing a new configuration for the pending ordering policy. Wangda suggested that we can use a fixed ordering policy regardless of the active ordering policy.
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108236#comment-15108236 ] Wangda Tan commented on YARN-4479:
--

Thanks [~sunilg]/[~Naganarasimha], I looked at the existing code again, and I still think we should make such logic exist in the orderingPolicy only. The current logic increases the complexity of LeafQueue, which has to handle the newly added {{pendingOPForRecoveredApps}} in many places, such as:

{code}
if (application.isAttemptRecovering()) {
  pendingOPForRecoveredApps.removeSchedulableEntity(application);
} else {
  pendingOrderingPolicy.removeSchedulableEntity(application);
}
{code}

And:

{code}
for (FiCaSchedulerApp pendingApp : pendingOPForRecoveredApps
    .getSchedulableEntities()) {
  apps.add(pendingApp.getApplicationAttemptId());
}
for (FiCaSchedulerApp pendingApp : pendingOrderingPolicy
    .getSchedulableEntities()) {
  apps.add(pendingApp.getApplicationAttemptId());
}
{code}

In addition to this problem, I just noticed another issue with the pending-ordering-policy introduced by YARN-3873. It assumes the queue's ordering-policy for pending apps should be the same as the ordering-policy for active apps. But actually it shouldn't be; for example, the pending-ordering-policy for fair-ordering-policy should be FIFO instead of FAIR.

Some ideas off the top of my head to fix the above issue and clean up the code:
- Keep the changes to SchedulerApplicationAttempt/CapacityScheduler in the patch 0006-YARN-4479.patch that set the "isRecoverying" field
- Add a RecoveryComparator, and add a new FifoOrderingPolicyForPendingApps which extends FifoOrderingPolicy but uses RecoveryComparator
- Instead of using the queue's configured ordering-policy for pending apps, use FifoOrderingPolicyForPendingApps. With this change, users cannot configure the ordering-policy for pending apps; I don't see a strong real-life requirement for that now.

Thoughts?
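The proposal above (a RecoveryComparator composed in front of the existing priority/FIFO comparison) can be sketched as a pair of composed comparators. This is a standalone illustration with a hypothetical {{SchedApp}} class, not the real Hadoop SchedulableEntity/FiCaSchedulerApp types:

```java
import java.util.*;

// Hypothetical stand-in for a schedulable app: only the fields the
// comparators need.
class SchedApp {
    final String id;
    final int priority;        // higher value = higher priority
    final long submitTime;
    final boolean recovering;  // was running before the RM restart
    SchedApp(String id, int priority, long submitTime, boolean recovering) {
        this.id = id; this.priority = priority;
        this.submitTime = submitTime; this.recovering = recovering;
    }
}

class PendingPolicySketch {
    // FifoOrderingPolicy-style comparison: priority first (descending),
    // then submission time (ascending, i.e. FIFO).
    static final Comparator<SchedApp> PRIORITY_THEN_FIFO =
        Comparator.comparingInt((SchedApp a) -> -a.priority)
                  .thenComparingLong(a -> a.submitTime);

    // RecoveryComparator sketch: recovering apps sort ahead of new ones.
    static final Comparator<SchedApp> RECOVERY_FIRST =
        (a, b) -> Boolean.compare(!a.recovering, !b.recovering);

    // FifoOrderingPolicyForPendingApps sketch: recovery status first,
    // then the normal priority + FIFO ordering.
    static final Comparator<SchedApp> PENDING_ORDER =
        RECOVERY_FIRST.thenComparing(PRIORITY_THEN_FIFO);
}
```

With this composition, a low-priority app that was running before the restart sorts ahead of a higher-priority app that was never activated, while non-recovering apps keep the usual priority/FIFO order.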
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108300#comment-15108300 ] Sunil G commented on YARN-4479:
---

Thanks [~leftnoteasy] for the detailed comment. Yes, I understood the reasoning once FairOrderingPolicy is considered. Generally fine with the suggested approach; however, a few minor points:

1. bq. Keep changes for SchedulerApplicationAttempt/CapacityScheduler in the patch: 0006-YARN-4479.patch to set "isRecoverying" field

We already have a recovering field in SchedulerApplicationAttempt. But if we make use of it, we need to reset the flag once the application is moved to the {{activeApplications}} list, and resetting it loses the information that this was a recovered app. As of now I do not see any use for that information, but it may confuse. So could we introduce a new flag/field for this to avoid confusion?

2. With the new FifoOrderingPolicyForPendingApps, we are agreeing that priority and submission time will be the factors deciding pending apps' ordering, correct?
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108340#comment-15108340 ] Rohith Sharma K S commented on YARN-4479:
-

Right. I think we can spin it off as an improvement, since a few JIRAs went in on top of this. [~leftnoteasy] Would you mind if I take over a new JIRA fixing all these issues?
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108323#comment-15108323 ] Wangda Tan commented on YARN-4479:
--

Hi [~sunilg]/[~rohithsharma], please note one of my comments above:

{code}
Instead of using queue's configured ordering-policy for pending apps, it should use FifoOrderingPolicyForPendingApps. with the change, user cannot configure ordering-policy for pending-apps, I didn't see a strong real life requirement for that now.
{code}

I think we still need to treat the ordering policy of pending apps specially. With the newly added FifoOrderingPolicyForPendingApps, we don't need to reset isRecoverying, and other ordering policies don't need the isRecoverying field, so it won't affect performance.

bq. 2. With new FifoOrderingPolicyForPendingApps, we are agreeing that Priority and Submission Time will be factor to decide pending apps ordering, correct?

Yes. The only difference between FifoOrderingPolicyForPendingApps and FifoOrderingPolicy is that FifoOrderingPolicyForPendingApps lets recovering apps go first.
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108310#comment-15108310 ] Rohith Sharma K S commented on YARN-4479:
-

Hi [~leftnoteasy], there were 2 approaches to solve this:
# Handle the problem in the upper layer, i.e. the LeafQueue layer, regardless of the ordering policy: [0006-YARN-4479.patch|https://issues.apache.org/jira/secure/attachment/12780671/0006-YARN-4479.patch]
# Handle it in the bottom layer, i.e. specific to an ordering policy. Any newly added ordering policy then has to take care of this problem itself: [0003-YARN-4479.patch|https://issues.apache.org/jira/secure/attachment/12779965/0003-YARN-4479.patch]

Issues with approach 2:
## Performance: it checks wasAttemptRecovering on every addition or removal. When there are a huge number of applications, the impact is significant.
## The flag needs to be reset while adding the app to the activeApplications list.
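The performance point in approach 2 can be made concrete with a standalone sketch (hypothetical {{PendingApp}} class; a {{TreeSet}} stands in for the ordering policy's backing collection). A recovery check embedded in the comparator is paid on every comparison of every insertion, long after recovery has finished:

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical minimal pending app: only an id and a submit time.
class PendingApp {
    final String id;
    final long submitTime;
    PendingApp(String id, long submitTime) { this.id = id; this.submitTime = submitTime; }
}

class ComparisonCost {
    // Inserts n apps into an ordered set and counts comparator
    // invocations; a real comparator would check wasAttemptRecovering
    // at the marked point on every one of these calls.
    static long countedInsertions(int n) {
        AtomicLong comparisons = new AtomicLong();
        Comparator<PendingApp> cmp = (a, b) -> {
            comparisons.incrementAndGet();
            // <-- wasAttemptRecovering check would run here each time
            return Long.compare(a.submitTime, b.submitTime);
        };
        TreeSet<PendingApp> pending = new TreeSet<>(cmp);
        for (int i = 0; i < n; i++) {
            pending.add(new PendingApp("app-" + i, i));
        }
        return comparisons.get();
    }
}
```

Each insertion costs O(log n) comparator calls, so the per-comparison recovery check is multiplied across the queue's lifetime, which is why moving the recovery handling into a dedicated pending policy (or the LeafQueue layer) avoids the overhead for normal operation.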
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108326#comment-15108326 ] Rohith Sharma K S commented on YARN-4479:
-

bq. I think we still need to treat ordering policy of pending apps specially, and with the new added FifoOrderingPolicyForPendingApps, we don't need to reset isRecoverying and other ordering policies don't need the isRecoverying field, so it won't affect performance.

+1 for the approach; makes sense to me.
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108330#comment-15108330 ] Sunil G commented on YARN-4479:
---

Yes, makes sense to me. +1 for the approach. I think we can spin this off into a new ticket, because a few patches went in on top of this change, so reverting may be complex. Thoughts?
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108033#comment-15108033 ] Naganarasimha G R commented on YARN-4479:
-

I think you are referring to an approach similar to the one in 0002-YARN-4479.patch: having additional logic in the comparator which checks whether the attempt wasAttemptRunningEarlier. After discussion we tried to avoid it, as unnecessary comparisons happen even after recovery when comparing each app. If you have any other approach, we can discuss further.
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107994#comment-15107994 ] Wangda Tan commented on YARN-4479:
--

Hi [~rohithsharma], apologies for my very late feedback. Instead of adding a new list of recovering-and-pending apps, could we add this behavior (earlier submitted & running apps go first) to our existing policy? Maintaining only one ordering policy in LeafQueue is easier. Thoughts? [~jianhe]/[~Naganarasimha]/[~sunilg]
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108048#comment-15108048 ] Sunil G commented on YARN-4479:
---

Yes [~leftnoteasy], as mentioned by [~Naganarasimha Garla], this option came up as a possible solution. However, there were a few complexities. For this approach we needed a new {{RecoveryComparator}}, which also had to be added to {{FifoOrderingPolicy}}. RecoveryComparator was supposed to run with the information of whether the app was running prior to recovery, so a flag had to be added to FiCaSchedulerApp and then reset after the first round of activation. Hence this approach needed more complexity in various parts of the scheduler, and a simpler approach was made in LeafQueue instead. Please share your thoughts if we missed anything in this approach. [~rohithsharma], could you please add any point I missed about this approach?
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090233#comment-15090233 ] Hudson commented on YARN-4479:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9075 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9075/])
YARN-4479. Change CS LeafQueue pendingOrderingPolicy to hornor recovered (jianhe: rev 109e528ef5d8df07443373751266b4417acc981a)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085274#comment-15085274 ] Hadoop QA commented on YARN-4479:
-

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 7m 51s | trunk passed |
| +1 | compile | 1m 58s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 2m 16s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 30s | trunk passed |
| +1 | mvnsite | 2m 43s | trunk passed |
| +1 | mvneclipse | 0m 21s | trunk passed |
| -1 | findbugs | 6m 39s | branch/hadoop-yarn-project/hadoop-yarn no findbugs output file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) |
| +1 | javadoc | 1m 57s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 4m 32s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 2m 11s | the patch passed |
| +1 | compile | 1m 56s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 1m 56s | the patch passed |
| +1 | compile | 2m 14s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 2m 14s | the patch passed |
| +1 | checkstyle | 0m 32s | the patch passed |
| +1 | mvnsite | 2m 40s | the patch passed |
| +1 | mvneclipse | 0m 19s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | xml | 0m 0s | The patch has no ill-formed XML file. |
| -1 | findbugs | 6m 40s | patch/hadoop-yarn-project/hadoop-yarn no findbugs output file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) |
| +1 | javadoc | 1m 56s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 4m 31s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 75m 29s | hadoop-yarn in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 80m 25s | hadoop-yarn in the patch failed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 22s | Patch does not generate ASF License warnings. |
| | | 209m 2s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085297#comment-15085297 ] Rohith Sharma K S commented on YARN-4479: - The failing test cases are unrelated to this patch; these test failures will be handled in YARN-4478.
> Retrospect app-priority in pendingOrderingPolicy during recovering
> applications
> ---
>
> Key: YARN-4479
> URL: https://issues.apache.org/jira/browse/YARN-4479
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api, resourcemanager
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch,
> 0003-YARN-4479.patch, 0004-YARN-4479.patch, 0004-YARN-4479.patch,
> 0005-YARN-4479.patch, 0006-YARN-4479.patch
>
>
> Currently, the same ordering policy is used for both pending and active
> applications. When priority is configured for applications, high-priority
> applications get activated first during recovery, even though a low-priority
> job may have been submitted earlier and been in the running state.
> This causes starvation of the low-priority job after recovery.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083283#comment-15083283 ] Sunil G commented on YARN-4479: - Patch looks good [~rohithsharma].
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083248#comment-15083248 ] Rohith Sharma K S commented on YARN-4479: - Updated the 0005-YARN-4479 patch; kindly review it.
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083611#comment-15083611 ] Hadoop QA commented on YARN-4479: - (x) -1 overall
||Vote||Subsystem||Runtime||Comment||
|0|reexec|0m 0s|Docker mode activated.|
|+1|@author|0m 0s|The patch does not contain any @author tags.|
|+1|test4tests|0m 0s|The patch appears to include 2 new or modified test files.|
|+1|mvninstall|7m 49s|trunk passed|
|+1|compile|2m 20s|trunk passed with JDK v1.8.0_66|
|+1|compile|2m 6s|trunk passed with JDK v1.7.0_91|
|+1|checkstyle|0m 30s|trunk passed|
|+1|mvnsite|2m 37s|trunk passed|
|+1|mvneclipse|0m 18s|trunk passed|
|-1|findbugs|6m 18s|branch/hadoop-yarn-project/hadoop-yarn no findbugs output file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml)|
|+1|javadoc|1m 53s|trunk passed with JDK v1.8.0_66|
|+1|javadoc|4m 31s|trunk passed with JDK v1.7.0_91|
|+1|mvninstall|2m 6s|the patch passed|
|+1|compile|1m 49s|the patch passed with JDK v1.8.0_66|
|+1|javac|1m 49s|the patch passed|
|+1|compile|2m 5s|the patch passed with JDK v1.7.0_91|
|+1|javac|2m 5s|the patch passed|
|+1|checkstyle|0m 29s|the patch passed|
|+1|mvnsite|2m 34s|the patch passed|
|+1|mvneclipse|0m 18s|the patch passed|
|+1|whitespace|0m 0s|Patch has no whitespace issues.|
|+1|xml|0m 0s|The patch has no ill-formed XML file.|
|-1|findbugs|6m 21s|patch/hadoop-yarn-project/hadoop-yarn no findbugs output file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml)|
|+1|javadoc|1m 50s|the patch passed with JDK v1.8.0_66|
|+1|javadoc|4m 14s|the patch passed with JDK v1.7.0_91|
|-1|unit|83m 27s|hadoop-yarn in the patch failed with JDK v1.8.0_66.|
|-1|unit|84m 51s|hadoop-yarn in the patch failed with JDK v1.7.0_91.|
|+1|asflicense|0m 24s|Patch does not generate ASF License warnings.|
| | |219m 58s| |

||Reason||Tests||
|JDK v1.8.0_66 Failed junit tests|hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits|
| |hadoop.yarn.server.resourcemanager.TestClientRMTokens|
| |hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler|
| |hadoop.yarn.server.resourcemanager.TestAMAuthorization|
|JDK v1.7.0_91 Failed
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084608#comment-15084608 ] Jian He commented on YARN-4479: - +1. [~rohithsharma], could you check whether the warnings are related?
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081156#comment-15081156 ] Hadoop QA commented on YARN-4479: - (x) -1 overall
||Vote||Subsystem||Runtime||Comment||
|0|reexec|0m 0s|Docker mode activated.|
|+1|@author|0m 0s|The patch does not contain any @author tags.|
|+1|test4tests|0m 0s|The patch appears to include 2 new or modified test files.|
|+1|mvninstall|7m 33s|trunk passed|
|+1|compile|1m 54s|trunk passed with JDK v1.8.0_66|
|+1|compile|2m 16s|trunk passed with JDK v1.7.0_91|
|+1|checkstyle|0m 32s|trunk passed|
|+1|mvnsite|2m 36s|trunk passed|
|+1|mvneclipse|0m 20s|trunk passed|
|-1|findbugs|6m 32s|branch/hadoop-yarn-project/hadoop-yarn no findbugs output file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml)|
|+1|javadoc|1m 55s|trunk passed with JDK v1.8.0_66|
|+1|javadoc|4m 22s|trunk passed with JDK v1.7.0_91|
|+1|mvninstall|2m 10s|the patch passed|
|+1|compile|2m 1s|the patch passed with JDK v1.8.0_66|
|+1|javac|2m 1s|the patch passed|
|+1|compile|2m 6s|the patch passed with JDK v1.7.0_91|
|+1|javac|2m 6s|the patch passed|
|-1|checkstyle|0m 31s|Patch generated 1 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 357, now 353).|
|+1|mvnsite|2m 34s|the patch passed|
|+1|mvneclipse|0m 18s|the patch passed|
|-1|whitespace|0m 0s|The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix.|
|+1|xml|0m 1s|The patch has no ill-formed XML file.|
|-1|findbugs|6m 19s|patch/hadoop-yarn-project/hadoop-yarn no findbugs output file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml)|
|+1|javadoc|1m 48s|the patch passed with JDK v1.8.0_66|
|+1|javadoc|4m 18s|the patch passed with JDK v1.7.0_91|
|-1|unit|80m 54s|hadoop-yarn in the patch failed with JDK v1.8.0_66.|
|-1|unit|82m 56s|hadoop-yarn in the patch failed with JDK v1.7.0_91.|
|+1|asflicense|0m 19s|Patch does not generate ASF License warnings.|
| | |215m 20s| |

||Reason||Tests||
|JDK v1.8.0_66 Failed junit tests|hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits|
| |hadoop.yarn.server.resourcemanager.TestClientRMTokens|
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075860#comment-15075860 ] Rohith Sharma K S commented on YARN-4479: - Discussed offline with [~jianhe] to sync up on the solution. The summary is as follows, and the patch has been updated accordingly:
# While recovering, failed attempts need not be added to the scheduler. The necessary changes are done in RMAppAttemptImpl.
# If attempts are added to the scheduler, it means those attempts were running before the RM restart.
# Any recovering attempts are added to a new ordering policy, {{pendingOrderingPolicyRecovery}}, which is given higher preference than {{pendingOrderingPolicy}} while activating applications.
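As a rough illustration of point 3 above (a simplified sketch, not the actual LeafQueue code; the class and method names here are assumptions), activation would drain the recovery policy before the regular pending policy, so recovered apps are activated first regardless of configured priority:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified sketch: two FIFO "ordering policies" for pending apps, where
// apps recovered after RM restart are always activated before newly
// submitted apps, regardless of their configured priority.
public class PendingActivationSketch {
    private final Deque<String> pendingOrderingPolicyRecovery = new ArrayDeque<>();
    private final Deque<String> pendingOrderingPolicy = new ArrayDeque<>();

    void addPendingApp(String appId, boolean isRecovering) {
        if (isRecovering) {
            pendingOrderingPolicyRecovery.addLast(appId);
        } else {
            pendingOrderingPolicy.addLast(appId);
        }
    }

    // Recovered apps drain first; only then are new submissions activated.
    String activateNext() {
        if (!pendingOrderingPolicyRecovery.isEmpty()) {
            return pendingOrderingPolicyRecovery.pollFirst();
        }
        return pendingOrderingPolicy.pollFirst();
    }

    public static void main(String[] args) {
        PendingActivationSketch q = new PendingActivationSketch();
        q.addPendingApp("app-new-highpri", false);
        q.addPendingApp("app-recovered-lowpri", true);
        // The recovered low-priority app is activated before the new high-priority one.
        System.out.println(q.activateNext()); // app-recovered-lowpri
        System.out.println(q.activateNext()); // app-new-highpri
    }
}
```

This matches the intent described in the comment: the recovery policy is consulted first during activation, which prevents the post-restart starvation of already-running low-priority jobs.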
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075862#comment-15075862 ] Rohith Sharma K S commented on YARN-4479: - Kindly review the updated patch.
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075912#comment-15075912 ] Hadoop QA commented on YARN-4479: - (x) -1 overall
||Vote||Subsystem||Runtime||Comment||
|0|reexec|0m 0s|Docker mode activated.|
|+1|@author|0m 0s|The patch does not contain any @author tags.|
|+1|test4tests|0m 0s|The patch appears to include 2 new or modified test files.|
|+1|mvninstall|7m 27s|trunk passed|
|+1|compile|0m 29s|trunk passed with JDK v1.8.0_66|
|+1|compile|0m 31s|trunk passed with JDK v1.7.0_91|
|+1|checkstyle|0m 17s|trunk passed|
|+1|mvnsite|0m 36s|trunk passed|
|+1|mvneclipse|0m 15s|trunk passed|
|+1|findbugs|1m 12s|trunk passed|
|+1|javadoc|0m 23s|trunk passed with JDK v1.8.0_66|
|+1|javadoc|0m 28s|trunk passed with JDK v1.7.0_91|
|+1|mvninstall|0m 31s|the patch passed|
|+1|compile|0m 25s|the patch passed with JDK v1.8.0_66|
|+1|javac|0m 25s|the patch passed|
|+1|compile|0m 30s|the patch passed with JDK v1.7.0_91|
|+1|javac|0m 30s|the patch passed|
|-1|checkstyle|0m 18s|Patch generated 5 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 357, now 356).|
|+1|mvnsite|0m 34s|the patch passed|
|+1|mvneclipse|0m 13s|the patch passed|
|+1|whitespace|0m 0s|Patch has no whitespace issues.|
|-1|findbugs|1m 21s|hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager introduced 2 new FindBugs issues.|
|+1|javadoc|0m 20s|the patch passed with JDK v1.8.0_66|
|+1|javadoc|0m 25s|the patch passed with JDK v1.7.0_91|
|-1|unit|63m 27s|hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66.|
|-1|unit|64m 43s|hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91.|
|+1|asflicense|0m 19s|Patch does not generate ASF License warnings.|
| | |145m 49s| |

||Reason||Tests||
|FindBugs|module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager|
| |Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.pendingOrderingPolicy; locked 87% of time; unsynchronized access at LeafQueue.java:[line 1517]|
| |Inconsistent
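The "inconsistent synchronization" finding above is FindBugs' way of saying a field is accessed both with and without the object lock. A generic sketch of the conventional fix (hypothetical names, not the actual LeafQueue code) is to guard every access to the shared field with the same monitor:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: guard every access to the shared field with the same monitor,
// which is what resolves FindBugs' "inconsistent synchronization" warning.
// QueueSketch and its members are illustrative names, not YARN classes.
public class QueueSketch {
    private final List<String> pendingApps = new ArrayList<>(); // guarded by "this"

    synchronized void addApp(String appId) {
        pendingApps.add(appId);
    }

    // A read path like this is typically the unsynchronized access FindBugs
    // counts; adding "synchronized" makes all accesses consistent.
    synchronized int getNumPendingApps() {
        return pendingApps.size();
    }

    public static void main(String[] args) {
        QueueSketch q = new QueueSketch();
        q.addApp("application_1");
        System.out.println(q.getNumPendingApps()); // prints 1
    }
}
```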
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075149#comment-15075149 ] Sunil G commented on YARN-4479: - Sorry, I didn't mean FairScheduler; I was trying to refer to {{FairOrderingPolicy}}.
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075138#comment-15075138 ] Sunil G commented on YARN-4479: - Hi [~rohithsharma], this new fix will also introduce RecoveryComparator into FairOrderingPolicy. Is that needed? I think it can be tracked separately after check-in, once we know whether the same problem arises with FairScheduler.
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075452#comment-15075452 ] Jian He commented on YARN-4479: - Sorry, I missed this part. The recovered apps need to be given preference only for LeafQueue#pendingOrderingPolicy, right? For LeafQueue#orderingPolicy this is not needed.
bq. Reference test case TestRMRestart#testRMRestartAppRunningAMFailed
I don't understand how this test case is related.
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074779#comment-15074779 ] Rohith Sharma K S commented on YARN-4479: - I think we can just rely on the isAppRecovering flag, which should be sufficient. The existing code in RMAppAttemptImpl can stay as it is (without the patch); only FAILED attempts are added to the scheduler, and they are removed in the very next event.
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075069#comment-15075069 ] Hadoop QA commented on YARN-4479: - (x) -1 overall
||Vote||Subsystem||Runtime||Comment||
|0|reexec|0m 0s|Docker mode activated.|
|+1|@author|0m 0s|The patch does not contain any @author tags.|
|+1|test4tests|0m 0s|The patch appears to include 3 new or modified test files.|
|+1|mvninstall|7m 29s|trunk passed|
|+1|compile|0m 26s|trunk passed with JDK v1.8.0_66|
|+1|compile|0m 30s|trunk passed with JDK v1.7.0_91|
|+1|checkstyle|0m 16s|trunk passed|
|+1|mvnsite|0m 36s|trunk passed|
|+1|mvneclipse|0m 16s|trunk passed|
|+1|findbugs|1m 10s|trunk passed|
|+1|javadoc|0m 21s|trunk passed with JDK v1.8.0_66|
|+1|javadoc|0m 27s|trunk passed with JDK v1.7.0_91|
|+1|mvninstall|0m 30s|the patch passed|
|+1|compile|0m 24s|the patch passed with JDK v1.8.0_66|
|+1|javac|0m 23s|the patch passed|
|+1|compile|0m 27s|the patch passed with JDK v1.7.0_91|
|+1|javac|0m 27s|the patch passed|
|-1|checkstyle|0m 16s|Patch generated 7 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 274, now 275).|
|+1|mvnsite|0m 34s|the patch passed|
|+1|mvneclipse|0m 13s|the patch passed|
|+1|whitespace|0m 0s|Patch has no whitespace issues.|
|-1|findbugs|1m 18s|hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager introduced 1 new FindBugs issues.|
|+1|javadoc|0m 19s|the patch passed with JDK v1.8.0_66|
|+1|javadoc|0m 25s|the patch passed with JDK v1.7.0_91|
|-1|unit|63m 50s|hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66.|
|-1|unit|64m 34s|hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91.|
|+1|asflicense|0m 17s|Patch does not generate ASF License warnings.|
| | |145m 43s| |

||Reason||Tests||
|FindBugs|module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager|
| |org.apache.hadoop.yarn.server.resourcemanager.scheduler.policy.PriorityComparator implements Comparator but not Serializable; at PriorityComparator.java:[lines 26-34]|
|JDK v1.8.0_66 Failed junit tests|
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074472#comment-15074472 ] Jian He commented on YARN-4479:

- For a finished attempt, I think we do not need to re-add it into the scheduler, so this whole block could be removed:
{code}
if (EnumSet.of(RMAppAttemptState.RUNNING, RMAppAttemptState.LAUNCHED)
    .contains(appAttempt.recoveredFinalState)) {
  appAttempt.scheduler.handle(new AppAttemptAddedSchedulerEvent(
      appAttempt.getAppAttemptId(), false, true, true));
} else {
  appAttempt.scheduler.handle(new AppAttemptAddedSchedulerEvent(
      appAttempt.getAppAttemptId(), false, true));
}
{code}
Accordingly, in BaseFinalTransition, this code needs to be invoked if recoveredFinalState == null:
{code}
appAttempt.eventHandler.handle(new AppAttemptRemovedSchedulerEvent(
    appAttemptId, finalAttemptState, keepContainersAcrossAppAttempts));
{code}
- With the above change, we can assume that an attempt added into the scheduler should be running, so the extra field wasAttemptRunning in AppAttemptAddedSchedulerEvent is not needed; the existing isAttemptRecovering flag should be enough.
- I think [~Naganarasimha]'s suggestion makes sense: we should consider FairComparator too. Maybe we can add a predefined comparator in AbstractComparatorOrderingPolicy with the recoveryComparator initialized and force underlying implementations to use this?

> Retrospect app-priority in pendingOrderingPolicy during recovering applications
> ---
>
> Key: YARN-4479
> URL: https://issues.apache.org/jira/browse/YARN-4479
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api, resourcemanager
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch
>
> Currently, the same ordering policy is used for pending applications and active applications. When priority is configured for applications, high priority applications get activated first during recovery. It is possible that a low priority job was submitted and in running state. This causes the low priority job to starve after recovery.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
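The recoveryComparator idea above could be composed ahead of a policy's own comparator, so that apps recovered in a running state always sort first. A minimal, hypothetical sketch (the App type and its fields are illustrative stand-ins, not the actual YARN classes):

```java
import java.util.Comparator;

// Hypothetical sketch of a "recovery first" comparator that a base class
// such as AbstractComparatorOrderingPolicy could expose; names are
// illustrative, not the real YARN API.
public class RecoveryComparatorSketch {

  static class App {
    final String id;
    final boolean recoveredRunning; // was RUNNING/LAUNCHED before RM restart
    final int priority;
    App(String id, boolean recoveredRunning, int priority) {
      this.id = id;
      this.recoveredRunning = recoveredRunning;
      this.priority = priority;
    }
  }

  // Apps recovered in a running state sort first; ties fall through to the
  // policy's own criterion (priority here, but it could be fairness).
  static final Comparator<App> RECOVERY_THEN_PRIORITY =
      Comparator.comparing((App a) -> !a.recoveredRunning)
                .thenComparing(Comparator.comparingInt((App a) -> -a.priority));

  // true when a previously running low-priority app orders ahead of a
  // never-activated high-priority app.
  static boolean recoveredAppWinsDemo() {
    App a2 = new App("app-2", true, 5);  // running before restart
    App a3 = new App("app-3", false, 6); // pending, higher priority
    return RECOVERY_THEN_PRIORITY.compare(a2, a3) < 0;
  }

  public static void main(String[] args) {
    System.out.println(recoveredAppWinsDemo()); // true
  }
}
```

Because the recovery key is compared first, both FifoOrderingPolicy and FairOrderingPolicy could reuse the same prefix comparator and keep their own tie-breaking logic unchanged.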
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074705#comment-15074705 ] Rohith Sharma K S commented on YARN-4479:

bq. For finished attempt, I think we do not need to re-add into scheduler, so this whole code could be removed.
While recovering applications and attempts, if the last attempt is FAILED, the scheduler transfers state from the previous attempt. So whenever there is a failed attempt, the attempt has to be added to the scheduler so its state can be obtained. Reference test case: {{TestRMRestart#testRMRestartAppRunningAMFailed}}
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070521#comment-15070521 ] Naganarasimha G R commented on YARN-4479:

Hi [~rohithsharma], thanks for the patch. The new approach seems better than the old one, as it avoids an additional data structure used for the same purpose, but a few points:
* FairOrderingPolicy first considers the {{FairComparator}} and then the {{FifoComparator}}, so only when fairness is equal will it consider whether the application was already running. Would it be better to add an additional comparator for recovery which can be used by both Fair and Fifo?
* It is left entirely to the ordering policy whether to order recovered apps by submission time or not, so it would be better to document that, so that a custom ordering policy can take it into account.
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070556#comment-15070556 ] Rohith Sharma K S commented on YARN-4479:

I had two options for doing this in the FIFO ordering policy. I took the simpler approach to get a working patch. Further improvements like this can be addressed in upcoming patches once the initial approach is agreed upon.
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070242#comment-15070242 ] Hadoop QA commented on YARN-4479:

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| +1 | mvninstall | 8m 5s | trunk passed |
| +1 | compile | 0m 33s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 36s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 43s | trunk passed |
| +1 | mvneclipse | 0m 18s | trunk passed |
| +1 | findbugs | 1m 18s | trunk passed |
| +1 | javadoc | 0m 26s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 30s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 36s | the patch passed |
| +1 | compile | 0m 36s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 36s | the patch passed |
| +1 | compile | 0m 37s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 0m 37s | the patch passed |
| -1 | checkstyle | 0m 14s | Patch generated 10 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 373, now 379). |
| +1 | mvnsite | 0m 39s | the patch passed |
| +1 | mvneclipse | 0m 15s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| -1 | whitespace | 0m 0s | The patch has 3 line(s) with tabs. |
| +1 | findbugs | 1m 27s | the patch passed |
| +1 | javadoc | 0m 26s | the patch passed with JDK v1.8.0_66 |
| -1 | javadoc | 3m 10s | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new issue (was 2, now 3). |
| +1 | javadoc | 0m 35s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 65m 54s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 66m 50s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. |
| -1 | asflicense | 0m 23s | Patch generated 1 ASF License warnings. |
| | | 152m 28s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests |
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066292#comment-15066292 ] Rohith Sharma K S commented on YARN-4479:

The patch does the following:
# During attempt recovery, adds a new flag which tells the scheduler whether the attempt was RUNNING/LAUNCHED before the RM restart.
# During recovery, previously running attempts are added to a separate pendingApplicationOrdering policy which tracks applications that were running before the RM restart.
# When a node registers, applications that were running before the RM restart are activated first.
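The two-pending-policy scheme above can be sketched with a toy model (all names are illustrative, not the patch's actual classes): the recovery set, holding apps that were running before the restart, is drained before the regular pending set.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Toy model of two pending ordering policies: recovered apps are
// activated before normal pending apps, each set internally ordered by
// priority. Names are illustrative only.
public class TwoPendingPoliciesSketch {

  // Entries are "appId:priority"; both sets order higher priority first.
  static final Comparator<String> BY_PRIORITY_DESC =
      Comparator.comparingInt((String s) -> -Integer.parseInt(s.split(":")[1]));

  static List<String> demoOrder() {
    TreeSet<String> recovery = new TreeSet<>(BY_PRIORITY_DESC);
    recovery.add("app-2:5");  // was running before restart
    TreeSet<String> pending = new TreeSet<>(BY_PRIORITY_DESC);
    pending.add("app-3:6");   // higher priority, but never activated
    List<String> order = new ArrayList<>(recovery); // recovered apps first
    order.addAll(pending);                          // then normal pending apps
    return order;
  }

  public static void main(String[] args) {
    System.out.println(demoOrder()); // [app-2:5, app-3:6]
  }
}
```

With a single priority-ordered set, app-3 would have come first; the separate recovery set is what restores app-2 ahead of it.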
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066317#comment-15066317 ] Naganarasimha G R commented on YARN-4479:

Hi [~rohithsharma], as discussed offline: assume before recovery the activated apps were *A1*(Low), *A2*(Low), *A3*(Medium) and the pending ones were *A4*(High) & *A5*(High). With the current approach, applications will be activated in the order *A4, A5, A3, A1, A2*. After your patch it will be *A3, A1, A2, A4, A5*. So in a way it is better than the existing approach, but I wanted to know whether the order of activation for the recovered apps should be *A1, A2, A3* itself, rather than based on the ordering policy. Thoughts? IIUC [~sunilg] also wanted to say the same?
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066484#comment-15066484 ] Naganarasimha G R commented on YARN-4479:

Thanks for the comments [~sunilg] & [~rohithsharma],
bq. This patch tries to activate all applications which were running before RM restart happened.
IIUC the patch, it goes through the existing flow, hence all applications will not be activated by default; an app gets activated only if the queue's AM resource limit is available.
bq. 2. All containers which were running earlier will still continue,
To elaborate further on the scenario I mentioned: assume the queue capacity is 120GB (for simplicity), the AM resource limit is 10% (= 12GB), and the AM resources are A1 = 8GB, A2 = 2GB, A3 = 2GB, A4 = 2GB, A5 = 2GB. After recovery, assume not all nodes are up and only 100GB is available. As per the code in the patch, A3, A2, A4 & A5 will get activated (8GB) and A1 will not get activated even though the app is running. Correct me if my understanding is wrong.
bq. Being said all this points, I also feel that we may need to add more complex code to keep the same order as you proposed. So if there are no major impacts, I think the approach taken in this patch looks fine. Thoughts?
IIUC point 1 is the same with or without the patch, so no issues there. For point 2, IIUC your assumption ??All containers which were running earlier will still continue?? is wrong. The approach to the scenario I mentioned is debatable; if it introduces too much complexity then we can skip it, but I just wanted to share the scenario. As I said, the current approach is fine except for the scenario mentioned.

A few nits/queries on the patch:
{code}
@@ -607,9 +612,24 @@ private synchronized void activateApplications() {
   Map<String, Resource> userAmPartitionLimit = new HashMap<>();
-  for (Iterator<FiCaSchedulerApp> i = getPendingAppsOrderingPolicy()
-      .getAssignmentIterator(); i.hasNext();) {
-    FiCaSchedulerApp application = i.next();
+  for (Iterator<FiCaSchedulerApp> i =
+      getPendingAppsOrderingPolicyRecovery().getAssignmentIterator();
+      i.hasNext();) {
+    activateApplications(i, amPartitionLimit, userAmPartitionLimit);
+  }
{code}
Is the for loop required here, since we are looping over the iterator in the overloaded {{activateApplications(fsApp, amPartitionLimit, userAmPartitionLimit)}}?
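The AM-limit arithmetic in the scenario above can be checked with a trivial computation. This is a sketch only; the real CapacityScheduler limit also accounts for node partitions and per-user limits.

```java
// Toy AM-limit arithmetic for the 120GB/100GB scenario; illustrative only.
public class AmLimitSketch {

  // AM resource limit as a whole-GB percentage of the registered capacity.
  static int amLimitGb(int clusterGb, int amPercent) {
    return clusterGb * amPercent / 100;
  }

  public static void main(String[] args) {
    // Before restart: 120GB capacity, 10% -> 12GB AM limit, which exactly
    // fits the running AMs A1 + A2 + A3 = 8 + 2 + 2 = 12GB.
    System.out.println(amLimitGb(120, 10)); // 12
    // After restart with only 100GB registered: a 10GB AM limit, so the
    // previously running 12GB of AMs no longer fits as a whole and at
    // least one recovered app must stay pending.
    System.out.println(amLimitGb(100, 10)); // 10
  }
}
```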
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066282#comment-15066282 ] Rohith Sharma K S commented on YARN-4479:

The scenario which causes the issue:
# Submit app-1 and app-2 with priority 5. Both applications are activated and in RUNNING state.
# Submit app-3 with priority 6. This application stays in pending state because of the AM limit.
# The RM is restarted. app-1 is activated (by design, the AM limit is not considered for the first application) while app-2 and app-3 are in the pendingOrderingPolicy.
# AMs re-register for app-1 and app-2; their state is now RUNNING. But app-2 and app-3 are still in pending applications.
# A NodeManager re-registers with the RM. As a result, one more application is supposed to get activated. Here, app-3 always gets activated since its priority is higher, but app-2 should get activated first since it was running before the RM restart.
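The five steps above can be reproduced with a toy model of a pending policy ordered only by priority (all names are illustrative, not the actual scheduler types):

```java
import java.util.Comparator;
import java.util.TreeSet;

// Toy model of the bug: a pending set ordered only by priority picks
// app-3 (priority 6) ahead of app-2 (priority 5), even though app-2
// was RUNNING before the restart. All names are illustrative.
public class PendingByPriorityOnly {

  static class PendingApp {
    final String id;
    final int priority;
    final boolean wasRunning; // state before RM restart
    PendingApp(String id, int priority, boolean wasRunning) {
      this.id = id;
      this.priority = priority;
      this.wasRunning = wasRunning;
    }
  }

  // Which app a newly registered node would activate first.
  static String firstActivated() {
    TreeSet<PendingApp> pending = new TreeSet<>(
        Comparator.comparingInt((PendingApp a) -> -a.priority) // high priority first
                  .thenComparing(a -> a.id));                  // stable tie-break
    pending.add(new PendingApp("app-2", 5, true));  // was running before restart
    pending.add(new PendingApp("app-3", 6, false)); // never activated
    return pending.first().id;
  }

  public static void main(String[] args) {
    System.out.println(firstActivated()); // app-3 -- the starvation problem
  }
}
```

Because the comparator never consults wasRunning, the previously running app-2 loses to app-3 on every activation attempt, which is exactly the post-recovery starvation this JIRA describes.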
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066286#comment-15066286 ] Rohith Sharma K S commented on YARN-4479:

Attaching the patch, kindly review.
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066294#comment-15066294 ] Sunil G commented on YARN-4479:

Adding to this, an application which was in RUNNING state prior to the restart will still be in RUNNING state. However, it will *NOT* get any new containers until it is made active again once other higher priority applications have completed. The patch generally looks fine; I will give some comments after checking the patch in more detail.
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066368#comment-15066368 ] Rohith Sharma K S commented on YARN-4479:

bq. but wanted to know whether the order of activation should be A1, A2, A3 itself and not based on the ordering policy for the recovered apps, Thoughts ?
It should be based on the ordering policy only. At this stage, all three applications A1, A2 and A3 are at the same level, so activation should follow the ordering policy implementation. Specifically for priority, the highest priority application should always be activated first.
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066378#comment-15066378 ] Naganarasimha G R commented on YARN-4479:

In most cases this would be sufficient, but consider a case where A3 is an app with a large number of containers while A1 and A2 are short jobs. After recovery, perhaps not all nodes have registered; activating the AMs of A1 and/or A2 would then exceed the AM resource limit, so only A3 will be running. The problem here is that there is no separately configurable ordering policy for recovered apps; the same policy applies, so in rare cases it might lead to starvation.
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066408#comment-15066408 ] Rohith Sharma K S commented on YARN-4479:

Even without an RM restart, apps can starve if the AM limit is reached.
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066402#comment-15066402 ] Hadoop QA commented on YARN-4479:

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 7m 58s | trunk passed |
| +1 | compile | 0m 30s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 33s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 39s | trunk passed |
| +1 | mvneclipse | 0m 16s | trunk passed |
| +1 | findbugs | 1m 16s | trunk passed |
| +1 | javadoc | 0m 24s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 28s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 36s | the patch passed |
| +1 | compile | 0m 29s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 29s | the patch passed |
| +1 | compile | 0m 32s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 0m 32s | the patch passed |
| -1 | checkstyle | 0m 14s | Patch generated 10 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 321, now 325). |
| +1 | mvnsite | 0m 40s | the patch passed |
| +1 | mvneclipse | 0m 16s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| -1 | whitespace | 0m 0s | The patch has 3 line(s) with tabs. |
| -1 | findbugs | 1m 29s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager introduced 2 new FindBugs issues. |
| +1 | javadoc | 0m 23s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 30s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 65m 34s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 66m 29s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 23s | Patch does not generate ASF License warnings. |
| | | 150m 57s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| | Inconsistent synchronization of
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066415#comment-15066415 ] Sunil G commented on YARN-4479: --- Hi [~Naganarasimha]
bq. IUC Sunil G also wanted to say the same ?
I meant it in a slightly different way. With the existing approach, any application that was running before the RM restart will also be in RUNNING state afterwards. It may starve in the scenario you and Rohith mentioned, but the state of the application will still be RUNNING.
Adding to the existing discussion, I would like to point out a few things. This patch tries to activate all applications that were running before the RM restart. They may get activated in a different order, but it puts all of these apps in the scheduler's activated list (the app state will still be RUNNING).
1. After restart, with or without ordering, only the highest-priority app will be selected for scheduling from the activated list. The same behavior occurs before RM restart as well, so there seems to be no impact here. Please correct me if I am wrong.
2. All containers that were running earlier will continue, and all pending requests will be updated/refreshed from the ApplicationMasterService thread. So if all previously running apps are activated, the behavior will be the same from the scheduler's end, correct?
Given all these points, I also feel we would need to add more complex code to keep the same order as you proposed. So if there are no major impacts, I think the approach taken in this patch looks fine. Thoughts?
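The "only the highest-priority app is selected next" ordering discussed above can be sketched with a plain comparator over a sorted set. This is a minimal illustration, not the actual YARN {{OrderingPolicy}} API: the class and field names ({{App}}, {{priority}}, {{submitTime}}) are assumptions made up for the example.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical sketch of a priority-then-FIFO ordering for pending apps.
// Names here are illustrative; the real YARN ordering policies differ.
public class PriorityFifoSketch {
    static final class App {
        final String id;
        final int priority;    // higher value = higher priority (assumption)
        final long submitTime; // FIFO tie-breaker among equal priorities

        App(String id, int priority, long submitTime) {
            this.id = id;
            this.priority = priority;
            this.submitTime = submitTime;
        }
    }

    // Higher priority first; among equal priorities, earlier submission first.
    static final Comparator<App> PRIORITY_THEN_FIFO =
        Comparator.comparingInt((App a) -> -a.priority)
                  .thenComparingLong(a -> a.submitTime)
                  .thenComparing(a -> a.id); // keep distinct apps distinct in a TreeSet

    public static void main(String[] args) {
        TreeSet<App> pending = new TreeSet<>(PRIORITY_THEN_FIFO);
        pending.add(new App("A1", 1, 100)); // low priority, submitted first
        pending.add(new App("A2", 5, 200)); // high priority, submitted later
        pending.add(new App("A3", 5, 300)); // same priority, later submission

        StringBuilder order = new StringBuilder();
        for (App a : pending) order.append(a.id).append(' ');
        System.out.println(order.toString().trim()); // A2 A3 A1
    }
}
```

Under such an ordering, a recovered low-priority app like A1 always sorts behind the high-priority apps, which is exactly why activation order after restart matters in this discussion.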
> Retrospect app-priority in pendingOrderingPolicy during recovering > applications > --- > > Key: YARN-4479 > URL: https://issues.apache.org/jira/browse/YARN-4479 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-4479.patch > > > Currently, the same ordering policy is used for pending applications and active > applications. When priority is configured for applications, during > recovery high-priority applications get activated first. It is possible that > a low-priority job was submitted earlier and was in running state. > This causes the low-priority job to starve after recovery -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066571#comment-15066571 ] Naganarasimha G R commented on YARN-4479: - Thanks [~sunilg],
bq. Its debatable and I think with discussion we can conclude the approach here.
True, it is debatable, but one more thing to be considered (and not missed) here: A4 and A5 get activated even before A2 (as per the correction I mentioned).
bq. {{All containers which were running earlier will still continue}}
I mistook what you meant; it seems you got what I wanted to convey.
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066519#comment-15066519 ] Naganarasimha G R commented on YARN-4479: - Small correction in the example: {{A1 = 8GB, A2 = 2GB, A3 = 2GB, A4 = 2GB, A5 = 2GB}} => {{A1 = 2GB, A2 = 8GB, A3 = 2GB, A4 = 2GB, A5 = 2GB}}
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066553#comment-15066553 ] Sunil G commented on YARN-4479: --- Thanks [~Naganarasimha Garla] for the comments.
bq. This patch tries to activate all applications which were running before RM restart happened
Having said this, yes, it definitely depends on the available AM limit after restart (I meant the positive case in my earlier comment, where all cluster resources were available). I did think about the case where some NMs have not registered back and the limit is lower. In that case we will have app A1 pending in the list waiting to get activated, and it will be the first one activated if any space becomes available. This ensures that the high-priority apps which were in the pending list will get containers, and app A1, which was lower in priority, will wait. Even after A1 is activated, it has to wait until the other high-priority apps are done with their requests. So A1 staying in the pending list may be fine, provided the other apps complete sooner or the failed NMs come back up. I am not saying this is necessarily correct; it is debatable, and I think with discussion we can conclude on the approach here.
Also, about {{All containers which were running earlier will still continue}}: I meant the live containers of apps that were running prior to restart. After restart, even for the pending apps (apps like A1) in your scenario, their running containers won't be killed. Am I missing something?
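The AM-limit scenario in the last two comments (A4 and A5 activating before the 8GB A2, while the low-priority A1 waits) can be illustrated with a small simulation. This is a hedged sketch under stated assumptions: the skip-and-continue handling of an app whose AM is larger than the remaining headroom is assumed purely for illustration, and the names ({{activate}}, {{amHeadroomGb}}) are hypothetical, not CapacityScheduler code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative simulation of priority-ordered activation under a reduced
// AM-resource headroom after RM restart. Not actual YARN scheduler logic.
public class ActivationSketch {
    // apps: id -> {priority, amGb}; returns ids activated within the headroom.
    static List<String> activate(LinkedHashMap<String, int[]> apps, int amHeadroomGb) {
        // Sort by descending priority; the stable sort keeps FIFO order
        // among apps of equal priority.
        List<Map.Entry<String, int[]>> order = new ArrayList<>(apps.entrySet());
        order.sort((x, y) -> Integer.compare(y.getValue()[0], x.getValue()[0]));
        List<String> activated = new ArrayList<>();
        int remaining = amHeadroomGb;
        for (Map.Entry<String, int[]> e : order) {
            int amGb = e.getValue()[1];
            if (amGb <= remaining) { // assumed: skip oversized apps, keep going
                activated.add(e.getKey());
                remaining -= amGb;
            }
        }
        return activated;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, int[]> apps = new LinkedHashMap<>();
        apps.put("A1", new int[] {1, 2}); // low priority, 2GB AM (was running)
        apps.put("A2", new int[] {5, 8}); // high priority, 8GB AM
        apps.put("A3", new int[] {5, 2});
        apps.put("A4", new int[] {5, 2});
        apps.put("A5", new int[] {5, 2});
        // Only 6GB of AM headroom available after restart (e.g. NMs missing):
        System.out.println(activate(apps, 6)); // [A3, A4, A5]
    }
}
```

With only 6GB of headroom, A2's 8GB AM does not fit, so A3/A4/A5 activate first and both A2 and the low-priority A1 wait, mirroring the starvation concern in the thread.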
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063619#comment-15063619 ] Sunil G commented on YARN-4479: --- Thanks [~rohithsharma] for raising this. As discussed offline, all running attempts during recovery (attempt state *NOT* in final states and *NOT* null) could be considered for this list. Such apps can be activated first, just to stay in sync with the state before recovery. Starvation is still possible, but it will be the same as what was happening before HA.
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063610#comment-15063610 ] Rohith Sharma K S commented on YARN-4479: - Thinking about it: while recovering applications, previously running applications should be added to a separate ordering policy, something like {{pendingOrderingPolicyRecovery}}, which is used for activating applications prior to the pendingOrderingPolicy iteration.
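The separate-recovery-policy idea above could look roughly like the sketch below: drain a queue of recovered (previously running) apps before consulting the regular pending set. All names ({{pendingRecovery}}, {{activationOrder}}, {{slots}}) are hypothetical placeholders for illustration, not the actual CapacityScheduler implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Sketch: activate recovered apps first, then regular pending apps.
// Illustrative only; YARN's real activation path is more involved.
public class RecoveryFirstSketch {
    static List<String> activationOrder(Deque<String> pendingRecovery,
                                        Deque<String> pending,
                                        int slots) {
        List<String> activated = new ArrayList<>();
        // Drain the recovery queue first so previously running apps regain
        // their active state regardless of priority order in `pending`.
        while (slots > 0 && !pendingRecovery.isEmpty()) {
            activated.add(pendingRecovery.poll());
            slots--;
        }
        while (slots > 0 && !pending.isEmpty()) {
            activated.add(pending.poll());
            slots--;
        }
        return activated;
    }

    public static void main(String[] args) {
        Deque<String> recovery =
            new ArrayDeque<>(Arrays.asList("A1-low-prio-was-running"));
        Deque<String> pending =
            new ArrayDeque<>(Arrays.asList("A9-high-prio-was-pending"));
        System.out.println(activationOrder(recovery, pending, 2));
        // [A1-low-prio-was-running, A9-high-prio-was-pending]
    }
}
```

The key property is that a low-priority app that was running before restart regains its active state ahead of higher-priority apps that were only pending, which is exactly the ordering this comment proposes.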