[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614581#comment-14614581 ] Sunil G commented on YARN-2004:
---
Ah, sorry! Thank you [~devaraj.k] for the correction.

Priority scheduling support in Capacity scheduler
Key: YARN-2004
URL: https://issues.apache.org/jira/browse/YARN-2004
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Reporter: Sunil G
Assignee: Sunil G
Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 0003-YARN-2004.patch, 0004-YARN-2004.patch, 0005-YARN-2004.patch, 0006-YARN-2004.patch, 0007-YARN-2004.patch, 0008-YARN-2004.patch, 0009-YARN-2004.patch, 0010-YARN-2004.patch

Based on the priority of the application, the Capacity Scheduler should be able to give preference to that application while scheduling. The Comparator<FiCaSchedulerApp> applicationComparator can be changed as follows:
1. Check the application priority. If a priority is available, return the highest-priority job.
2. Otherwise, continue with the existing logic, such as App ID comparison followed by timestamp comparison.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
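The two-step ordering in the issue description can be sketched as a plain Java comparator. This is a minimal illustration, not the patch: `PendingApp` and its fields are hypothetical stand-ins for FiCaSchedulerApp, and the sketch assumes a larger integer means a more important application.

```java
import java.util.Comparator;

// Hypothetical stand-in for FiCaSchedulerApp: only the fields the comparator reads.
final class PendingApp {
    final long appId;       // monotonically increasing submission id
    final long submitTime;  // submission timestamp in ms
    final int priority;     // assumption: larger value = more important

    PendingApp(long appId, long submitTime, int priority) {
        this.appId = appId;
        this.submitTime = submitTime;
        this.priority = priority;
    }
}

final class PriorityThenFifoComparator implements Comparator<PendingApp> {
    @Override
    public int compare(PendingApp a1, PendingApp a2) {
        // 1. Higher application priority is scheduled first (descending order).
        if (a1.priority != a2.priority) {
            return Integer.compare(a2.priority, a1.priority);
        }
        // 2. Same priority: keep the existing FIFO logic, app ID first ...
        int byId = Long.compare(a1.appId, a2.appId);
        if (byId != 0) {
            return byId;
        }
        // ... then the submission timestamp.
        return Long.compare(a1.submitTime, a2.submitTime);
    }
}
```

Because equal priorities fall through to the app-ID/timestamp comparison, applications sharing a priority keep their existing FIFO order.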
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612159#comment-14612159 ] Sunil G commented on YARN-2004:
---
Thank you [~jianhe] for the comments.

- bq.Or this method has more responsibility than that ?
Yes. We are planning to check priority ACLs in this method; I was planning to handle that in a separate ticket.
{noformat}
yarn.scheduler.capacity.root.queue_name.priority.acl=user1,user2
{noformat}
This config will be at queue level, so we can restrict which users may use a high priority: only the listed users can submit applications at that priority, and other users will not be able to. I was planning to add this ACL check to {{authenticateApplicationPriority}}.

- bq.we may merge the two into a single patch ?
I will merge these patches together and upload the combined patch to YARN-2003.
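A queue-level priority ACL check of the kind described above might look like the sketch below. Only the key pattern `yarn.scheduler.capacity.<queue-path>.priority.acl` comes from the comment; the class name, the plain `Map` standing in for the scheduler configuration, and the policy that priorities at or above a `restrictedFrom` threshold count as "high" are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical per-queue priority ACL check; not the patch's implementation.
final class PriorityAclChecker {
    private static final String PREFIX = "yarn.scheduler.capacity.";
    private static final String SUFFIX = ".priority.acl";

    private final Map<String, Set<String>> aclByQueue = new HashMap<>();
    private final int restrictedFrom; // assumption: priorities >= this need ACL membership

    PriorityAclChecker(Map<String, String> conf, int restrictedFrom) {
        this.restrictedFrom = restrictedFrom;
        for (Map.Entry<String, String> e : conf.entrySet()) {
            String key = e.getKey();
            boolean isAclKey = key.startsWith(PREFIX) && key.endsWith(SUFFIX)
                    && key.length() > PREFIX.length() + SUFFIX.length();
            if (!isAclKey) {
                continue;
            }
            // e.g. "root.queue_name"
            String queuePath = key.substring(PREFIX.length(), key.length() - SUFFIX.length());
            Set<String> users = Arrays.stream(e.getValue().split(","))
                    .map(String::trim)
                    .filter(u -> !u.isEmpty())
                    .collect(Collectors.toSet());
            aclByQueue.put(queuePath, users);
        }
    }

    // Lower priorities are open to everyone; high ones only to the listed users.
    boolean isAllowed(String queuePath, String user, int priority) {
        if (priority < restrictedFrom) {
            return true;
        }
        return aclByQueue.getOrDefault(queuePath, Collections.emptySet()).contains(user);
    }
}
```

A queue without an ACL entry denies the restricted priorities to everyone in this sketch; whether the real check should default open or closed is a design choice left to the separate ticket.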
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611034#comment-14611034 ] Jian He commented on YARN-2004:
---
- authenticateApplicationPriority: IIUC, all it does is take the config from yarn-site.xml (not capacity-scheduler.xml) and check the priority against it. I don't see much need to explicitly expose an API in the scheduler and inject the check there. Or this method has more responsibility than that ?
- Given that YARN-2003 is just the API part of YARN-2004 and we have to review the two together anyway, we may merge the two into a single patch ? This is easier to review, and you also do not need to split the patch and upload it in two different places. You can instead split the parts about updating application priority at runtime and the state store changes into a different patch.
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14608857#comment-14608857 ] Wangda Tan commented on YARN-2004:
--
Thanks for the update, [~sunilg]. Comments on the latest patch:

1) bq. I feel we can do the priority comparision first. Do you see any specific usecase for priority as factor
The Fair Scheduler currently uses it as a factor; see {{FairScheduler#getAppWeight}}.

2) SchedulerApplicationAttempt/SchedulerApplication.appPriority should be volatile. I found some other fields that need to be changed as well, not caused by your patch, for example SchedulerApplication.currentAttempt. I suggest we make appPriority correct in this patch and address the others in a separate ticket.

3) Not caused by your patch: applicationComparator should be removed, and pendingApps in LeafQueue should use FifoOrderingPolicy to compare. We can do this in a separate patch.

4) dfltAppPriorityPerQueue should be default ..

5) Is this check necessary in SchedulerApplication.setPriority?
{code}
    if (null != currentAttempt) {
      currentAttempt.setApplicationPriority(priority);
    }
  }
{code}
Should we simply prohibit changing the application priority while the app is in the submitting stage?

6) Tests:
- Add a test to verify that updateApplicationPriority works?
- Add an end-to-end test to verify that application priority works? (Not only checking q.getApplications().iterator..)
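Points 2) and 5) above (a volatile priority field, and a setter that updates the current attempt only if one exists) can be sketched with trimmed-down classes. `SketchApplication` and `SketchAppAttempt` are hypothetical stand-ins for SchedulerApplication and SchedulerApplicationAttempt, and an int stands in for YARN's Priority record.

```java
// Hypothetical stand-in for SchedulerApplicationAttempt: priority plumbing only.
final class SketchAppAttempt {
    // volatile: written by the update path, read by scheduler threads without a lock.
    private volatile int appPriority;

    SketchAppAttempt(int initialPriority) {
        this.appPriority = initialPriority;
    }

    void setApplicationPriority(int priority) {
        this.appPriority = priority;
    }

    int getApplicationPriority() {
        return appPriority;
    }
}

// Hypothetical stand-in for SchedulerApplication.
final class SketchApplication {
    private volatile int priority;
    private volatile SketchAppAttempt currentAttempt; // null until an attempt starts

    SketchApplication(int priority) {
        this.priority = priority;
    }

    // A new attempt starts from the application's current priority.
    SketchAppAttempt startAttempt() {
        SketchAppAttempt attempt = new SketchAppAttempt(priority);
        currentAttempt = attempt;
        return attempt;
    }

    // Updates the application and its current attempt (if any) without
    // taking the application's synchronized lock.
    void setPriority(int newPriority) {
        this.priority = newPriority;
        SketchAppAttempt attempt = currentAttempt; // read the volatile field once
        if (attempt != null) {
            attempt.setApplicationPriority(newPriority);
        }
    }

    int getPriority() {
        return priority;
    }
}
```

`volatile` gives visibility of each field, not atomicity across both: an update racing with `startAttempt()` can still be missed, which is one argument for the alternative raised in 5) of prohibiting priority changes while the app is still being submitted.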
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606734#comment-14606734 ] Wangda Tan commented on YARN-2004:
--
Thanks for the update, [~sunilg]. Some comments from my side:
1) getMaxClusterLevelAppPriority should return Priority.
2) updateApplicationPriority: I think it needs to send a message to RMApp so that RMApp can write the new priority to the state store; once the RM fails over and recovers the app, we should get the updated priority. I also suggest creating a method in SchedulerApplication that sets the priority on itself and on SchedulerApplicationAttempt. And could you make setting the priority not acquire the application's synchronized lock?
3) authenticateApplicationPriority: typically, LOG.debug needs to be wrapped by LOG.isDebugEnabled...
4) "default" would be better than "dflt"; dflt is not a very common abbreviation in code to me. :)
5) The change to compareInputOrderTo doesn't look correct to me. {{compareInputOrderTo}} compares which application was submitted first. I think you need to modify {{FifoComparator}} and compare priority based on SchedulerApplicationAttempt's priority. Changes to FairComparator are needed too, but I think we can postpone them, since FairComparator + Fifo may be more complicated: should we do the priority comparison first (treat priority as a class) OR combine them (treat priority as a factor)?

[~jianhe]:
bq. We may just move the check into RMAppManager...
This may not work, since priority mapping happens on the scheduler side (setting the app's priority according to the queue's default priority).
bq. updateApplicationPriority - I think we don’t need to add an unused API now.
I think updating app priority is an important use case, according to [~jlowe]'s comment: https://issues.apache.org/jira/browse/YARN-1963?focusedCommentId=14328071page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14328071. I suggest we keep update application priority here.
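The "priority as a class" versus "priority as a factor" choice in item 5 can be contrasted with two comparators. `OrderedApp` is hypothetical, and the weighting used for the factor variant (usage divided by priority) is an illustrative assumption, not the formula in {{FairScheduler#getAppWeight}}.

```java
import java.util.Comparator;

// Hypothetical view of an application for ordering purposes.
final class OrderedApp {
    final String name;
    final int priority;      // larger = more important; assumed >= 1 for the factor variant
    final double usedMemory; // current usage, e.g. in MB

    OrderedApp(String name, int priority, double usedMemory) {
        this.name = name;
        this.priority = priority;
        this.usedMemory = usedMemory;
    }
}

final class PriorityOrderings {
    // Priority as a class: higher tiers always go first; usage only breaks ties.
    static final Comparator<OrderedApp> PRIORITY_AS_CLASS =
            Comparator.comparingInt((OrderedApp a) -> a.priority).reversed()
                    .thenComparingDouble(a -> a.usedMemory);

    // Priority as a factor: usage is scaled down by priority (illustrative weight),
    // so a heavily loaded high-priority app can yield to a lightly loaded low-priority one.
    static final Comparator<OrderedApp> PRIORITY_AS_FACTOR =
            Comparator.comparingDouble((OrderedApp a) -> a.usedMemory / a.priority);

    private PriorityOrderings() {
    }
}
```

With a high-priority app using a lot of memory and a low-priority app using little, the class ordering still serves the high-priority app first, while the factor ordering may not; that divergence is why combining priority with a fair comparator needs a deliberate decision.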
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606697#comment-14606697 ] Jian He commented on YARN-2004:
---
Thanks, Sunil! Some comments on the patch:
- The app priority seems to be used only for pending applications; what about priority support for actively running applications?
- “default_application_priority”: the convention is to use “-” instead of “_”; similarly for max-application-priority.
- This should not happen:
{code}
if (null == queue) {
  throw new YarnException(
      "During application init/update, failure occurred due to an unknown "
          + "queue name '" + queueName + "' from priority authentication");
}
{code}
because the queue will never be null; see the code below in ClientRMService:
{code}
if (submissionContext.getQueue() == null) {
  submissionContext.setQueue(YarnConfiguration.DEFAULT_QUEUE_NAME);
}
{code}
- max-application-priority is defined in yarn-site.xml, but here it’s retrieved from capacity-scheduler.xml. We may just move the check into RMAppManager.
{code}
if (priority.getPriority() > getMaxClusterLevelAppPriority()) {
  throw new YarnException("Invalid priority as Queue: " + queueName
      + " cannot support more than priority '"
      + getMaxClusterLevelAppPriority() + "'");
}
{code}
- updateApplicationPriority - I think we don’t need to add an unused API now. We can do this later when implementing the functionality of updating app priority.
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605153#comment-14605153 ] Sunil G commented on YARN-2004:
---
Ah. In SchedulerApplicationAttempt, we still need the null check for the other schedulers. I'll update the patch with it.
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603603#comment-14603603 ] Eric Payne commented on YARN-2004:
--
Thanks, [~sunilg], for this fix.
- {{SchedulerApplicationAttempt.java}}:
{code}
if (!getApplicationPriority().equals(
    ((SchedulerApplicationAttempt) other).getApplicationPriority())) {
  return getApplicationPriority().compareTo(
      ((SchedulerApplicationAttempt) other).getApplicationPriority());
}
{code}
-- Can {{getApplicationPriority}} return null? I see that {{SchedulerApplicationAttempt}} initializes {{appPriority}} to null.
- {{CapacityScheduler.java}}:
{code}
if (!a1.getApplicationPriority().equals(a2.getApplicationPriority())) {
  return a1.getApplicationPriority().compareTo(
      a2.getApplicationPriority());
}
{code}
-- Same question about {{getApplicationPriority}} returning null.
-- Also, can {{updateApplicationPriority}} call {{authenticateApplicationPriority}}? It seems like duplicate code to me.
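The last question above, having {{updateApplicationPriority}} reuse {{authenticateApplicationPriority}}, amounts to routing both the submit and update paths through one check. A minimal sketch, assuming a cluster-wide range of 0..max and using a hypothetical `PriorityUpdateSketch` class with an `IllegalArgumentException` in place of YarnException:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scheduler fragment: submit and update share one validation routine.
final class PriorityUpdateSketch {
    private final int maxClusterPriority;
    private final Map<String, Integer> appPriorities = new HashMap<>();

    PriorityUpdateSketch(int maxClusterPriority) {
        this.maxClusterPriority = maxClusterPriority;
    }

    // Plays the role of authenticateApplicationPriority (assumed range 0..max).
    void authenticateApplicationPriority(int priority, String queueName) {
        if (priority < 0 || priority > maxClusterPriority) {
            throw new IllegalArgumentException("Invalid priority " + priority
                    + " for queue " + queueName + "; allowed range is 0.."
                    + maxClusterPriority);
        }
    }

    void submitApplication(String appId, String queueName, int priority) {
        authenticateApplicationPriority(priority, queueName);
        appPriorities.put(appId, priority);
    }

    // Reuses the same check, so the two code paths cannot drift apart.
    void updateApplicationPriority(String appId, String queueName, int newPriority) {
        authenticateApplicationPriority(newPriority, queueName);
        appPriorities.put(appId, newPriority);
    }

    int priorityOf(String appId) {
        return appPriorities.get(appId);
    }
}
```

A rejected update leaves the stored priority unchanged, because validation runs before the write.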
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603516#comment-14603516 ] Wangda Tan commented on YARN-2004:
--
Thanks for the update, [~sunilg]. A quick comment before posting the others: I think most of the code to check/update application priority can be reused by other schedulers. [~kasha], could you take a quick look at this patch to see whether it is also needed for the Fair Scheduler?
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517252#comment-14517252 ] Eric Payne commented on YARN-2004:
--
[~sunilg], thanks for all of the work you are doing on this important feature.
{quote}
queueA: default=low
queueB: default=medium
The type of apps which we run may vary from queueA to B. So by keeping default priority different for each queue will help to handle such case. Assume more high level apps are running in queueA often, and medium level in queueB. Making different default priority can help here.
{quote}
I don't know a lot about the fair scheduler, but I'm pretty sure that in the capacity scheduler there is no way to make one queue a higher priority than another. There is no way to compare job priorities between queues; that is, you can't say that jobs running in queueA have a higher priority than jobs running in queueB. So it only makes sense to compare priorities between jobs in the same queue. Am I missing something?
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517310#comment-14517310 ] Sunil G commented on YARN-2004:
---
Yes [~jlowe], you are correct. We cannot compare the highest priority across queues, and if we don't, there is not much point in keeping a MAX priority at queue level. Initially I planned to change that part in another JIRA, where the max priority of the applications running in a queue would also be taken into consideration while processing a node heartbeat [i.e. when selecting which queue to consider based on resource consumption]. But that makes things more complicated in CS for now. So, to keep it simple, I will keep this max at cluster level for now, so that it is accessible across all queues. [~jlowe] [~leftnoteasy] [~vinodkv], please share your thoughts.
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517317#comment-14517317 ] Sunil G commented on YARN-2004:
---
Extremely sorry, [~eepayne], I mistyped your name as Jason. Hope you understood my comment about priority config across queue. Pls let me know your thoughts.
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517545#comment-14517545 ] Eric Payne commented on YARN-2004:
--
[~sunilg],
bq. Hope you understood my comment about priority config across queue. Pls let me know your thoughts.
I think you are referring to [~leftnoteasy]'s suggestion that a cluster-wide config should be added to cap the maximum priority allowed in a queue. Is that correct? I think that makes sense, so that cluster admins can put a cap on the number of priorities within any given queue.
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503848#comment-14503848 ] Wangda Tan commented on YARN-2004:
--
[~jlowe], [~sunilg]:
1) Regarding the per-queue priority limit: I agree that a per-queue priority limit could be added separately, but I think we may need a global priority limit to make priorities easier to compare. It's easy to compare 101 and 190, but it may be hard to compare 2123231223 and 2123123512, and showing a big-number priority on the web UI is not good to me. So limiting the maximum priority gives a better user experience.
2) Regarding negative priorities: I prefer priorities starting from either 0 or 1.
3) Behavior when app.priority > max-priority-limit: Should we just cap it by max-priority-limit instead of throw exception? Unlike a required resource, priority is a hint to the scheduler, so logging a LOG.warn instead of rejecting the app seems friendlier to me.
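Point 3) above (cap the requested priority and warn, rather than reject the application) could look like the following sketch. `PriorityCapper` is hypothetical, and java.util.logging stands in for the RM's own logger.

```java
import java.util.logging.Logger;

// Hypothetical helper for the "cap instead of reject" behaviour.
final class PriorityCapper {
    private static final Logger LOG = Logger.getLogger(PriorityCapper.class.getName());

    private final int maxPriorityLimit;

    PriorityCapper(int maxPriorityLimit) {
        this.maxPriorityLimit = maxPriorityLimit;
    }

    // Returns the priority the scheduler should use for this application.
    int effectivePriority(String appId, int requested) {
        if (requested > maxPriorityLimit) {
            // Priority is only a hint, so warn and cap instead of failing submission.
            LOG.warning("Priority " + requested + " requested by " + appId
                    + " exceeds max-priority-limit " + maxPriorityLimit
                    + "; capping it to " + maxPriorityLimit);
            return maxPriorityLimit;
        }
        return requested;
    }
}
```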
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504219#comment-14504219 ] Sunil G commented on YARN-2004:
---
Thank you [~jlowe].
{noformat}
@@ -327,6 +328,29 @@ private RMAppImpl createAndPopulateNewRMApp(
     ApplicationId applicationId = submissionContext.getApplicationId();
     ResourceRequest amReq = validateAndCreateResourceRequest(submissionContext, isRecovery);
+
+    Priority appPriority = submissionContext.getPriority();
+    if (null != appPriority) {
+      try {
+        rmContext.getScheduler().authenticateApplicationPriority(
+            submissionContext.getPriority(), user,
+            submissionContext.getQueue(), applicationId);
+      } catch (IOException e) {
+        throw RPCUtil.getRemoteException(e.getMessage());
+      }
+    } else {
+      // Get the default priority from Queue and set to Application
+      try {
+        appPriority = rmContext.getScheduler()
+            .getDefaultApplicationPriorityFromQueue(submissionContext.getQueue());
+      } catch (IOException e) {
{noformat}
The code snippet above is from YARN-2003, which handles the changes in the RM and events for priority. When an app is submitted without a priority, we would like to fill in the default priority from the queue.

bq.why would we want to limit which priorities are running within a queue?
queueA: default=low
queueB: default=medium
The type of apps which we run may vary from queueA to B. So by keeping default priority different for each queue will help to handle such case. Assume more high level apps are running in queueA often, and medium level in queueB. Making different default priority can help here.

[~leftnoteasy] Do you mean a global max priority, which can help limit the number associated with a priority?
bq. we just cap it by max-priority-limit instead of throw exception?
Yes. I will update this part to cap the priority instead of throwing an exception.
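The default-filling step described above (use the submitted priority if present, otherwise the queue's default) reduces to a small lookup. `DefaultPriorityResolver`, the cluster-wide fallback for queues without a configured default, and the integer encoding (low = 0, medium = 1) are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical resolver: an explicit priority wins; otherwise use the queue default.
final class DefaultPriorityResolver {
    private final Map<String, Integer> defaultByQueue = new HashMap<>();
    private final int clusterDefault; // assumed fallback for queues without a default

    DefaultPriorityResolver(int clusterDefault) {
        this.clusterDefault = clusterDefault;
    }

    void setQueueDefault(String queue, int priority) {
        defaultByQueue.put(queue, priority);
    }

    // submittedPriority is null when the submission context carried no priority.
    int resolve(Integer submittedPriority, String queue) {
        if (submittedPriority != null) {
            return submittedPriority;
        }
        return defaultByQueue.getOrDefault(queue, clusterDefault);
    }
}
```

Resolving the default once at submission time means later comparisons never see a null priority, which is the property the null-check discussion elsewhere in this thread depends on.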
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503158#comment-14503158 ] Sunil G commented on YARN-2004:
---
Thank you very much for the comments.

bq. default of default-priority is -1
I have a similar opinion to [~jlowe]'s. If we were aiming for Linux-like priorities with a range (-N, N), we would need support for negative values. But for simple comparison, neither matters much. For maintainability, I also support using non-negative integers with 0 as the default.

bq. We don't need per-user settings to get the basic
A user can submit an application with a given priority. This priority will be validated against:
1) whether it is a valid priority per the cluster priority list (0:Low, 1:Medium, 2:High);
2) whether it is valid for the given queue config (QueueA {default=Low, max=Medium}), so Low and Medium are accessible for QueueA;
3) ACLs (this will be done in a separate ticket).
If the user doesn't submit the app with a priority, we can take the default priority configured for the given queue (here, Low for QueueA). This point was not handled in the earlier patch; I will add it in the next patch.
Coming to the point of discussion, I feel we can implement the design above first, and then handle the per-user priority feature as a separate ticket. [~leftnoteasy] and [~jlowe], please share your thoughts.

bq. There appear to be some missing NULL checks
I am sorry for this; it will be removed.
As suggested, I will change the logging and upload a new version of the patch.
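The three validation steps listed above, plus the queue-default fallback, can be sketched as follows. The enum mirrors the example cluster list (0:Low, 1:Medium, 2:High) and the queue settings mirror QueueA {default=Low, max=Medium}; the class names and the use of `IllegalArgumentException` are hypothetical, and the ACL step is only marked, matching its deferral to a separate ticket.

```java
import java.util.HashMap;
import java.util.Map;

// Cluster-wide priority list from the example: 0:Low, 1:Medium, 2:High.
enum ClusterPriority { LOW, MEDIUM, HIGH }

// Hypothetical per-queue settings, e.g. QueueA {default=Low, max=Medium}.
final class QueuePriorityConf {
    final ClusterPriority defaultPriority;
    final ClusterPriority maxPriority;

    QueuePriorityConf(ClusterPriority defaultPriority, ClusterPriority maxPriority) {
        this.defaultPriority = defaultPriority;
        this.maxPriority = maxPriority;
    }
}

final class SubmissionPriorityValidator {
    private final Map<String, QueuePriorityConf> queues = new HashMap<>();

    void addQueue(String queue, QueuePriorityConf conf) {
        queues.put(queue, conf);
    }

    // Returns the effective priority, or throws if the request is not allowed.
    ClusterPriority validate(String queue, Integer requested) {
        QueuePriorityConf conf = queues.get(queue);
        if (conf == null) {
            throw new IllegalArgumentException("Unknown queue: " + queue);
        }
        if (requested == null) {
            return conf.defaultPriority; // no priority given: use the queue default
        }
        // Check 1: the value must exist in the cluster priority list.
        ClusterPriority[] all = ClusterPriority.values();
        if (requested < 0 || requested >= all.length) {
            throw new IllegalArgumentException("Not a cluster priority: " + requested);
        }
        ClusterPriority p = all[requested];
        // Check 2: the queue must allow it (here: not above the queue max).
        if (p.compareTo(conf.maxPriority) > 0) {
            throw new IllegalArgumentException(p + " exceeds max " + conf.maxPriority + " of " + queue);
        }
        // Check 3 (ACLs) would go here; it is deferred to a separate ticket.
        return p;
    }
}
```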
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503043#comment-14503043 ] Jason Lowe commented on YARN-2004: -- I don't think it matters if we allow negative priorities. Numbers are easy to compare, even when negative numbers enter the mix. If people feel strongly that negative numbers are too confusing to compare then we can force them to be non-negative or even non-zero if that is also too confusing. If we allow negative priorities then I suggest we use zero as the default priority. Any positive priority will be higher and scheduled before the default priority. Any negative priority will be lower and scheduled after the default priority. Simple. If we don't allow negative priorities then be sure to set the default such that users can set applications to be lower priority than the default. As for per-user priority settings, again I'm advocating for simplicity first. We don't need per-user settings to get the basic, and highly-requested feature working first. Adding this feature later should not disrupt any of the initial APIs, as these would be separate, admin APIs (or just configs) that would not affect app submission. If per-user priority defaults are needed after the basic priority functionality is there then we can add it then. I'm not sure about the current state of the patch and how it relates to YARN-2003. I see there are default priorities, but I don't see them being really used in either this patch nor in the latest YARN-2003 patch. I'm also wondering if we really need per-queue priority limits. Currently application priorities have no effect _between_ queues, therefore I don't understand why we would want/need to limit application priorities in one queue vs. another queue. Maybe I'm missing the use-case for this feature. If we don't have a solid use-case for it then we should not add it until we need it. Again, this is something we can always add later. 
There appear to be some missing NULL checks for priorities in the following code:
{code}
+      if (a1.getApplicationPriority() != null
+          && !a1.getApplicationPriority().equals(a2.getApplicationPriority())) {
+        return a1.getApplicationPriority().compareTo(
+            a2.getApplicationPriority());
+      }
{code}
If a1.getApplicationPriority() returns non-null but a2.getApplicationPriority() returns null, then I think we will NPE, as Priority.compareTo has no null checks.
Nit: I'd like to see the submitted application priority logged along with the other essential app details when the app is submitted, rather than in a separate log message just for priority. The RM log is already too wordy, and this INFO message will add to it. Maybe it should just be a debug log?
{code}
+    LOG.info("Submitted priority '" + priority.getPriority()
+        + "' is acceptable in queue : " + queueName + " for application: "
+        + applicationId);
{code}
Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 0003-YARN-2004.patch, 0004-YARN-2004.patch, 0005-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to application while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for Application priority. If priority is available, then return the highest priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
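One way to avoid the NPE described above, sketched under the assumption (made elsewhere in this thread) that an unset priority should behave like a default value: resolve both sides to an effective priority before comparing, so no comparison is ever handed null. The classes are hypothetical, and larger values are assumed to mean higher priority.

```java
import java.util.Comparator;

// Hypothetical stand-in for the scheduler app; priority is null when the
// submitter did not set one.
class MaybePrioritizedApp {
  final int id;           // submission order (lower = earlier)
  final Integer priority; // null means "not specified"

  MaybePrioritizedApp(int id, Integer priority) {
    this.id = id;
    this.priority = priority;
  }
}

class NullSafePriorityComparator implements Comparator<MaybePrioritizedApp> {
  private final int defaultPriority;

  NullSafePriorityComparator(int defaultPriority) {
    this.defaultPriority = defaultPriority;
  }

  // Resolve a missing priority to the default instead of dereferencing null.
  private int effective(MaybePrioritizedApp app) {
    return app.priority != null ? app.priority : defaultPriority;
  }

  @Override
  public int compare(MaybePrioritizedApp a1, MaybePrioritizedApp a2) {
    // Higher effective priority sorts first.
    int byPriority = Integer.compare(effective(a2), effective(a1));
    if (byPriority != 0) {
      return byPriority;
    }
    // Same effective priority: fall back to FIFO by submission id.
    return Integer.compare(a1.id, a2.id);
  }
}
```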
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503526#comment-14503526 ] Jason Lowe commented on YARN-2004: --
I'm still missing why it makes sense for queues to have different access to priorities. Currently priorities only have an effect within a queue, not between queues, so why would we want to limit which priorities are running within a queue? I'm still missing the use-case for this, and as such it looks like additional complexity without any benefit.
bq. I have considered the default priority scenario where, if a submitted app does not give any priority, then the default will be taken. So a null in the above scenario won't happen.
Where is this occurring? I see a lot of getDefault*Priority functions, but not where they're actually used to set the app's priority if no priority is specified during app submission.
Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 0003-YARN-2004.patch, 0004-YARN-2004.patch, 0005-YARN-2004.patch, 0006-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to application while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for Application priority. If priority is available, then return the highest priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499028#comment-14499028 ] Wangda Tan commented on YARN-2004: --
Some comments:
1) I noticed the default value of default-priority is -1. Do you think we should limit priority >= 0? The existing queue interface doesn't limit the lowest priority, so maybe we should enforce that limit ourselves.
2) Beyond the priority settings on a queue, do you think we should have a per-user priority setting? If we don't limit users' priorities, we will end up with all users asking for the max-priority in the queue. Also, users' defaults could differ, e.g. the CEO's default may be max-priority. But this needs input from real-world use cases. ([~jlowe], thoughts?)
3) The null check in the app priority comparator still exists; didn't you mention removing it?
bq. I can remove NULL check. Will only have a direct compareTo check for priority.
Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 0003-YARN-2004.patch, 0004-YARN-2004.patch, 0005-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to application while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for Application priority. If priority is available, then return the highest priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
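A hypothetical sketch of the bound raised in point 1: reject anything below 0 and cap requests at a per-queue maximum. The floor of 0, the per-queue maximum, and the choice to cap rather than reject above that maximum are all assumptions for illustration; `PriorityRangeCheck` is not part of the patch.

```java
// Hypothetical validation helper; bounds and behavior are illustrative only.
class PriorityRangeCheck {
  static final int MIN_PRIORITY = 0; // assumed floor, per the ">= 0" suggestion

  // Rejects priorities below the floor and caps them at the queue's maximum.
  static int validate(int requested, int queueMaxPriority) {
    if (requested < MIN_PRIORITY) {
      throw new IllegalArgumentException(
          "priority " + requested + " is below the minimum " + MIN_PRIORITY);
    }
    // Capping is one option; rejecting over-limit requests would also be reasonable.
    return Math.min(requested, queueMaxPriority);
  }
}
```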
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327117#comment-14327117 ] Sunil G commented on YARN-2004: ---
Thank you [~jlowe] and [~leftnoteasy] for the input. Yes, there are alternate ways to handle scenario 1, and for scenario 2, YARN-2009 will help. Hence this JIRA can now focus on adding basic priority support to the schedulers.
bq. Priority is only considered if both applications have a priority that was set.
If a set of priorities is loaded into the RM and one is chosen as the default priority for a queue, the default can be any priority from lowest to highest. All applications submitted without a priority will then be given this default priority, so an application with an explicitly lower priority will end up with lower preference than an application submitted without a priority. But this also depends on the user's perception: if users understand that all applications submitted without a priority fall back to the default chosen per queue, then the behavior will be as expected. On that note, I also think we can treat all applications submitted without a priority as having the default priority.
[~jlowe], please share your thoughts on the above scenario.
Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to application while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for Application priority. If priority is available, then return the highest priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327650#comment-14327650 ] Sunil G commented on YARN-2004: ---
Yes [~jlowe], I agree with your point. Currently I have added a configuration to specify the default priority in a queue, which is applied to applications submitted without a priority. A cluster-wide config will also be added, and a queue-level config, if given, will override the cluster-wide default value. I will update the patch as per this understanding. Thank you.
Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to application while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for Application priority. If priority is available, then return the highest priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
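A sketch of the precedence described above (queue-level default, then cluster-wide default, then a built-in fallback), applied only to applications submitted without a priority. The configuration keys and the built-in value of 0 are placeholders, not the property names used by the patch or by Hadoop, and a plain `Map` stands in for the real configuration object.

```java
import java.util.Map;

// Hypothetical resolver; key names are illustrative placeholders.
class DefaultPriorityResolver {
  static final int BUILT_IN_DEFAULT = 0; // assumed fallback when nothing is configured

  private final Map<String, String> conf;

  DefaultPriorityResolver(Map<String, String> conf) {
    this.conf = conf;
  }

  // Queue-level setting wins, then the cluster-wide setting, then the built-in value.
  int defaultPriorityFor(String queue) {
    String queueValue = conf.get("example.queue." + queue + ".default-priority");
    if (queueValue != null) {
      return Integer.parseInt(queueValue.trim());
    }
    String clusterValue = conf.get("example.cluster.default-priority");
    if (clusterValue != null) {
      return Integer.parseInt(clusterValue.trim());
    }
    return BUILT_IN_DEFAULT;
  }

  // At submission: keep an explicit priority, otherwise use the resolved default.
  int effectivePriority(String queue, Integer submitted) {
    return submitted != null ? submitted : defaultPriorityFor(queue);
  }
}
```

Resolving the default once at submission means every app carries a concrete priority afterwards, which is what allows the comparator to drop its null handling.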
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327527#comment-14327527 ] Jason Lowe commented on YARN-2004: --
My thoughts are as I stated above. We should not ignore priorities if one of the apps does not have a priority specified. An application without a specified priority should be treated as having a default priority value, which is still compared to the other application's priority, rather than skipping the priority comparison. That would be the expected behavior. We can come up with all sorts of schemes to determine what the default priority value should be (e.g.: hardcoded default value, cluster-wide configurable, queue-specific configurable, etc.). The important part is to not skip the priority comparison completely, as that would be unexpected behavior for users.
Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to application while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for Application priority. If priority is available, then return the highest priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327730#comment-14327730 ] Sunil G commented on YARN-2004: ---
As per YARN-2003, RMAppManager#submitApplication processes the input from the submission context. I will add handling there for the scenario where the priority from the submission context is NULL, so that it is updated with the default priority from the queue. As for this patch, I can remove the NULL check. I will only have a direct compareTo check for priority.
Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to application while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for Application priority. If priority is available, then return the highest priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325576#comment-14325576 ] Sunil G commented on YARN-2004: ---
About the priority inversion problem, I think we could use the following approach:
1. To identify a lower priority application that has been waiting for resources over a long period, *lastScheduledContainer* in *SchedulerApplicationAttempt* can be used to get the timestamp of the last allocation. Based on a configurable time limit, it is then possible to identify the apps which are starving.
2. Identify a few higher priority applications and explicitly decrease their headroom by one resource request of the lower priority application.
3. Reset the headroom of the higher priority applications once the lower priority application has got the container.
Kindly share your thoughts on this.
Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to application while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for Application priority. If priority is available, then return the highest priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
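Step 1 of the proposal could look roughly like the following, assuming each app exposes the timestamp of its last allocation (the comment suggests *lastScheduledContainer* for this) and whether it still has outstanding requests. `StarvationTracker` and its threshold are hypothetical; steps 2 and 3 (adjusting headroom) are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of step 1 only: an app that still has outstanding
// requests but has received no container within the threshold is starving.
class StarvationTracker {
  static final class AppState {
    final String appId;
    final long lastAllocationMillis; // e.g. derived from the last scheduled container
    final boolean hasPendingRequests;

    AppState(String appId, long lastAllocationMillis, boolean hasPendingRequests) {
      this.appId = appId;
      this.lastAllocationMillis = lastAllocationMillis;
      this.hasPendingRequests = hasPendingRequests;
    }
  }

  private final long starvationThresholdMillis; // assumed configurable time limit

  StarvationTracker(long starvationThresholdMillis) {
    this.starvationThresholdMillis = starvationThresholdMillis;
  }

  // Returns ids of apps that are still waiting and have not been allocated
  // anything for longer than the threshold.
  List<String> starvingApps(List<AppState> apps, long nowMillis) {
    List<String> starving = new ArrayList<>();
    for (AppState app : apps) {
      long waited = nowMillis - app.lastAllocationMillis;
      if (app.hasPendingRequests && waited > starvationThresholdMillis) {
        starving.add(app.appId);
      }
    }
    return starving;
  }
}
```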
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326197#comment-14326197 ] Sunil G commented on YARN-2004: ---
Hi Jason, thank you for sharing your thoughts. In one sense, we need not think about headroom and user limits. Still, I would like to share 2 scenarios:
1. Similar to MAPREDUCE-314: a job j1 is submitted with lower priority and has finished its map tasks, and its reducers are running. Later, j2 and j3 come in and take over the cluster resources. If a map fails, e.g. because some map output is lost, j1 has no chance of getting a resource until j2 and j3 release resources that are not reallocated to them. In the worst case, j1 will starve for much longer. One of the intentions was to temporarily pause demand from j2 and j3 for a while and spare some resources for j1.
2. User limit: assume the factor is 25, so 4 users can take 25% each of the cluster and a 5th user has to wait. Assume the highest priority app is submitted by the 5th user. He may not get resources until the demand from the first 4 users (for existing apps) is over.
Do you feel this needs to be handled?
Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to application while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for Application priority. If priority is available, then return the highest priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326238#comment-14326238 ] Jason Lowe commented on YARN-2004: --
For your first scenario, it can happen today without priority. MR jobs ask for resources in waves -- first all the maps, then over time it ramps up reducers. Multiple jobs in the same queue from the same user can collide in different phases. That's the whole point of the headroom calculation and reporting -- to allow AMs to realize this scenario is happening and react to it. In this case, j1 will see its headroom is zero and start killing reducers to make room for the failed map task. After killing the reducers there will be some free resources in the cluster (if they weren't stolen by another, underserved queue). Then the question is who will get those resources. If we're using the default priority, j1 will get first crack at them due to FIFO ordering. If j2 or j3 were made higher priority, then j1 will see that its headroom is _still_ zero after killing some reducers and will probably kill some more to try to make room. Rinse and repeat until j1 is out of reducers to shoot or gets the resources it needs to run the failed map.
For the second scenario, the 5th user will _still_ be the first one to get any spare resources in the queue because he has the highest priority app. Note that the user limit calculation does not involve comparing a user's current limit with other users' usage. It's just a computation of what's available in the queue and what you're allowed based on the configured user limit and user limit factor. So the 5th user will continue to consume any free resources in the queue until either the app is satiated or the 5th user hits the 25% cap. If there are no free resources, then the 5th user's app will starve (without preemption) just like the rest until resources show up.
Again, higher priority just means you're first in line to get resources when they are freed up; it doesn't change anything else. We can discuss adding preemption into the mix to force higher priority apps to get their requested resources faster in a full queue. However, I think the first step is to get priority scheduling working for resources that are free in the queue in the non-preemption case, as that's still very useful in practice.
Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to application while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for Application priority. If priority is available, then return the highest priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
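A deliberately simplified model of the user-limit point made for the second scenario: the user's cap is a fixed share of queue capacity, and a grant is bounded by the app's demand, the room left under that cap, and what is currently free in the queue. This is not CapacityScheduler's actual user-limit computation (which, as noted above, also depends on the configured user limit factor); `UserLimitSketch` is hypothetical and only illustrates that priority is not an input to the cap.

```java
// Simplified model, not CapacityScheduler's real user-limit formula: the cap
// is a fixed percentage of queue capacity, and priority is not an input at all.
class UserLimitSketch {
  // Units of currently free queue capacity that one user's app may take now.
  static int grantable(int queueCapacity, int userLimitPercent, int userUsed,
                       int appDemand, int queueFree) {
    int userCap = queueCapacity * userLimitPercent / 100;
    int roomUnderCap = Math.max(0, userCap - userUsed);
    return Math.min(appDemand, Math.min(roomUnderCap, queueFree));
  }
}
```

With a 100-unit queue and a 25% limit, a user with nothing allocated can take at most 25 units even when 50 are free, however high the app's priority is; with only 10 free units, the app gets those 10 first because it is first in line.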
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326582#comment-14326582 ] Wangda Tan commented on YARN-2004: --
[~sunilg], thanks for uploading the patch. I just read the comments from [~jlowe], and everything he said makes sense to me.
For scenario #1: there are some possible solutions to the priority inversion problem you mentioned, but it is more important to get basic priority working in CS first. What you describe is more like adjustable priority, which could be updated according to the application's waiting time or other factors.
For scenario #2: if a user with a higher priority application arrives but there are no available resources in the queue, the preemption policy should reclaim resources from other users. YARN-2009 should cover it.
The general approach of the patch looks good to me.
Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to application while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for Application priority. If priority is available, then return the highest priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
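One possible shape for the adjustable priority mentioned under scenario #1 is classic priority aging: the effective priority rises by one step for every interval an app has waited, up to a ceiling. This generic technique is named here for illustration and was not proposed in the patch; `PriorityAging`, the one-step increment, the interval, and the ceiling are all assumptions.

```java
// Illustrative priority aging: +1 per full aging interval waited, capped.
class PriorityAging {
  static int effectivePriority(int basePriority, long waitingMillis,
                               long agingIntervalMillis, int maxPriority) {
    if (agingIntervalMillis <= 0) {
      throw new IllegalArgumentException("agingIntervalMillis must be positive");
    }
    long boost = Math.max(0L, waitingMillis) / agingIntervalMillis;
    // The ceiling never pushes an app below its own base priority.
    long ceiling = Math.max(basePriority, maxPriority);
    return (int) Math.min(basePriority + boost, ceiling);
  }
}
```

For example, an app with base priority 1 and a 1-second interval reaches priority 4 after waiting 3.5 seconds and stops at the ceiling of 10 however long it waits.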
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326145#comment-14326145 ] Jason Lowe commented on YARN-2004: --
I'm not sure I understand the priority inversion problem and why we would be changing headroom. The headroom has no priority calculations in it. As I see it, priority scheduling _only_ changes the order in which applications are examined when deciding how to assign free resources in a queue. In other words, it does _not_ change the following:
- the priority order between queues (i.e.: deciding which queue is first in line to obtain free resources in the cluster)
- the user limits within a queue (i.e.: making an app higher priority does not implicitly give the user more room to grow within the queue than normal)
- the headroom for an app within the queue (higher priority doesn't change the queue capacity or user limits)
For example, a user is running app A and then follows up with app B. The user decides app B is pretty important and raises its priority. This doesn't change the user limits within the queue or the headroom of those apps, but it does change which app will be assigned a spare resource if one is available. If the queue is totally full, then both apps will be told their headroom is zero, and one (or both) of them will need to free up some resources to make progress. When resources become available, app B will have the first chance to claim them since it was given a higher priority than A.
Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to application while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for Application priority. If priority is available, then return the highest priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
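A simplified model of the app A / app B example: free containers are handed out one at a time to the first app, in priority order, that still has pending demand and headroom. Priority changes only the iteration order; each app's headroom is an input that priority never modifies. `OrderOnlyAssignment` is hypothetical, and real headroom is derived from user limits and queue capacity rather than fixed per app as here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OrderOnlyAssignment {
  static final class App {
    final String name;
    final int priority; // higher = examined first
    final int headroom; // containers this app may still receive (priority does not change it)
    final int pending;  // containers this app is still asking for

    App(String name, int priority, int headroom, int pending) {
      this.name = name;
      this.priority = priority;
      this.headroom = headroom;
      this.pending = pending;
    }
  }

  // Hands out free containers one at a time to the first app, in priority
  // order, that still has both pending demand and headroom.
  static Map<String, Integer> assign(List<App> apps, int freeContainers) {
    List<App> ordered = new ArrayList<>(apps);
    // List.sort is stable, so equal priorities keep their submission order.
    ordered.sort(Comparator.comparingInt((App a) -> a.priority).reversed());
    Map<String, Integer> granted = new LinkedHashMap<>();
    for (App a : ordered) {
      granted.put(a.name, 0);
    }
    for (int i = 0; i < freeContainers; i++) {
      for (App a : ordered) {
        int got = granted.get(a.name);
        if (got < a.pending && got < a.headroom) {
          granted.put(a.name, got + 1);
          break; // this container is assigned; move on to the next one
        }
      }
    }
    return granted;
  }
}
```

With app A (priority 0, headroom 5) and app B (priority 10, headroom 2), a single free container goes to B; with four free containers, B stops at its headroom of 2 and A receives the other 2.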
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326737#comment-14326737 ] Jason Lowe commented on YARN-2004: --
I took a closer look at the patch, and the following logic seems suspect:
{code}
+      if (a1.getApplicationPriority() != null
+          && a2.getApplicationPriority() != null
+          && !a1.getApplicationPriority().equals(a2.getApplicationPriority())) {
+        return a2.getApplicationPriority().compareTo(
+            a1.getApplicationPriority());
+      }
{code}
Priority is only considered if both applications have a priority that was set. Do we really want that behavior? I'm thinking of the scenario where none of the apps in the queue has a priority set and then one of the apps has its priority set to very high or very low. That has no net effect, since none of the other apps it is compared against in the queue has a priority set. A more intuitive behavior is to treat an unset priority as if the app had a default priority, so we aren't implicitly disabling priority checks in some scenarios.
Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to application while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for Application priority. If priority is available, then return the highest priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
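A small model of the quoted "both set" check, contrasted with the default-priority alternative. With two unset apps and one app at priority 10, the "both set" comparator falls back to FIFO on every comparison, so the priority has no effect. Mixing set and unset priorities also makes it non-transitive, which the `java.util.Comparator` contract forbids: for apps 1 (priority 1), 2 (unset) and 3 (priority 5), it orders 1 before 2, 2 before 3, and 3 before 1. The classes are illustrative stand-ins, not `FiCaSchedulerApp` or `Priority`, and larger values are assumed to mean higher priority.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BothSetCheckDemo {
  static final class App {
    final int id;           // submission order (lower = earlier)
    final Integer priority; // null = not set; larger = higher priority

    App(int id, Integer priority) {
      this.id = id;
      this.priority = priority;
    }
  }

  // Mirrors the quoted logic: priorities matter only when both are set and differ.
  static final Comparator<App> BOTH_SET_ONLY = (a1, a2) -> {
    if (a1.priority != null && a2.priority != null
        && !a1.priority.equals(a2.priority)) {
      return a2.priority.compareTo(a1.priority); // higher priority first
    }
    return Integer.compare(a1.id, a2.id); // otherwise FIFO
  };

  // Alternative: an unset priority behaves like the given default.
  static Comparator<App> withDefault(int defaultPriority) {
    return (a1, a2) -> {
      int p1 = a1.priority != null ? a1.priority : defaultPriority;
      int p2 = a2.priority != null ? a2.priority : defaultPriority;
      if (p1 != p2) {
        return Integer.compare(p2, p1); // higher priority first
      }
      return Integer.compare(a1.id, a2.id);
    };
  }

  // Returns app ids in sorted order under the given comparator.
  static List<Integer> order(List<App> apps, Comparator<App> cmp) {
    List<App> sorted = new ArrayList<>(apps);
    sorted.sort(cmp);
    List<Integer> ids = new ArrayList<>();
    for (App a : sorted) {
      ids.add(a.id);
    }
    return ids;
  }
}
```

For apps 1 and 2 unset and app 3 at priority 10, `BOTH_SET_ONLY` yields 1, 2, 3, while `withDefault(0)` yields 3, 1, 2.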