[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-07-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614581#comment-14614581
 ] 

Sunil G commented on YARN-2004:
---

Ah, Sorry! Thank you [~devaraj.k] for correcting.

 Priority scheduling support in Capacity scheduler
 -

 Key: YARN-2004
 URL: https://issues.apache.org/jira/browse/YARN-2004
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 
 0003-YARN-2004.patch, 0004-YARN-2004.patch, 0005-YARN-2004.patch, 
 0006-YARN-2004.patch, 0007-YARN-2004.patch, 0008-YARN-2004.patch, 
 0009-YARN-2004.patch, 0010-YARN-2004.patch


 Based on the priority of the application, the Capacity Scheduler should be able 
 to give preference to an application while scheduling.
 Comparator<FiCaSchedulerApp> applicationComparator can be changed as below.
 
 1. Check for application priority. If priority is available, then return 
 the highest-priority job.
 2. Otherwise, continue with the existing logic, such as App ID comparison and 
 then timestamp comparison.
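
A minimal sketch of the two comparison steps above, not the code from the attached patches. `AppSketch`, `PriorityThenIdComparator`, and `DEFAULT_PRIORITY` are hypothetical names; the sketch assumes a larger number means a higher priority and treats a missing priority as the default value so that the ordering stays consistent.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for FiCaSchedulerApp; not the real YARN class.
final class AppSketch {
  final int appId;         // smaller id means the app was submitted earlier
  final Integer priority;  // null when no priority was supplied

  AppSketch(int appId, Integer priority) {
    this.appId = appId;
    this.priority = priority;
  }
}

final class PriorityThenIdComparator implements Comparator<AppSketch> {
  // Assumption: a missing priority is treated as this default value.
  static final int DEFAULT_PRIORITY = 0;

  private static int effectivePriority(AppSketch app) {
    return app.priority == null ? DEFAULT_PRIORITY : app.priority;
  }

  @Override
  public int compare(AppSketch a, AppSketch b) {
    // Step 1: higher priority first (assumes larger number = higher priority).
    int byPriority = Integer.compare(effectivePriority(b), effectivePriority(a));
    if (byPriority != 0) {
      return byPriority;
    }
    // Step 2: existing logic, reduced here to the application id comparison.
    return Integer.compare(a.appId, b.appId);
  }

  // Returns the application ids in scheduling order.
  static List<Integer> order(List<AppSketch> apps) {
    List<AppSketch> sorted = new ArrayList<>(apps);
    sorted.sort(new PriorityThenIdComparator());
    List<Integer> ids = new ArrayList<>();
    for (AppSketch app : sorted) {
      ids.add(app.appId);
    }
    return ids;
  }
}
```

Applications with equal priority keep their submission order, because the id comparison only runs when the priorities tie.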



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-07-02 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612159#comment-14612159
 ] 

Sunil G commented on YARN-2004:
---

Thank you [~jianhe] for the comments.

- bq. Or this method has more responsibility than that ?
Yes. We are planning to check ACLs (priority ACLs) in this method. I was 
planning to handle that in a separate ticket.
{noformat}
yarn.scheduler.capacity.root.queue_name.priority.acl=user1,user2
{noformat}
This config will be at the queue level, and it lets us restrict which users may 
use a high priority. Only certain users can use a high priority, and others 
won't be able to submit applications at that priority. I was planning to add 
this ACL check to {{authenticateApplicationPriority}}.
- bq. we may merge the two into a single patch ?
I will merge these patches together and upload the result to YARN-2003.
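
A minimal sketch of how a comma-separated priority ACL value such as `user1,user2` might be checked. `PriorityAclSketch` and `mayUseHighPriority` are hypothetical names, and the real check was deferred to a separate ticket, so this only illustrates the idea.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; not part of the YARN API.
final class PriorityAclSketch {
  private final Set<String> allowedUsers = new HashSet<>();

  // aclValue is the raw property value, for example "user1,user2".
  PriorityAclSketch(String aclValue) {
    for (String user : aclValue.split(",")) {
      String trimmed = user.trim();
      if (!trimmed.isEmpty()) {
        allowedUsers.add(trimmed);
      }
    }
  }

  // True when the user is listed in the queue's priority ACL.
  boolean mayUseHighPriority(String user) {
    return allowedUsers.contains(user);
  }
}
```

With the value from the config line above, `user1` and `user2` pass the check and any other user is refused.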



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-07-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611034#comment-14611034
 ] 

Jian He commented on YARN-2004:
---

- authenticateApplicationPriority : IIUC, all it does is just to take the 
config from yarn-site.xml (not capacity-scheduler.xml) and check the priority 
against that. I don't see much need of explicitly exposing an API in scheduler 
and inject the check there. Or this method has more responsibility than that ?

- Given that YARN-2003 is just the API of YARN-2004 and we have to review the 
two together anyway, we may merge the two into a single patch ? This is 
easier to review, and you also do not need to split the patch and upload it in 
two different places. You can instead split the part about updating 
application priority at runtime and the state store changes into a different patch.



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-06-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608857#comment-14608857
 ] 

Wangda Tan commented on YARN-2004:
--

Thanks for the update, [~sunilg]. Comments on the latest patch:

1)
bq. I feel we can do the priority comparison first. Do you see any specific 
usecase for priority as factor
Fair scheduler currently uses it as a factor, see {{FairScheduler#getAppWeight}}

2)
SchedulerApplicationAttempt/SchedulerApplication.appPriority should be 
volatile. I found some other fields that also need to be changed, not caused by 
your patch. For example: SchedulerApplication.currentAttempt, etc. I suggest we 
make appPriority correct in this patch, and address the others in a separate ticket.

3)
Not caused by your patch, applicationComparator should be removed, and 
pendingApps in LeafQueue should use FifoOrderingPolicy to compare, we can do 
this in separated patch.

4)
dfltAppPriorityPerQueue should use "default" rather than "dflt".

5)
Is this check necessary in SchedulerApplication.setPriority:
{code}
  if (null != currentAttempt) {
    currentAttempt.setApplicationPriority(priority);
  }
}
{code}
Should we simply prohibit changing the application priority while the app is in 
the submitting stage?

6)
Tests:
- Add a test to verify updateApplicationPriority works?
- Add end-to-end test to verify application priority works? (Not only check 
q.getApplications().iterator..)
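
A minimal sketch of points 2 and 5 above, assuming priorities are plain integers. `SchedulerApplicationSketch` and `AttemptSketch` are hypothetical stand-ins for `SchedulerApplication` and `SchedulerApplicationAttempt`; the sketch shows a volatile priority field and the null check on the current attempt, without taking a synchronized lock. It does not try to close the race between registering an attempt and a concurrent priority update.

```java
// Hypothetical stand-in for SchedulerApplicationAttempt.
final class AttemptSketch {
  private volatile int priority;

  void setApplicationPriority(int priority) {
    this.priority = priority;
  }

  int getApplicationPriority() {
    return priority;
  }
}

// Hypothetical stand-in for SchedulerApplication.
final class SchedulerApplicationSketch {
  // volatile so that scheduler threads see an update without locking.
  private volatile int appPriority;
  // Stays null until the first attempt is registered.
  private volatile AttemptSketch currentAttempt;

  void setCurrentAttempt(AttemptSketch attempt) {
    attempt.setApplicationPriority(appPriority);
    currentAttempt = attempt;
  }

  void setPriority(int priority) {
    appPriority = priority;
    // Read the field once so the null check and the call use the same object.
    AttemptSketch attempt = currentAttempt;
    if (attempt != null) {
      attempt.setApplicationPriority(priority);
    }
  }

  int getPriority() {
    return appPriority;
  }
}
```

If priority changes were prohibited while the app is still being submitted, as the question in point 5 suggests, the null check in `setPriority` would no longer be needed.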



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-06-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606734#comment-14606734
 ] 

Wangda Tan commented on YARN-2004:
--

Thanks for the update, [~sunilg]. Some comments from my side:

1) getMaxClusterLevelAppPriority should return Priority.

2) updateApplicationPriority,
I think updateApplicationPriority needs to send a message to RMApp so RMApp can 
write it to the state store. Once the RM fails and recovers the app, we should 
get the updated priority.

I also suggest adding a method to SchedulerApplication that sets the priority 
on itself and on SchedulerApplicationAttempt. And could you make setting the 
priority not acquire the application's synchronized lock?

3) authenticateApplicationPriority: typically, LOG.debug needs to be wrapped by 
LOG.isDebugEnabled...

4) dflt would be better as default; dflt is not a very common abbreviation in 
code to me. :)

5) The change to compareInputOrderTo is not correct to me. {{compareInputOrderTo}} 
compares which application was submitted first. I think you need to modify 
{{FifoComparator}} and compare priority based on SchedulerApplicationAttempt's 
priority. Changes to FairComparator are needed too, but I think we can postpone 
them, since FairComparator + Fifo may be more complicated: should we do the 
priority comparison first (treat priority as a class) OR use a combination of 
them (treat priority as a factor)?
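
A minimal sketch of the debug-guard pattern from point 3. It uses `java.util.logging` so that it stays self-contained, where `isLoggable(Level.FINE)` plays the role that `LOG.isDebugEnabled()` plays in the comment; `DebugGuardSketch`, `setDebug`, and `describe` are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class DebugGuardSketch {
  private static final Logger LOG = Logger.getLogger("DebugGuardSketch");

  // Counts how often the (potentially expensive) message was built.
  static int messagesBuilt = 0;

  static void setDebug(boolean enabled) {
    LOG.setLevel(enabled ? Level.FINE : Level.INFO);
  }

  private static String describe(String queue, int priority) {
    messagesBuilt++;
    return "Priority " + priority + " accepted for queue " + queue;
  }

  static void authenticate(String queue, int priority) {
    // The guard skips building the message when debug logging is off.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine(describe(queue, priority));
    }
  }
}
```

The guard matters only when building the message has a cost, such as string concatenation over large objects; for a constant string it makes no difference.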

[~jianhe]:
bq. We may just move the check into RMAppManager...
This may not work, since priority mapping happens in scheduler side. (set app's 
priority according to queue's default priority).

bq. updateApplicationPriority - I think we don’t need to add an unused API now. 
I think updating app priority is an important use case, according to [~jlowe]'s 
comment: 
https://issues.apache.org/jira/browse/YARN-1963?focusedCommentId=14328071&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14328071.
 I suggest keeping update application priority here.



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-06-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606697#comment-14606697
 ] 

Jian He commented on YARN-2004:
---

Thanks, Sunil! Some comments on the patch:


- The app priority seems only used for pending applications, how about priority 
support for the actively running applications ?
- “default_application_priority”: the convention is to use “-” instead of “_”; 
similarly, the max-application-priority.
- this should not happen
{code}
if (null == queue) {
  throw new YarnException(
      "During application init/update, failure occured due to an unknown"
      + " queue name '" + queueName + "' from priority authentication");
}
{code}
because queue will never be null, see below code in ClientRMService
{code}
if (submissionContext.getQueue() == null) {
  submissionContext.setQueue(YarnConfiguration.DEFAULT_QUEUE_NAME);
}
{code}

- max-application-priority is defined in yarn-site.xml, but here it’s retrieved 
from capacity-scheduler.xml. We may just move the check into RMAppManager.
{code}
if (priority.getPriority() > getMaxClusterLevelAppPriority()) {
  throw new YarnException("Invalid priority as Queue: " + queueName
      + " cannot support more than priority '"
      + getMaxClusterLevelAppPriority() + "'");
}
{code}
- updateApplicationPriority - I think we don’t need to add an unused API now. 
We can do this later when implementing the functionality of updating app 
priority.





[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-06-28 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605153#comment-14605153
 ] 

Sunil G commented on YARN-2004:
---

Ah. About SchedulerApplicationAttempt, we still need a null check for other 
schedulers. I'll update the patch with it.



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-06-26 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603603#comment-14603603
 ] 

Eric Payne commented on YARN-2004:
--

Thanks, [~sunilg], for this fix.

- {{SchedulerApplicationAttempt.java}}:
{code}
  if (!getApplicationPriority().equals(
      ((SchedulerApplicationAttempt) other).getApplicationPriority())) {
    return getApplicationPriority().compareTo(
        ((SchedulerApplicationAttempt) other).getApplicationPriority());
  }
{code}
-- Can {{getApplicationPriority}} return null? I see that 
{{SchedulerApplicationAttempt}} initializes {{appPriority}} to null.

- {{CapacityScheduler.java}}:
{code}
  if (!a1.getApplicationPriority().equals(a2.getApplicationPriority())) {
    return a1.getApplicationPriority().compareTo(
        a2.getApplicationPriority());
  }
{code}
-- Same question about {{getApplicationPriority}} returning null.
-- Also, can {{updateApplicationPriority}} call 
{{authenticateApplicationPriority}}? Seems like duplicate code to me.




[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-06-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603516#comment-14603516
 ] 

Wangda Tan commented on YARN-2004:
--

Thanks for updating, [~sunilg].

A quick comment before posting others, I think most of the code to check/update 
application priority can be reused by other schedulers. [~kasha], could you 
take a quick look at this patch to see if it is also needed for Fair Scheduler?



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-04-28 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517252#comment-14517252
 ] 

Eric Payne commented on YARN-2004:
--

[~sunilg], Thanks for all of the work you are doing for this important feature.

{quote}
queueA: default=low
queueB: default=medium

The type of apps which we run may vary from queueA to B. So by keeping default 
priority different for each queue will help to handle such case. Assume more 
high level apps are running in queueA often, and medium level in queueB. Making 
different default priority can help here.
{quote}

I don't know a lot about the fair scheduler, but I'm pretty sure that in the 
capacity scheduler, there is no way to make one queue a higher priority than 
another. There is no way to compare job priorities between queues. That is, you 
can't say that jobs running in queueA have a higher priority than jobs running 
in queueB. So, it only makes sense to compare priorities between jobs in the 
same queue. Am I missing something?



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-04-28 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517310#comment-14517310
 ] 

Sunil G commented on YARN-2004:
---

Yes [~jlowe] You are correct.
We cannot compare the highest priority across queues. If we do not do that, then 
there is not much point in keeping a MAX priority at the per-queue level.

Initially I planned to change that part in another JIRA, where the max-priority 
application running in a queue would also be taken into consideration while 
processing a node heartbeat [which tries to select which queue can be considered 
based on resource consumption]. But this makes things more complicated now in CS.

I will keep this max at the cluster level for now, so it is accessible 
across all queues, to keep it simple. [~jlowe] [~leftnoteasy] [~vinodkv], pls 
share your thoughts. 



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-04-28 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517317#comment-14517317
 ] 

Sunil G commented on YARN-2004:
---

Extremely sorry [~eepayne]
I mistyped your name as Jason.

Hope you understood my comment about priority config across queue. Pls let me 
know your thoughts.



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-04-28 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517545#comment-14517545
 ] 

Eric Payne commented on YARN-2004:
--

[~sunilg],
bq. Hope you understood my comment about priority config across queue. Pls let 
me know your thoughts.
I think you are referring to [~leftnoteasy]'s suggestion that a cluster-wide 
config should be added to put a cap on the maximum priorities allowed in the 
queue. Is that correct? I think that makes sense so that cluster admins can put 
a cap on the number of priorities within any given queue.



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-04-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503848#comment-14503848
 ] 

Wangda Tan commented on YARN-2004:
--

[~jlowe], [~sunilg]:

1) Regarding the per-queue priority limit:
I agree that a per-queue priority limit could be added separately, but I think 
we may need a global priority limit to make priorities easier to compare: It's 
easy to compare 101 and 190, but it may be hard to compare 2123231223 and 
2123123512. And showing a big-number priority on the web UI is not good to me. 
So limiting the maximum priority gives a better user experience.

2) Regarding negative priority:
I prefer priorities starting from either 0 or 1.

3) Behavior when app.priority > max-priority-limit:
Should we just cap it by max-priority-limit instead of throw exception? 
Unlike required-resource, priority is a hint to the scheduler. Logging a 
LOG.warn instead of rejecting the app seems more friendly to me.
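
A minimal sketch of the capping behavior proposed in point 3, assuming integer priorities. `PriorityCapSketch` and `capPriority` are hypothetical names, and `java.util.logging` stands in for the scheduler's own logger.

```java
import java.util.logging.Logger;

final class PriorityCapSketch {
  private static final Logger LOG = Logger.getLogger("PriorityCapSketch");

  // Returns the requested priority, lowered to maxPriority when it exceeds it.
  static int capPriority(int requested, int maxPriority) {
    if (requested > maxPriority) {
      // Warn instead of rejecting the submission.
      LOG.warning("Requested priority " + requested
          + " exceeds the maximum " + maxPriority + "; using the maximum.");
      return maxPriority;
    }
    return requested;
  }
}
```

The trade-off is that a capped application is accepted silently from the client's point of view, so the warning in the log is the only trace that the request was adjusted.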



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-04-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504219#comment-14504219
 ] 

Sunil G commented on YARN-2004:
---

Thank you [~jlowe]
{noformat}
@@ -327,6 +328,29 @@ private RMAppImpl createAndPopulateNewRMApp(
     ApplicationId applicationId = submissionContext.getApplicationId();
     ResourceRequest amReq =
         validateAndCreateResourceRequest(submissionContext, isRecovery);
+
+    Priority appPriority = submissionContext.getPriority();
+    if (null != appPriority) {
+      try {
+        rmContext.getScheduler().authenticateApplicationPriority(
+            submissionContext.getPriority(), user,
+            submissionContext.getQueue(), applicationId);
+      } catch (IOException e) {
+        throw RPCUtil.getRemoteException(e.getMessage());
+      }
+    } else {
+      // Get the default priority from Queue and set to Application
+      try {
+        appPriority = rmContext.getScheduler()
+            .getDefaultApplicationPriorityFromQueue(submissionContext.getQueue());
+      } catch (IOException e) {
{noformat}

The above code snippet is from YARN-2003, which is handling the changes in RM 
and events for priority. When an app is submitted without a priority, we would 
like to fill in the default priority from the queue.

bq.why would we want to limit which priorities are running within a queue?
queueA: default=low 
queueB: default=medium

The type of apps which we run may vary from queueA to B. So by keeping default 
priority different for each queue will help to handle such case. Assume more 
high level apps are running in queueA often, and medium level in queueB. Making 
different default priority can help here.

[~leftnoteasy] Do you mean a global max priority which can help to limit the 
number associated with a priority ?

bq. we just cap it by max-priority-limit instead of throw exception? 
Yes. I will update this part to cap the priority rather than throw an exception.



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-04-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503158#comment-14503158
 ] 

Sunil G commented on YARN-2004:
---

Thank you very much for the comments.

bq. default of default-priority is -1
I have a similar opinion to [~jlowe]. If we are looking for Linux-like 
priorities with a range (-N, N), we may need to support negative values. But 
for a simple comparison, neither matters much. For maintainability, I also 
support using non-negative integers, with 0 as the default.

bq. We don't need per-user settings to get the basic
A user can submit an application with a given priority.
This priority will be validated against:
1) whether it is a valid priority as per the cluster priority list (0:Low, 
1:Medium, 2:High)
2) whether it is valid for the given queue config (QueueA {default=Low, 
max=Medium})
Hence Low and Medium are accessible for QueueA
3) ACLs (This will be done with a separate ticket)

Now if the user didn't submit the app with a priority, we can take the default 
priority configured for the given queue (here, for QueueA, it is Low).
In the earlier patch, this point was not covered. I will add it in a subsequent 
patch.
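
A minimal sketch of validation steps 1 and 2 and the default fallback described above, using the example values 0:Low, 1:Medium, 2:High and QueueA {default=Low, max=Medium}. `QueuePrioritySketch`, `PriorityResolverSketch`, and `resolve` are hypothetical names; the ACL step is omitted because it belongs to a separate ticket, and rejecting with an exception is only one option, since capping is also discussed in this thread.

```java
// Hypothetical per-queue settings; not the real CapacityScheduler config classes.
final class QueuePrioritySketch {
  final int defaultPriority;
  final int maxPriority;

  QueuePrioritySketch(int defaultPriority, int maxPriority) {
    this.defaultPriority = defaultPriority;
    this.maxPriority = maxPriority;
  }
}

final class PriorityResolverSketch {
  // Cluster priority list from the example: 0:Low, 1:Medium, 2:High.
  static final int CLUSTER_MAX_PRIORITY = 2;

  // requested is null when the user submitted the app without a priority.
  static int resolve(Integer requested, QueuePrioritySketch queue) {
    if (requested == null) {
      // No priority given: fall back to the queue's default.
      return queue.defaultPriority;
    }
    // Step 1: must be a valid cluster-level priority.
    if (requested < 0 || requested > CLUSTER_MAX_PRIORITY) {
      throw new IllegalArgumentException(
          "Priority " + requested + " is not in the cluster priority list");
    }
    // Step 2: must be allowed by the queue configuration.
    if (requested > queue.maxPriority) {
      throw new IllegalArgumentException(
          "Priority " + requested + " exceeds the queue maximum "
              + queue.maxPriority);
    }
    return requested;
  }
}
```

For QueueA, a submission without a priority resolves to Low, Medium is accepted, and High is refused even though it is valid at the cluster level.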

Coming to the point of discussion, I feel we can do this above design first, 
and then can handle per-user priority feature as a separate ticket.
[~leftnoteasy] and [~jlowe] pls suggest your thoughts

bq. There appear to be some missing NULL checks
I am sorry for this, it will be removed.

As suggested, I will change the log part and will upload a new version of patch.




[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-04-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503043#comment-14503043
 ] 

Jason Lowe commented on YARN-2004:
--

I don't think it matters if we allow negative priorities.  Numbers are easy to 
compare, even when negative numbers enter the mix.  If people feel strongly 
that negative numbers are too confusing to compare then we can force them to be 
non-negative or even non-zero if that is also too confusing.  If we allow 
negative priorities then I suggest we use zero as the default priority.  Any 
positive priority will be higher and scheduled before the default priority.  
Any negative priority will be lower and scheduled after the default priority.  
Simple.  If we don't allow negative priorities then be sure to set the default 
such that users can set applications to be lower priority than the default.

As for per-user priority settings, again I'm advocating for simplicity first.  
We don't need per-user settings to get the basic, and highly-requested feature 
working first.  Adding this feature later should not disrupt any of the initial 
APIs, as these would be separate, admin APIs (or just configs) that would not 
affect app submission.  If per-user priority defaults are needed after the 
basic priority functionality is there then we can add it then.

I'm not sure about the current state of the patch and how it relates to 
YARN-2003.  I see there are default priorities, but I don't see them actually 
being used in either this patch or the latest YARN-2003 patch.  I'm also 
wondering if we really need per-queue priority limits.  Currently application 
priorities have no effect _between_ queues, therefore I don't understand why we 
would want/need to limit application priorities in one queue vs. another queue. 
 Maybe I'm missing the use-case for this feature.  If we don't have a solid 
use-case for it then we should not add it until we need it.  Again, this is 
something we can always add later.

There appear to be some missing NULL checks when it comes to priorities in the 
following code:
{code}
+  if (a1.getApplicationPriority() != null
+      && !a1.getApplicationPriority().equals(a2.getApplicationPriority())) {
+    return a1.getApplicationPriority().compareTo(
+        a2.getApplicationPriority());
+  }
{code}
If a1.getApplicationPriority() returns non-null but a2.getApplicationPriority() 
returns null then I think we will NPE, as Priority.compareTo has no null checks.
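
To illustrate the asymmetry described above, here is a small self-contained sketch. It uses a stand-in `Prio` class rather than the real YARN `Priority` record, and its `compareTo` dereferences its argument without a null check to mirror the behavior described in this comment. The `NULL_SAFE` comparator is just one possible remedy and is an assumption of this sketch, not what the patch does.

```java
import java.util.Comparator;

// Stand-in for a priority type whose compareTo has no null checks.
// This is NOT the real org.apache.hadoop.yarn.api.records.Priority class.
class Prio implements Comparable<Prio> {
  final int value;

  Prio(int value) {
    this.value = value;
  }

  @Override
  public int compareTo(Prio other) {
    // Dereferences 'other' unconditionally, so a null argument throws.
    return Integer.compare(this.value, other.value);
  }
}

class NullPriorityDemo {
  // One possible null-safe ordering: a missing priority sorts before any
  // set priority. Treating null as a default value would work as well.
  static final Comparator<Prio> NULL_SAFE =
      Comparator.nullsFirst(Comparator.<Prio>naturalOrder());

  public static void main(String[] args) {
    Prio set = new Prio(2);
    Prio unset = null;

    try {
      set.compareTo(unset);          // the case the guard does not cover
    } catch (NullPointerException e) {
      System.out.println("NPE when only the second priority is null");
    }

    // The wrapped comparator handles a null on either side.
    System.out.println(NULL_SAFE.compare(set, unset) > 0);  // true
    System.out.println(NULL_SAFE.compare(unset, set) < 0);  // true
  }
}
```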

Nit: I'd like to see the submitted application priority logged along with other 
essential app details when the app is submitted rather than a separate log 
message just for priority.  The RM log is already too wordy, and this INFO 
message will add to it.  Maybe it should just be a debug log?
{code}
+    LOG.info("Submitted priority '" + priority.getPriority()
+        + "' is acceptable in queue : " + queueName + " for application: "
+        + applicationId);
{code}




[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-04-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503526#comment-14503526
 ] 

Jason Lowe commented on YARN-2004:
--

I'm still missing why it makes sense for queues to have different access to 
priorities.  Currently priorities only have an effect within a queue, not 
between queues, so why would we want to limit which priorities are running 
within a queue?  I'm still missing the use-case for this, and as such it looks 
like additional complexity without any benefit.

bq. I have considered default priority scenario where if submitted app does not 
gave any priority, then default will be taken. So chances of null here in above 
scenario wont happen.

Where is this occurring?  I see a lot of getDefault*Priority functions but not 
where they're actually used to set the app's priority if no priority is 
specified during app submission.



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-04-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499028#comment-14499028
 ] 

Wangda Tan commented on YARN-2004:
--

Some comments:
1) I noticed the default of default-priority is -1. Do you think we should 
limit priority to >= 0? With the existing interface in the queue, we don't 
limit the lowest priority, so maybe we should limit it ourselves.
2) Beyond priority settings on the queue, do you think we should have a 
per-user priority setting? If we don't limit a user's priority, we will end up 
with all users asking for max-priority in the queue. Also, a user's default 
could be different; a CEO's default may be max-priority. But this needs input 
from real-world use cases. ([~jlowe], thoughts?)
3) The null check in the app priority comparator still exists. Didn't you 
mention you would remove it?
bq.  i can remove NULL check. Will only have a direct compareTo check for 
priority.



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-02-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327117#comment-14327117
 ] 

Sunil G commented on YARN-2004:
---

Thank you [~jlowe] and [~leftnoteasy] for the input.

Yes, there are alternate ways to achieve scenario 1. Also, for scenario 2, 
YARN-2009 will help. Hence this JIRA can now focus on the basic 
priority addition to the schedulers.

bq.Priority is only considered if both applications have a priority that was 
set. 

If a set of priorities is loaded into the RM and one is chosen as the default 
priority for a queue, it can be any priority from lowest to highest. All 
applications running w/o a priority will then be given this default priority. 
Hence an application submitted with a lower priority will end up with lower 
preference than an application running w/o a priority. 
But this also depends on the user's perception. If users accept that all 
applications running w/o a priority fall back to the default chosen per queue, 
then the behavior will be as expected. 
On that note, I also feel that I can treat all applications running w/o a 
priority as having the default priority. [~jlowe], please share your thoughts 
on the above scenario.



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-02-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327650#comment-14327650
 ] 

Sunil G commented on YARN-2004:
---

Yes [~jlowe]. 
I agree with your point. As of now, I have added a configuration to specify 
the default priority for a queue. It is applied to applications that are 
submitted w/o a priority. A cluster-wide config will also be added; when a 
queue-level config is given, it overrides the cluster-wide default value. I 
will update the patch as per this understanding. Thank you.
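
The override order described above (queue-level value first, cluster-wide value otherwise) might look like the following sketch. The property names are invented for illustration and are not the keys used by the patch, and the final fallback of 0 when neither key is set is an assumption of this sketch.

```java
import java.util.Properties;

// Hypothetical sketch: the property names below are made up for
// illustration and are not taken from the patch.
class DefaultPriorityLookup {
  static final String CLUSTER_KEY = "example.cluster.default-priority";

  static String queueKey(String queueName) {
    return "example.queue." + queueName + ".default-priority";
  }

  // A queue-level value, when present, overrides the cluster-wide default.
  // If neither is configured, this sketch assumes a default of 0.
  static int defaultPriorityFor(String queueName, Properties conf) {
    String clusterValue = conf.getProperty(CLUSTER_KEY, "0");
    String value = conf.getProperty(queueKey(queueName), clusterValue);
    return Integer.parseInt(value.trim());
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(CLUSTER_KEY, "1");
    conf.setProperty(queueKey("queueA"), "2");

    System.out.println(defaultPriorityFor("queueA", conf)); // 2 (queue override)
    System.out.println(defaultPriorityFor("queueB", conf)); // 1 (cluster default)
  }
}
```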



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-02-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327527#comment-14327527
 ] 

Jason Lowe commented on YARN-2004:
--

My thoughts are as I stated above.  We should not ignore priorities if one of 
the apps does not have a priority specified.  A lack of a specified priority on 
an application should imply a default priority value and still be compared to 
the other application's priority rather than skipping the priority comparison.  
That would be the expected behavior.  We can come up with all sorts of schemes 
to determine what the default priority value should be (e.g.: hardcoded default 
value, cluster-wide configurable, queue-specific configurable, etc.).  The 
important part is to not skip the priority comparison completely as that would 
be unexpected behavior for users.



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-02-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327730#comment-14327730
 ] 

Sunil G commented on YARN-2004:
---

As per YARN-2003, RMAppManager#submitApplication processes the input from the 
submissionContext. I will add a case there to handle the scenario where the 
priority from the submission context is NULL. It can be set to the default 
priority of the queue. 

As for this patch, I can remove the NULL check. It will only have a direct 
compareTo check for priority.
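
A rough sketch of this idea: once the submission path fills in the queue default, every accepted application has a priority, so the comparator can compare directly. `App`, `submit`, and `BY_PRIORITY` are illustrative stand-ins and do not reflect the real RMAppManager or submission-context classes. Sorting higher priorities first is also an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; App and submit() are illustrative stand-ins and do not
// reflect the real RMAppManager or submission-context classes.
class SubmitWithDefaultPriority {
  static final class App {
    final String name;
    final int priority;   // always set once the app has been accepted

    App(String name, int priority) {
      this.name = name;
      this.priority = priority;
    }
  }

  // Fills in the queue default when the submission carries no priority.
  static App submit(String name, Integer requested, int queueDefault) {
    int effective = (requested != null) ? requested : queueDefault;
    return new App(name, effective);
  }

  // With the priority guaranteed to be set, the comparator needs no null
  // check. Higher priority sorts first here (an assumption of this sketch).
  static final Comparator<App> BY_PRIORITY =
      (a1, a2) -> Integer.compare(a2.priority, a1.priority);

  public static void main(String[] args) {
    List<App> apps = new ArrayList<>();
    apps.add(submit("noPriority", null, 1));
    apps.add(submit("high", 2, 1));
    apps.add(submit("low", 0, 1));
    apps.sort(BY_PRIORITY);
    for (App app : apps) {
      System.out.println(app.name + " -> " + app.priority);
    }
  }
}
```

With a queue default of 1, the application submitted without a priority is ordered between the explicit priorities 2 and 0 rather than being skipped by the comparison.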



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-02-18 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325576#comment-14325576
 ] 

Sunil G commented on YARN-2004:
---

About the priority inversion problem, I feel we could use the approach below.

1. To identify a lower priority application that has been waiting for resources 
over a long period, *lastScheduledContainer* in *SchedulerApplicationAttempt* 
can be used to get the timestamp of the last allocation. Based on a time limit 
configuration, it is then possible to identify the apps which are starving.
2. Identify a few higher priority applications and explicitly decrease their 
headroom by one resource request of the lower priority application.
3. Reset the headroom of the higher priority applications once the lower 
priority application has got the container. 
Kindly share your thoughts on this.
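
Step 1 of the approach above could be sketched as below. The `AppAttempt` type, its fields, and `findStarving` are hypothetical and only model the idea of comparing the last allocation timestamp against a configured time limit. They are not the scheduler's real classes or configuration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of step 1; the types and the time limit are
// illustrative and are not the scheduler's real classes or configuration.
class StarvationCheck {
  static final class AppAttempt {
    final String id;
    final long lastAllocationMillis;   // timestamp of the last allocation
    final boolean hasPendingRequests;  // still waiting for resources?

    AppAttempt(String id, long lastAllocationMillis,
        boolean hasPendingRequests) {
      this.id = id;
      this.lastAllocationMillis = lastAllocationMillis;
      this.hasPendingRequests = hasPendingRequests;
    }
  }

  // An attempt counts as starving when it still wants resources but has
  // received nothing for longer than the configured limit.
  static List<String> findStarving(List<AppAttempt> attempts,
      long nowMillis, long limitMillis) {
    List<String> starving = new ArrayList<>();
    for (AppAttempt attempt : attempts) {
      long waited = nowMillis - attempt.lastAllocationMillis;
      if (attempt.hasPendingRequests && waited > limitMillis) {
        starving.add(attempt.id);
      }
    }
    return starving;
  }

  public static void main(String[] args) {
    List<AppAttempt> attempts = new ArrayList<>();
    attempts.add(new AppAttempt("app1", 10_000L, true));   // waited 90 s
    attempts.add(new AppAttempt("app2", 90_000L, true));   // waited 10 s
    attempts.add(new AppAttempt("app3", 0L, false));       // nothing pending
    System.out.println(findStarving(attempts, 100_000L, 60_000L)); // [app1]
  }
}
```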



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-02-18 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326197#comment-14326197
 ] 

Sunil G commented on YARN-2004:
---

Hi Jason, thank you for sharing your thoughts.

In one way, we need not think about headroom and user limit. Still, I would 
like to share 2 scenarios.

1. Similar to MAPREDUCE-314. A job j1 is submitted with lower priority and has 
finished its map tasks; its reducers are running. Later j2 and j3 come in and 
take over the cluster resources. If a map fails, e.g. by losing some map 
output, there is no chance of j1 getting a resource until j2 and j3 release 
resources and stop acquiring them. In the worst case, j1 will starve for much 
longer. This was one of the intentions behind temporarily pausing demand from 
j2 and j3 for a while to spare some resources for j1.

2. User Limit: Assume the factor is 25, so 4 users can take 25% each of the 
cluster and a 5th user has to wait. Assume the highest priority app is 
submitted by the 5th user. He may not get resources until the demand from the 
first 4 users (for existing apps) is over. Do you feel this needs to be handled?



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-02-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326238#comment-14326238
 ] 

Jason Lowe commented on YARN-2004:
--

For your first scenario, it can happen today without priority.  MR jobs ask for 
resources in waves -- first all the maps, then over time it ramps up reducers.  
Multiple jobs in the same queue from the same user can collide in different 
phases.  That's the whole point of the headroom calculation and reporting -- to 
allow AMs to realize this scenario is happening and react to it.  In this case 
what will happen is j1 will see its headroom is zero and start killing reducers 
to make room for the failed map task.  After killing the reducers there will be 
some free resources in the cluster (if they weren't stolen by another, 
underserved queue).  Then the question goes to who will get those resources.  
If we're using the default priority, j1 will get first crack at them due to 
FIFO priority.  If j2 or j3 were made higher priority then j1 will see that its 
headroom is _still_ zero after killing some reducers and will probably kill 
some more to try to make room.  Rinse, repeat until j1 is out of reducers to 
shoot or gets the resources it needs to run the failed map.

For the second scenario, the 5th user will _still_ be the first one to get any 
spare resources in the queue because he has the highest priority app.  Note 
that the user limit calculation does not involve comparing a user's current 
limit with other users' usage.  It's just a computation of what's available in 
the queue and what you're allowed based on the configured user limit and user 
limit factor.  So what will happen is the 5th user will continue to consume any 
free resources in the queue until either the app is satiated or the 5th user 
hits the 25% cap.  If there are no free resources then the 5th user's app will 
starve (without preemption) just like the rest until resources show up.  Again, 
higher priority just means you're first in line to get resources when they are 
freed up, and it doesn't change anything else.
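
The second scenario can be illustrated with a deliberately simplified model. It is not the CapacityScheduler's real user-limit computation: resources are plain integer units, the per-user cap is a fixed number, and all names (`FreeResourceOrderSketch`, `App`, `assign`) are hypothetical. It only shows that freed resources are offered in priority order and that a user stops receiving them at the cap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Deliberately simplified model, not the CapacityScheduler's real user-limit
// computation. All names and numbers are illustrative.
class FreeResourceOrderSketch {
  static final class App {
    final String user;
    final int priority;
    int pending;     // resource units still requested
    int allocated;   // resource units granted by assign()

    App(String user, int priority, int pending) {
      this.user = user;
      this.priority = priority;
      this.pending = pending;
    }
  }

  // Hands out 'free' units in priority order. Each user is capped at
  // 'userCap' units in total. Returns the units left unassigned.
  static int assign(List<App> apps, Map<String, Integer> usedByUser,
      int free, int userCap) {
    List<App> ordered = new ArrayList<>(apps);
    ordered.sort((a, b) -> Integer.compare(b.priority, a.priority));
    for (App app : ordered) {
      int used = usedByUser.getOrDefault(app.user, 0);
      int grant = Math.min(free, Math.min(app.pending, userCap - used));
      if (grant <= 0) {
        continue;   // nothing free, no demand, or user already at the cap
      }
      app.pending -= grant;
      app.allocated += grant;
      usedByUser.put(app.user, used + grant);
      free -= grant;
    }
    return free;
  }

  public static void main(String[] args) {
    Map<String, Integer> used = new HashMap<>();
    used.put("user1", 20);                       // existing usage
    App low = new App("user1", 0, 10);
    App high = new App("user5", 2, 40);          // highest priority app

    List<App> apps = new ArrayList<>();
    apps.add(low);
    apps.add(high);

    int left = assign(apps, used, 30, 25);       // 30 units freed, cap of 25
    System.out.println("user5 app got " + high.allocated);  // 25 (hits the cap)
    System.out.println("user1 app got " + low.allocated);   // 5
    System.out.println("left over " + left);                // 0
  }
}
```

When `free` is 0 the loop grants nothing to anyone, which corresponds to the case where the highest priority app starves along with the rest until resources show up.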

We can discuss adding preemption into the mix to force higher priority apps to 
get their requested resources faster in a full queue.  However I think the 
first step is to get priority scheduling working for resources that are free in 
the queue in the non-preemption case, as that's still very useful in practice.



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-02-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326582#comment-14326582
 ] 

Wangda Tan commented on YARN-2004:
--

[~sunilg],
Thanks for uploading the patch.

I just read the comments from [~jlowe], and everything he said makes sense to 
me. 

For scenario#1 
There are some possible solutions to tackle the priority inversion problem you 
just mentioned. But it is more important to get CS working with basic priority 
first. What you described is more like adjustable priority, which could be 
updated according to the application's waiting time or other factors.

For scenario#2
It is possible that a user with a higher priority application arrives when 
there is no available resource in the queue; in that case the preemption 
policy should reclaim resources from other users. YARN-2009 should cover it.

The general approach of the patch looks good to me.



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-02-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326145#comment-14326145
 ] 

Jason Lowe commented on YARN-2004:
--

I'm not sure I understand the priority inversion problem and why we would be 
changing headroom.  The headroom has no priority calculations in it.  As I see 
it, the priority scheduling is _only_ changing the order in which applications 
are examined when deciding how to assign free resources in a queue.  In other 
words, it does _not_ change the following:

- the priority order between queues (i.e.: deciding which queue is first in 
line to obtain free resources in the cluster)
- the user limits within a queue (i.e.: making an app higher priority does not 
implicitly give the user more room to grow within the queue than normal)
- the headroom for an app within the queue (higher priority doesn't change the 
queue capacity or user limits)

For example, a user is running app A then follows up with app B.  The user 
decides app B is pretty important and raises its priority.  This doesn't change 
the user limits within the queue or the headroom of those apps, but it does 
change which app will be assigned a spare resource if it is available.  If the 
queue is totally full then both apps will be told their headroom is zero.  One 
(or both) of them will need to free up some resources to make progress.  When 
resources become available, app B will have the first chance to claim them 
since it was made a higher priority than A.
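
The app A / app B example can be expressed as an ordering: compare by priority first, then fall back to the application ID (FIFO), as the issue description proposes. The sketch below uses a stand-in `App` class with a mutable priority. The names and the "higher number means higher priority" convention are assumptions for illustration only.

```java
import java.util.Comparator;

// Hypothetical sketch of the ordering only; it models nothing about user
// limits or headroom, which priority does not change.
class AppOrderingSketch {
  static final class App {
    final String name;
    final long appId;   // lower ID means the app was submitted earlier
    int priority;       // may be raised or lowered after submission

    App(String name, long appId, int priority) {
      this.name = name;
      this.appId = appId;
      this.priority = priority;
    }
  }

  // Higher priority first; ties fall back to submission order (FIFO).
  static final Comparator<App> ORDER =
      Comparator.comparingInt((App a) -> a.priority).reversed()
          .thenComparingLong(a -> a.appId);

  public static void main(String[] args) {
    App a = new App("A", 1L, 0);
    App b = new App("B", 2L, 0);

    // Same priority: A was submitted first, so A is examined first.
    System.out.println(ORDER.compare(a, b) < 0);  // true

    // The user raises B's priority: B now gets first chance at free resources.
    b.priority = 1;
    System.out.println(ORDER.compare(b, a) < 0);  // true
  }
}
```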



[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-02-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326737#comment-14326737
 ] 

Jason Lowe commented on YARN-2004:
--

I took a closer look at the patch, and the following logic seems suspect:

{code}
+  if (a1.getApplicationPriority() != null
+      && a2.getApplicationPriority() != null
+      && !a1.getApplicationPriority().equals(a2.getApplicationPriority())) {
+    return a2.getApplicationPriority().compareTo(
+        a1.getApplicationPriority());
+  }
{code}

Priority is only considered if both applications have a priority that was set.  
Do we really want that behavior?  I'm thinking of the scenario where all the 
apps in the queue have no set priority then one of the apps has their priority 
set to very high or very low.  That has no net effect since all other apps 
being compared in the queue don't have a priority set.  A more intuitive 
behavior is to treat an unset priority as if the app had a default priority, so 
we aren't implicitly disabling priority checks in some scenarios.
