[jira] [Updated] (YARN-5892) Support user-specific minimum user limit percentage in Capacity Scheduler

2017-07-07 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-5892:
-
Fix Version/s: (was: 3.0.0-beta1)
   3.0.0-alpha3

[~andrew.wang], this JIRA is already in 3.0.0-alpha4 (revision 
ca13b224b2feb9c44de861da9cbba8dd2a12cb35). I reopened this JIRA so that I could 
implement the backport to branch-2 and branch-2.8.

[~sunilg], yes, if we could get YARN-3140 backported to branch-2, that would 
indeed be the best. However, I am worried about the time it will take to do 
that. I don't think I have the bandwidth to do that right now. Are you 
volunteering to do the backport to branch-2?

Regarding locking for {{activeUsersSet}}, I don't think it is necessary if we 
can overcome the concurrent modification exception problem. In fact, simple 
synchronization on this method would cause deadlocks, and I think read/write 
locks would have the same problem, albeit on a smaller scale.

I don't think locking is necessary. If the sum is inaccurate, it will be 
recomputed during the next round because the list of active users has changed.

On a related note (if no locking is implemented), I think the following logic 
is incorrect:
{code}
for (String userName : activeUsersSet) {
  // Do the following instead of calling getUser so locking is not needed.
  User user = users.get(userName);
  count += (user != null) ? user.getWeight() : 1.0f;
}
{code}
I think that if the user was in {{activeUsersSet}} when the for loop started 
but was later removed from the {{users}} map, the user weight should be 0.0f 
instead of 1.0f.
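In other words, a minimal sketch of the adjusted loop (assuming a user who has 
been removed from the {{users}} map should contribute no weight):
{code}
float count = 0.0f;
for (String userName : activeUsersSet) {
  // Look up the map directly instead of calling getUser so no locking is needed.
  User user = users.get(userName);
  // A user removed from the users map since iteration began no longer holds a
  // share of the queue, so contribute 0.0f rather than defaulting to 1.0f.
  count += (user != null) ? user.getWeight() : 0.0f;
}
{code}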

> Support user-specific minimum user limit percentage in Capacity Scheduler
> -
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0-alpha3
>
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch, 
> YARN-5892.015.patch, YARN-5892.branch-2.015.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>     <value>25</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>     <value>75</value>
>   </property>
> {code}






[jira] [Updated] (YARN-5892) Support user-specific minimum user limit percentage in Capacity Scheduler

2017-07-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-5892:
-
Attachment: YARN-5892.branch-2.016.patch

Submitting a patch cherry-picked from trunk to branch-2 so that the pre-commit 
build can run on it.

> Support user-specific minimum user limit percentage in Capacity Scheduler
> -
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0-alpha3
>
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch, 
> YARN-5892.015.patch, YARN-5892.branch-2.015.patch, 
> YARN-5892.branch-2.016.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>     <value>25</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>     <value>75</value>
>   </property>
> {code}






[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-07-13 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086090#comment-16086090
 ] 

Eric Payne commented on YARN-2113:
--

Committed the cherry-pick to branch-2.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.3
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.0021.patch, YARN-2113.branch-2.8.0019.patch, 
> YARN-2113.branch-2.8.0020.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.






[jira] [Commented] (YARN-5892) Support user-specific minimum user limit percentage in Capacity Scheduler

2017-07-13 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086287#comment-16086287
 ] 

Eric Payne commented on YARN-5892:
--

The javadoc failures are because of the naming convention of the '_' display 
method:
{code}
[WARNING] 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java:580:
 warning: '_' used as an identifier
{code}
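For context, a sketch of the pattern behind this warning (the Hamlet-style call 
below is illustrative, not copied from {{CapacitySchedulerPage}}): Hadoop's 
Hamlet HTML builder names its element-closing method literally {{_}}, and since 
Java 8 javac/javadoc warn about '_' used as an identifier (Java 9 makes it a 
keyword outright).
{code}
// Hypothetical Hamlet-style row rendering; the trailing _() call closes the
// element and is the '_' identifier that javadoc complains about.
tbody.tr()
    .td(userName)
    .td(String.valueOf(weight))
    ._();
{code}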

The {{TestRMRestart}} failure is the same as HADOOP-14637.
{{resourcemanager.security.TestDelegationTokenRenewer}} and 
{{TestCapacityScheduler}} both pass for me in my local repo build.

I will cherry-pick this commit from trunk into branch-2.

> Support user-specific minimum user limit percentage in Capacity Scheduler
> -
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0-alpha3
>
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch, 
> YARN-5892.015.patch, YARN-5892.branch-2.015.patch, 
> YARN-5892.branch-2.016.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>     <value>25</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>     <value>75</value>
>   </property>
> {code}






[jira] [Updated] (YARN-5892) Support user-specific minimum user limit percentage in Capacity Scheduler

2017-07-17 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-5892:
-
Attachment: YARN-5892.branch-2.8.017.patch

Uploading YARN-5892.branch-2.8.017.patch because I used 
{{ConcurrentHashMap.newKeySet()}} in the previous patch, and that method 
doesn't exist in JDK 1.7 (it was only added in JDK 1.8).

Replaced it with {{new ConcurrentHashMap().keySet("dummy")}}.

> Support user-specific minimum user limit percentage in Capacity Scheduler
> -
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0-alpha3
>
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch, 
> YARN-5892.015.patch, YARN-5892.branch-2.015.patch, 
> YARN-5892.branch-2.016.patch, YARN-5892.branch-2.8.016.patch, 
> YARN-5892.branch-2.8.017.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>     <value>25</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>     <value>75</value>
>   </property>
> {code}






[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-07-11 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083038#comment-16083038
 ] 

Eric Payne commented on YARN-2113:
--

bq. revert branch-2 commit for YARN-2113 and cherry-pick YARN-5889/YARN-2113 to 
branch-2, correct? 
Correct.
bq. it's better to run Jenkins before push.
[~leftnoteasy], I'm sorry. I don't understand what you mean.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.3
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113.branch-2.8.0020.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.






[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-07-11 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083084#comment-16083084
 ] 

Eric Payne commented on YARN-2113:
--

I see. Sure, will do.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.3
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113.branch-2.8.0020.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.






[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-07-11 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082989#comment-16082989
 ] 

Eric Payne commented on YARN-2113:
--

I should not have created a separate patch for the branch-2 backport. The 
branch-2.8 backport is fine, but not the branch-2 one.

[~sunilg], [~leftnoteasy], [~jlowe]:

If no one objects, I will do the following:
# Revert the commit for branch-2 YARN-2113.branch-2.0020.patch (eda4ac0)
# Cherry pick YARN-5889 to branch-2 from trunk
# Cherry pick YARN-2113 to branch-2 from trunk

I have done this in my local repo and it applies fairly cleanly.

Here is what happened:
In trunk, the intra-queue preemption feature depends on the following:
- YARN-3140: Improve locks in AbstractCSQueue/LeafQueue/ParentQueue
- YARN-5889: Improve and refactor user-limit calculation in capacity scheduler

I thought that YARN-3140 had not been backported to either branch-2 or 
branch-2.8. I attempted to cherry-pick YARN-3140 to branch-2.8 but hit several 
deadlock issues. Therefore, I decided to create a separate branch-2.8 patch.

I thought I needed to do the same thing for branch-2, but I did not: YARN-3140 
has already been cherry-picked to branch-2.


> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.3
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113.branch-2.8.0020.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.






[jira] [Commented] (YARN-5892) Support user-specific minimum user limit percentage in Capacity Scheduler

2017-07-10 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080811#comment-16080811
 ] 

Eric Payne commented on YARN-5892:
--

Thanks [~sunilg]
bq. YARN-3140 is something i would like to see in branch-2.8 as well. I ll pool 
some bandwidth for same, but i might need some more time (considering tests).
Sorry, but I don't think my statement about YARN-3140 was clear.
- I tried backporting YARN-3140 to branch-2.8, but I had several deadlock 
problems.
- I also tried backporting YARN-3140 to branch-2 and ran into deadlock problems 
as well, though fewer, and I didn't investigate them as thoroughly as I did for 
branch-2.8.
- We are anxious to get the user weights feature backported soon.

I don't think it is worth the effort to try to backport YARN-3140 to 
branch-2.8. However, for branch-2, I will make time to try the backport again.
{quote}
bq. On a related note (if no locking is implemented), I think the following 
logic is incorrect
May be we can pull this to another ticket and retrospect there. I think we need 
a second look in this in detail.
{quote}
{code}
  count += (user != null) ? user.getWeight() : 1.0f;
{code}
It's just a matter of changing the {{1.0f}} to {{0.0f}}. I would like to go 
ahead and do that in the next patch.

> Support user-specific minimum user limit percentage in Capacity Scheduler
> -
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0-alpha3
>
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch, 
> YARN-5892.015.patch, YARN-5892.branch-2.015.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>     <value>25</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>     <value>75</value>
>   </property>
> {code}






[jira] [Updated] (YARN-5892) Support user-specific minimum user limit percentage in Capacity Scheduler

2017-07-06 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-5892:
-
Attachment: YARN-5892.branch-2.015.patch

[~sunilg], [~leftnoteasy], [~jlowe]:
Since branch-2 and 2.8 are somewhat different from trunk, it was necessary to 
make some design decisions that I would like you to be aware of when reviewing 
this backport:
- As noted 
[here|https://issues.apache.org/jira/browse/YARN-2113?focusedCommentId=16023111&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16023111],
 I did not backport YARN-5889 because it depends on locking changes from 
YARN-3140 and other locking JIRAs.
- In trunk, a change was made in YARN-5889 that changed the way 
{{computeUserLimit}} calculates user limit. In branch-2 and branch-2.8, 
{{userLimitResource = (all used resources in queue) / (num active users in 
queue)}}. In trunk after YARN-5889, {{userLimitResource = (all used resources 
by active users in queue) / (num active users)}}.
-- Since branch-2 and 2.8 use {{all used resources in queue}} instead of {{all 
used resources by active users in queue}}, it is not necessary to modify 
{{LeafQueue}} to update used resources when users are activated and deactivated 
as was done in {{UsersManager}} in trunk.
-- However, I did add {{activeUsersSet}} to {{LeafQueue}} (and to all the 
places where it is modified) so it can be used to sum active users times their 
weights.
-- Therefore, it wasn't necessary to create a separate UsersManager class as 
was done in YARN-5889. Instead, I added a small amount of code in 
ActiveUsersManager to keep track of active users and to indicate when users are 
either activated or deactivated.
- {{LeafQueue#sumActiveUsersTimesWeights}} should not do anything that 
synchronizes or locks. This is to avoid deadlocks because it is called by 
getHeadRoom (indirectly), which is called by {{FiCaSchedulerApp}}.
{code}
  float sumActiveUsersTimesWeights() {
float count = 0.0f;
for (String userName : activeUsersSet) {
  User user = users.get(userName);
  count += (user != null) ? user.getWeight() : 1.0f;
}
return count;
  }
{code}
-- This opens up a race condition when a user is added to or removed from 
{{activeUsersSet}} while {{sumActiveUsersTimesWeights}} is iterating over the 
set; see the sketch after this list.
--- I'm not an expert in Java synchronization. Does this expose {{LeafQueue}} to 
concurrent modification exceptions?
--- There is no {{ConcurrentHashSet}}, so should I make {{activeUsersSet}} a 
{{ConcurrentHashMap}}?
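On the last question, a minimal sketch (an illustration, not code from the 
patch) of a JDK 1.7-compatible concurrent set: iterators over a 
{{ConcurrentHashMap}}-backed set are weakly consistent, so iterating in 
{{sumActiveUsersTimesWeights}} would not throw 
{{ConcurrentModificationException}} even if users are activated or deactivated 
mid-loop; the computed sum may simply reflect slightly stale membership.
{code}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// There is no ConcurrentHashSet class, but a Set view over a ConcurrentHashMap
// provides equivalent behavior, including weakly consistent iteration.
Set<String> activeUsersSet =
    Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
{code}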


> Support user-specific minimum user limit percentage in Capacity Scheduler
> -
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0-alpha4
>
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch, 
> YARN-5892.015.patch, YARN-5892.branch-2.015.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>     <value>25</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>     <value>75</value>
>   </property>
> {code}






[jira] [Reopened] (YARN-5892) Support user-specific minimum user limit percentage in Capacity Scheduler

2017-07-06 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reopened YARN-5892:
--

Re-opening in order to backport to branch-2 and 2.8.

> Support user-specific minimum user limit percentage in Capacity Scheduler
> -
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0-alpha4
>
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch, 
> YARN-5892.015.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>     <value>25</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>     <value>75</value>
>   </property>
> {code}






[jira] [Updated] (YARN-5889) Improve and refactor user-limit calculation in capacity scheduler

2017-07-12 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-5889:
-
Fix Version/s: 2.9.0

I cherry-picked this commit to branch-2 as a prerequisite for YARN-2113.

> Improve and refactor user-limit calculation in capacity scheduler
> -
>
> Key: YARN-5889
> URL: https://issues.apache.org/jira/browse/YARN-5889
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Fix For: 2.9.0, 3.0.0-alpha4
>
> Attachments: YARN-5889.0001.patch, 
> YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, 
> YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, 
> YARN-5889.0006.patch, YARN-5889.0007.patch, YARN-5889.0008.patch, 
> YARN-5889.0009.patch, YARN-5889.0010.patch, YARN-5889.v0.patch, 
> YARN-5889.v1.patch, YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with 
> a write lock. To improve performance, this ticket focuses on moving the 
> user-limit calculation out of the heartbeat allocation flow.






[jira] [Reopened] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-07-12 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reopened YARN-2113:
--

{quote}
1. Revert the commit for branch-2 YARN-2113.branch-2.0020.patch (eda4ac0)
2. Cherry pick YARN-5889 to branch-2 from trunk
3. Cherry pick YARN-2113 to branch-2 from trunk
{quote}
Beginning now.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.3
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113.branch-2.8.0020.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.






[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-07-12 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2113:
-
Attachment: (was: YARN-2113.branch-2.0021.patch)

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.3
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113.branch-2.8.0020.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.






[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-07-12 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2113:
-
Attachment: YARN-2113.branch-2.0021.patch

Uploading patch YARN-2113.branch-2.0021.patch. This patch represents the 
cherry-pick of the trunk commit back to branch-2.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.3
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.0021.patch, YARN-2113.branch-2.8.0019.patch, 
> YARN-2113.branch-2.8.0020.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.






[jira] [Updated] (YARN-5892) Support user-specific minimum user limit percentage in Capacity Scheduler

2017-07-17 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-5892:
-
Attachment: YARN-5892.branch-2.8.018.patch

bq. Replaced it with {{new ConcurrentHashMap().keySet("dummy")}}
Sigh. I need to be more careful about what is and is not in JDK 1.7. This 
method with this signature is also not in JDK 1.7.

Trying again with YARN-5892.branch-2.8.018.patch:
{{Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());}}
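For reference, a compact sketch of the three approaches tried across these 
patches (JDK availability per the JDK API docs; illustrative only, not code 
from the patches):
{code}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ActiveUserSetOptions {
  // Added in JDK 1.8, so it fails to compile on JDK 1.7 (branch-2.8.016).
  Set<String> viaNewKeySet = ConcurrentHashMap.newKeySet();

  // Also added in JDK 1.8 (branch-2.8.017).
  Set<String> viaKeySetView =
      new ConcurrentHashMap<String, String>().keySet("dummy");

  // Available since JDK 1.6, so it is safe for branch-2.8 (branch-2.8.018).
  Set<String> viaNewSetFromMap =
      Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
}
{code}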

[~sunilg], [~leftnoteasy], [~jlowe]: When the pre-commit build comes back 
clean, can you please have a look?


> Support user-specific minimum user limit percentage in Capacity Scheduler
> -
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0-alpha3
>
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch, 
> YARN-5892.015.patch, YARN-5892.branch-2.015.patch, 
> YARN-5892.branch-2.016.patch, YARN-5892.branch-2.8.016.patch, 
> YARN-5892.branch-2.8.017.patch, YARN-5892.branch-2.8.018.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>     <value>25</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>     <value>75</value>
>   </property>
> {code}






[jira] [Comment Edited] (YARN-5892) Support user-specific minimum user limit percentage in Capacity Scheduler

2017-07-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16077246#comment-16077246
 ] 

Eric Payne edited comment on YARN-5892 at 7/7/17 12:33 PM:
---

[~sunilg], [~leftnoteasy], [~jlowe]:
Since branch-2 and 2.8 are somewhat different from trunk, it was necessary to 
make some design decisions that I would like you to be aware of when reviewing 
this backport:
- As noted 
[here|https://issues.apache.org/jira/browse/YARN-2113?focusedCommentId=16023111&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16023111],
 I did not backport YARN-5889 because it depends on locking changes from 
YARN-3140 and other locking JIRAs.
- In trunk, a change was made in YARN-5889 that changed the way 
{{computeUserLimit}} calculates user limit. In branch-2 and branch-2.8, 
{{userLimitResource = (all used resources in queue) / (num active users in 
queue)}}. In trunk after YARN-5889, {{userLimitResource = (all used resources 
by active users in queue) / (num active users)}}.
-- Since branch-2 and 2.8 use {{all used resources in queue}} instead of {{all 
used resources by active users in queue}}, it is not necessary to modify 
{{LeafQueue}} to update used resource when users are activated and deactivated 
like was done in {{UsersManager}} in trunk.
-- However, I did add {{activeUsersSet}} to {{LeafQueue}} (and to all the 
places where it is modified) so it can be used to sum active users times their 
weights.
-- Therefore, it wasn't necessary to create a separate UsersManager class as 
was done in YARN-5889. Instead, I added a small amount of code in 
ActiveUsersManager to keep track of active users and to indicate when users are 
either activated or deactivated.
- {{LeafQueue#sumActiveUsersTimesWeights}} should not do anything that 
synchronizes or locks. This is to avoid deadlocks because it is called by 
getHeadRoom (indirectly), which is called by {{FiCaSchedulerApp}}.
{code}
  float sumActiveUsersTimesWeights() {
float count = 0.0f;
for (String userName : activeUsersSet) {
  User user = users.get(userName);
  count += (user != null) ? user.getWeight() : 1.0f;
}
return count;
  }
{code}
-- This opens up a race condition when a user is added to or removed from 
{{activeUsersSet}} while {{sumActiveUsersTimesWeights}} is iterating over the 
set.
--- I'm not an expert in Java synchronization. Does this expose {{LeafQueue}} to 
concurrent modification exceptions?
--- There is no {{ConcurrentHashSet}}, so should I make {{activeUsersSet}} a 
{{ConcurrentHashMap}}?



was (Author: eepayne):
[~sunilg], [~leftnoteasy], [~jlowe]:
Since branch-2 and 2.8 are somewhat different from trunk, it was necessary to 
make some design decisions that I would like you to be aware of when reviewing 
this backport:
- As noted 
[here|https://issues.apache.org/jira/browse/YARN-2113?focusedCommentId=16023111&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16023111],
 I did not backport YARN-5889 because it depends on locking changes from 
YARN-3140 and other locking JIRAs.
- In trunk, a change was made in YARN-5889 that changed the way 
{{computeUserLimit}} calculates user limit. In branch-2 and branch-2.8, 
{{userLimitResource = (all used resources in queue) / (num active users in 
queue)}}. In trunk after YARN-5889, {{userLimitResource = (all used resources 
by active users in queue) / (num active users)}}.
-- Since branch-2 and 2.8 use {{all used resources by active users in queue}} 
instead of {{all used resources in queue}}, it is not necessary to modify 
{{LeafQueue}} to keep track of when resources are activated and deactivated 
like was done in {{UsersManager}} in trunk.
-- However, I did add {{activeUsersSet}} to {{LeafQueue}} (and to all the 
places where it is modified) so it can be used to sum active users times their 
weights.
-- Therefore, it wasn't necessary to create a separate UsersManager class as 
was done in YARN-5889. Instead, I added a small amount of code in 
ActiveUsersManager to keep track of active users and to indicate when users are 
either activated or deactivated.
- {{LeafQueue#sumActiveUsersTimesWeights}} should not do anything that 
synchronizes or locks. This is to avoid deadlocks because it is called by 
getHeadRoom (indirectly), which is called by {{FiCaSchedulerApp}}.
{code}
  float sumActiveUsersTimesWeights() {
float count = 0.0f;
for (String userName : activeUsersSet) {
  User user = users.get(userName);
  count += (user != null) ? user.getWeight() : 1.0f;
}
return count;
  }
{code}
-- This opens up a race condition when a user is added to or removed from 
{{activeUsersSet}} while {{sumActiveUsersTimesWeights}} is iterating over the 
set.
--- I'm not an expert in Java synchronization. Does this expose {{LeafQueue}} to 
concurrent modification exceptions?
--- There is no {{ConcurrentHashSet}}, so should I make {{activeUsersSet}} a 
{{ConcurrentHashMap}}?

[jira] [Updated] (YARN-5892) Support user-specific minimum user limit percentage in Capacity Scheduler

2017-07-15 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-5892:
-
Attachment: YARN-5892.branch-2.8.016.patch

I cherry-picked the trunk commit to branch-2 for this JIRA.

I am now attaching YARN-5892.branch-2.8.016.patch, which had to be refactored 
from the branch-2 version due to the differences in branch-2.8.

> Support user-specific minimum user limit percentage in Capacity Scheduler
> -
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0-alpha3
>
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch, 
> YARN-5892.015.patch, YARN-5892.branch-2.015.patch, 
> YARN-5892.branch-2.016.patch, YARN-5892.branch-2.8.016.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>     <value>25</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>     <value>75</value>
>   </property>
> {code}






[jira] [Commented] (YARN-5892) Support user-specific minimum user limit percentage in Capacity Scheduler

2017-07-25 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100167#comment-16100167
 ] 

Eric Payne commented on YARN-5892:
--

Thanks [~sunilg] for the review.
{quote}
In ActiveUsersManager, could we avoid activeUsersChanged if possible. May be we 
could keep an active set in ActiveUsersManager itself and we could clear this 
set when activate/deactivateApplication is invoked.
{quote}
In {{ActiveUsersManager}}, the keyset of {{usersApplications}} already holds a 
list of active users, but it's not enough to know whether or not there are 
users in this keyset. {{LeafQueue}} also needs to know when a user is added to 
or removed from this set. This is so that {{LeafQueue#computeUserLimit}} only 
recomputes the sum of the active users' weights after a user goes active or 
inactive, not every time through the code.

{{LeafQueue}} could possibly keep its own copy of the value returned by 
{{ActiveUsersManager#getNumActiveUsers}} and compare it against the current 
value when it needs to know whether it must recompute the sum of active user 
weights. However, that seems more complicated and error-prone than using 
{{activeUsersChanged}}; see the sketch below.
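To make the handshake concrete, here is a minimal sketch of the 
{{activeUsersChanged}} approach described above (names follow this discussion; 
the actual patch may differ):
{code}
import java.util.concurrent.atomic.AtomicBoolean;

// In ActiveUsersManager (sketch): flip the flag whenever the set of active
// users changes, i.e. from activateApplication/deactivateApplication.
private final AtomicBoolean activeUsersChanged = new AtomicBoolean(false);

void markActiveUsersChanged() {
  activeUsersChanged.set(true);
}

// In LeafQueue#computeUserLimit (sketch): re-sum the active users' weights
// only when the membership actually changed since the last computation.
float activeUsersTimesWeights(float cachedSum) {
  if (activeUsersChanged.getAndSet(false)) {
    cachedSum = sumActiveUsersTimesWeights();
  }
  return cachedSum;
}
{code}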


> Support user-specific minimum user limit percentage in Capacity Scheduler
> -
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0-alpha3
>
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch, 
> YARN-5892.015.patch, YARN-5892.branch-2.015.patch, 
> YARN-5892.branch-2.016.patch, YARN-5892.branch-2.8.016.patch, 
> YARN-5892.branch-2.8.017.patch, YARN-5892.branch-2.8.018.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>     <value>25</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>     <value>75</value>
>   </property>
> {code}






[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-07-12 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2113:
-
Attachment: YARN-2113.branch-2.0021.patch

Resubmitting patch to deal with raw type warnings.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.3
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.0021.patch, YARN-2113.branch-2.8.0019.patch, 
> YARN-2113.branch-2.8.0020.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.






[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-07-12 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084840#comment-16084840
 ] 

Eric Payne commented on YARN-2113:
--

The branch-2 patch seems fine.
The following unit tests worked in my local repo environment:
{noformat}
hadoop.yarn.server.resourcemanager.monitor.TestSchedulingMonitor
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation
hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer 
{noformat}
The {{TestRMRestart}} failure is the same as HADOOP-14637.

If there are no objections, I will cherry-pick YARN-2113 into branch-2 
tomorrow.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.3
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.0021.patch, YARN-2113.branch-2.8.0019.patch, 
> YARN-2113.branch-2.8.0020.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.






[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-06-29 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2113:
-
Attachment: YARN-2113.branch-2.8.0020.patch
YARN-2113.branch-2.0020.patch

Thanks [~sunilg]. Sorry about the debug comment. I have removed it from both 
branch-2 and branch-2.8 patches and have uploaded new versions.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha4
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113.branch-2.8.0020.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-06-30 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070130#comment-16070130
 ] 

Eric Payne commented on YARN-2113:
--

bq. Resolving this for now so I can run releasenotes generation.
[~andrew.wang], are you done generating the release notes? May I please reopen 
this so that the branch-2 pre-commit can also run? (The branch-2.8 pre-commit 
ran, but the branch-2 one did not.)

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha4
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113.branch-2.8.0020.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-06-30 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070101#comment-16070101
 ] 

Eric Payne commented on YARN-2113:
--

bq. +1 on branch-2 and branch-2.7 patches.
Thanks [~sunilg]. I think you meant branch-2.8, not branch-2.7 ;-)

The failed tests pass in my local environment, so I assert that they are not 
related to this patch.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha4
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113.branch-2.8.0020.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-06-30 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reopened YARN-2113:
--

Reopening so that the branch-2 pre-commit can run.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha4
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113.branch-2.8.0020.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-07-06 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076807#comment-16076807
 ] 

Eric Payne commented on YARN-2113:
--

Thanks Sunil. I committed this to branch-2 and branch-2.8.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.3
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113.branch-2.8.0020.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-07-05 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2113:
-
Attachment: (was: YARN-2113.branch-2.0020.patch)

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha4
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.8.0019.patch, 
> YARN-2113.branch-2.8.0020.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-07-05 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2113:
-
Attachment: YARN-2113.branch-2.0020.patch

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha4
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113.branch-2.8.0020.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-07-05 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075317#comment-16075317
 ] 

Eric Payne commented on YARN-2113:
--

The TestRMRestart test failure looks like it's the same as YARN-6759.
The other test failures don't happen for me locally.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha4
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113.branch-2.8.0020.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6759) TestRMRestart.testRMRestartWaitForPreviousAMToFinish is failing in trunk

2017-07-05 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075279#comment-16075279
 ] 

Eric Payne commented on YARN-6759:
--

Please note that this is not just failing in trunk. It fails for me in branch-2 
as well:
{noformat}
java.lang.IllegalArgumentException: Total wait time should be greater than 
check interval time
at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
at 
org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:311)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:613)
{noformat}
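
For reference, the {{GenericTestUtils.waitFor}} precondition that trips here is 
{{waitForMillis > checkEveryMillis}}. A minimal sketch of a conforming call (the 
check and the timeout values are illustrative, not the test's actual ones):
{code}
// waitFor(Supplier<Boolean> check, int checkEveryMillis, int waitForMillis)
// throws IllegalArgumentException unless waitForMillis > checkEveryMillis.
GenericTestUtils.waitFor(
    () -> attempt.getAppAttemptState() == RMAppAttemptState.LAUNCHED,
    100 /* checkEveryMillis */, 40 * 1000 /* waitForMillis */);
{code}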

> TestRMRestart.testRMRestartWaitForPreviousAMToFinish is failing in trunk
> 
>
> Key: YARN-6759
> URL: https://issues.apache.org/jira/browse/YARN-6759
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Naganarasimha G R
>
> {code}
> java.lang.IllegalArgumentException: Total wait time should be greater than 
> check interval time
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>   at 
> org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:273)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:618)
> {code}
> refer 
> https://builds.apache.org/job/PreCommit-YARN-Build/16229/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testRMRestartWaitForPreviousAMToFinish/
>  which ran for YARN-2919



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-04-25 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983699#comment-15983699
 ] 

Eric Payne commented on YARN-2113:
--

Thanks [~sunilg]. I have one more (last?) concern.

Shouldn't the following code be {{lessThanOrEqual}} instead of {{lessThan}}? 
The scheduler always assigns one more container than the UL, so I think that if 
removing this container brings the user down to equal its UL, the scheduler 
would give the container back to the user:
{code:title=IntraQueueCandidatesSelector#preemptFromLeastStarvedApp}
  // If selected container brings down resource usage under its user's
  // UserLimit, we must skip such containers.
  if (Resources.lessThan(rc, clusterResource,
  Resources.subtract(userResourceUsage, c.getAllocatedResource()),
  computedUserLimitPerUser.get(app.getUser()))
  && preemptionContext.getIntraQueuePreemptionOrder()
  .equals(IntraQueuePreemptionOrder.USERLIMIT_FIRST)) {
{code}
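
For illustration, a minimal sketch of the suggested change, reusing the names 
from the snippet above (this is not a patch):
{code}
  // Skip the container if preempting it would bring the user to OR below its
  // user limit, since the scheduler would just hand the container back.
  if (Resources.lessThanOrEqual(rc, clusterResource,
      Resources.subtract(userResourceUsage, c.getAllocatedResource()),
      computedUserLimitPerUser.get(app.getUser()))
      && preemptionContext.getIntraQueuePreemptionOrder()
      .equals(IntraQueuePreemptionOrder.USERLIMIT_FIRST)) {
{code}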

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-04-26 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15985032#comment-15985032
 ] 

Eric Payne commented on YARN-2113:
--

[~sunilg], the latest patch seems to work almost as I expect. However, I still 
have one question.

When the {{preemption-order}} property is set to {{USERLIMIT_FIRST}}, priority 
preemption will not happen. For example:
- {{preemption-order = USERLIMIT_FIRST}}
- {{user1}} starts {{app1}} at priority 0 and consumes the whole cluster
- {{user1}} starts {{app2}} at priority 1

In this scenario, no preemption occurs. Is that expected? I did not expect this 
behavior, since with {{preemption-order = PRIORITY_FIRST}}, user-limit 
preemption still happens as expected when all apps are at the same priority.
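
For reference, the scenario above assumes a configuration along these lines (a 
sketch only; the exact property key is whatever the patch defines):
{code}
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.intra-queue-preemption.preemption-order-policy</name>
  <value>userlimit_first</value>
</property>
{code}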

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent

2017-04-26 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-5892:
-
Attachment: YARN-5892.012.patch

[~leftnoteasy]
bq. updateUserWeights: Should we re-count 
Yes. Thanks for catching this.
bq. count users weight is an O(n)  ... I don't want this patch blocked by the 
comment, we can revisit it once we identify any perf issues.
OK
bq. I suggest you can move the check ... to LeafQueue#setupQueueConfigs.
Done. Good idea. Thanks.

> Capacity Scheduler: Support user-specific minimum user limit percent
> 
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent
> 25
>   
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent
> 75
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-04-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981780#comment-15981780
 ] 

Eric Payne commented on YARN-2113:
--

[~sunilg], YARN-2113.0009.patch has a problem when {{preemption-order}} is 
{{USERLIMIT_FIRST}}.

Cluster looks like:
||*Queue Capacity*||*Queue Max Capacity*||*Minimum User Limit Percent*||*Default Container Size*||*Vcores Per Container*||
|10GB|20GB|25%|0.5GB|1|
 With one user taking up the whole queue and cluster, the {{Active Users Info}} 
section looks like this (which seems accurate except I would have expected the 
{{Max Resource}} to be 5120, but since it's not active, it probably doesn't 
matter):
||*User Name*||*Max Resource*||*Used Resource*||
|hadoop1|||

 When the second user starts, some containers get preempted, but preemption 
stops early and the {{Active Users Info}} section looks like this (with 
inaccurate values for {{hadoop1}}):
||*User Name*||*Max Resource*||*Used Resource*||
|hadoop2|||
|hadoop1|||

- The {{Used Resource}} for {{hadoop2}} is accurate, but {{Used Resource}} for 
{{hadoop1}} is incorrect. {{hadoop1}} is actually using 17.5GB, while the above 
says it is using 5GB.

Perhaps it's a result of the following code?
{code:title=IntraQueueCandidatesSelector#preemptFromLeastStarvedApp}
  if (ret && preemptionContext.getIntraQueuePreemptionOrder()
  .equals(IntraQueuePreemptionOrder.USERLIMIT_FIRST)) {
Resources.subtractFrom(userResourceUsage, c.getAllocatedResource());
  }
{code}


> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-04-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981830#comment-15981830
 ] 

Eric Payne commented on YARN-2113:
--

[~sunilg],
It looks like 
{{IntraQueueCandidatesSelector#initializeUsageAndUserLimitForCompute}} should 
be cloning the used resources from leafqueue:
{code}

   // Initialize used resource of a given user for rolling computation.
   rollingResourceUsagePerUser.put(user,
-  leafQueue.getUser(user).getResourceUsage().getUsed(partition));
+Resources.clone(
+  leafQueue.getUser(user).getResourceUsage().getUsed(partition)));
   if (LOG.isDebugEnabled()) {
 LOG.debug("Rolling resource usage for user:" + user + " is : "
 + rollingResourceUsagePerUser.get(user));
{code}
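
To make the aliasing concrete, a small illustrative sketch (variable names are 
hypothetical):
{code}
// Without the clone, the map entry aliases the queue's live Resource object,
// so later mutations silently change the "snapshot" too.
Resource live = leafQueue.getUser(user).getResourceUsage().getUsed(partition);
Resource snapshot = Resources.clone(live);               // independent copy
Resources.subtractFrom(live, container.getAllocatedResource());
// snapshot still holds the value captured above; live has been mutated
{code}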


> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent

2017-04-25 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983235#comment-15983235
 ] 

Eric Payne commented on YARN-5892:
--

I will need to upload a new patch at any rate, because the check for {{0 < 
weight < 100/MULP}} doesn't happen if the weight is set at a parent level and 
inherited.

This brings up another (very small) concern that I have regarding the way I 
implemented this part. I added the collection of user weights as part of 
{{AbstractCSQueue}} because that's where the basic queue initialization 
happens, and that's where we added this same functionality for preemption 
properties. However, in order to do the check for {{weight < 100/MULP}}, I had 
to do an {{instanceof LeafQueue}} check, which I was a little uncomfortable 
with because it's the only place that happens in {{AbstractCSQueue}}.
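
For what it's worth, a self-contained sketch of the check being described 
(method and parameter names are assumptions, not the actual patch):
{code}
// Validate each configured user weight against 0 < weight < 100/MULP.
static void checkUserWeights(Map<String, Float> userWeights,
    float minUserLimitPercent) {
  float max = 100.0f / minUserLimitPercent;
  for (Map.Entry<String, Float> e : userWeights.entrySet()) {
    float w = e.getValue();
    if (w <= 0.0f || w >= max) {
      throw new IllegalArgumentException("user weight " + w + " for "
          + e.getKey() + " must be in the range (0, " + max + ")");
    }
  }
}
{code}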

> Capacity Scheduler: Support user-specific minimum user limit percent
> 
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent
> 25
>   
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent
> 75
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-04-28 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989225#comment-15989225
 ] 

Eric Payne commented on YARN-2113:
--

[~sunilg], I'm sorry to say that the latest patch (YARN-2113.0013.patch) makes 
things worse for USERLIMIT_FIRST mode. Not only does it allow the 
((100/MULP) + 1)th user to preempt other users down to 1 container, it also 
oscillates. In fact, even in PRIORITY_FIRST mode, it preempts users below their 
user limit, even when #users < (100/MULP).

I suggest that we move the USERLIMIT_FIRST/PRIORITY_FIRST code into a separate 
JIRA. Can you easily revert all of the USERLIMIT_FIRST/PRIORITY_FIRST changes? 
I would suggest using YARN-2113.0012.patch as a basis, because 
YARN-2113.0013.patch has problems even in the PRIORITY_FIRST case.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6536) TestAMRMClient.testAMRMClientWithSaslEncryption fails intermittently

2017-04-28 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989401#comment-15989401
 ] 

Eric Payne commented on YARN-6536:
--

Thanks [~jlowe]
+1

> TestAMRMClient.testAMRMClientWithSaslEncryption fails intermittently
> 
>
> Key: YARN-6536
> URL: https://issues.apache.org/jira/browse/YARN-6536
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.1
>Reporter: Eric Badger
>Assignee: Jason Lowe
> Attachments: YARN-6536.001.patch
>
>
> {noformat}
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAllocation(TestAMRMClient.java:1005)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.registerAndAllocate(TestAMRMClient.java:703)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithSaslEncryption(TestAMRMClient.java:675)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent

2017-04-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981939#comment-15981939
 ] 

Eric Payne commented on YARN-5892:
--

bq. I don't understand imposing a hard limit of weight < 100/MULP.
[~jlowe], this was in response to [~leftnoteasy]'s comment 
[here|https://issues.apache.org/jira/browse/YARN-5892?focusedCommentId=15951539=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15951539].
 I would be interested to hear his thoughts regarding this.

BTW, the findbugs issues from the last pre-commit build are all in code that 
was not changed by this patch.

> Capacity Scheduler: Support user-specific minimum user limit percent
> 
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent
> 25
>   
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent
> 75
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-04-27 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2113:
-
Attachment: YARN-2113.apply.onto.0012.ericp.patch

Thanks [~sunilg] for all of your work and for the updated patch.

I think we are making the USERLIMIT_FIRST code too complicated. [The original 
problem I was hoping to 
solve|https://issues.apache.org/jira/browse/YARN-2113?focusedCommentId=15951222=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15951222]
 was that no preemption should occur for a user once #users > (100/MULP). For 
example, if you have a MULP of 50%, 2 active users can be in the queue, so if a 
third user comes in AND userlimit_first is true, no preemption should occur for 
the third user.
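
A tiny sketch of that rule (illustrative only; integer math assumed):
{code}
// With USERLIMIT_FIRST, a queue with MULP = 50% holds 100/50 = 2 users at
// their user limit, so user #3 should not trigger user-limit preemption.
static boolean skipUserLimitPreemption(int activeUsers, int minUserLimitPercent) {
  int maxUsersAtUserLimit = 100 / minUserLimitPercent;
  return activeUsers > maxUsersAtUserLimit;
}
{code}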

I'm still thinking this through, but I am attaching a patch that I think 
addresses my concern. This patch would be applied on top of 
YARN-2113.0012.patch.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.apply.onto.0012.ericp.patch, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6846) Nodemanager can fail to fully delete application local directories when applications are killed

2017-08-02 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1634#comment-1634
 ] 

Eric Payne commented on YARN-6846:
--

{quote}
Agreed. It's harmless to the code execution if it does not return ENOENT, but 
if it does (e.g.: this code is ported to a POSIX-compliant platform) then this 
prevents the executor from flagging an error from something being already 
deleted when trying to do a delete.
{quote}
Great. Thanks [~jlowe] for the fix and [~ebadger] for the review.

+1

I will commit this shortly.

> Nodemanager can fail to fully delete application local directories when 
> applications are killed
> ---
>
> Key: YARN-6846
> URL: https://issues.apache.org/jira/browse/YARN-6846
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.1
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-6846.001.patch, YARN-6846.002.patch, 
> YARN-6846.003.patch
>
>
> When an application is killed all of the running containers are killed and 
> the app waits for the containers to complete before cleaning up.  As each 
> container completes the container directory is deleted via the 
> DeletionService.  After all containers have completed the app completes and 
> the app directory is deleted.  If the app completes quickly enough then the 
> deletion of the container and app directories can race against each other.  
> If the container deletion executor deletes a file just before the application 
> deletion executor then it can cause the application deletion executor to 
> fail, leaving the remaining entries in the application directory lingering.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5349) TestWorkPreservingRMRestart#testUAMRecoveryOnRMWorkPreservingRestart fail intermittently

2017-08-02 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111413#comment-16111413
 ] 

Eric Payne commented on YARN-5349:
--

Thanks [~jlowe] for the fix. Patch LGTM.
+1

Failed unit tests are not related.

Will commit shortly.

> TestWorkPreservingRMRestart#testUAMRecoveryOnRMWorkPreservingRestart  fail 
> intermittently
> -
>
> Key: YARN-5349
> URL: https://issues.apache.org/jira/browse/YARN-5349
> Project: Hadoop YARN
>  Issue Type: Test
>Affects Versions: 2.8.1
>Reporter: sandflee
>Assignee: Jason Lowe
>Priority: Minor
> Attachments: YARN-5349.001.patch
>
>
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testUAMRecoveryOnRMWorkPreservingRestart(TestWorkPreservingRMRestart.java:1463)
> {noformat}
> https://builds.apache.org/job/PreCommit-YARN-Build/12250/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestWorkPreservingRMRestart/testUAMRecoveryOnRMWorkPreservingRestart/



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6917) Queue path is recomputed from scratch on every allocation

2017-08-07 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-6917:
-
Attachment: YARN-6917.001.patch

The queue path is only ever set during queue initialization or 
reinitialization. During this time, new queue objects are created and set into 
a new hierarchy, so this patch sets the final queue path during initialization. 
Queues must be stopped before reconfiguring; a leaf queue can only be converted 
into a parent queue and given children, or a totally new queue can be added to 
an existing parent queue.

I did not include a unit test because this is a performance enhancement and 
does not change the functionality. There are already many tests that exercise 
the {{getQueuePath()}} and {{reinitialize}} APIs.
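
As a rough sketch of the approach (field and method names assumed; see the 
patch for the real change):
{code}
// Compute the final path once during (re)initialization instead of walking
// the parent chain on every allocation.
private String queuePath;

private void updateQueuePath(CSQueue parent, String queueName) {
  this.queuePath = (parent == null)
      ? queueName
      : parent.getQueuePath() + "." + queueName;
}

public String getQueuePath() {
  return queuePath;  // O(1); no recursive recomputation per allocation
}
{code}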


> Queue path is recomputed from scratch on every allocation
> -
>
> Key: YARN-6917
> URL: https://issues.apache.org/jira/browse/YARN-6917
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.8.1
>Reporter: Jason Lowe
>Assignee: Eric Payne
>Priority: Minor
> Attachments: YARN-6917.001.patch
>
>
> As part of the discussion in YARN-6901 I noticed that we are recomputing a 
> queue's path for every allocation.  Currently getting the queue's path 
> involves calling getQueuePath on the parent then building onto that string 
> with the basename of the queue.  In turn the parent's getQueuePath method 
> does the same, so we end up spending time recomputing a string that will 
> never change until a reconfiguration.
> Ideally the queue path should be computed once during queue initialization 
> rather than on-demand.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5892) Support user-specific minimum user limit percentage in Capacity Scheduler

2017-07-27 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104294#comment-16104294
 ] 

Eric Payne commented on YARN-5892:
--

Thanks [~sunilg] for all your help on this JIRA. Were you planning on 
committing the branch-2.8 patch, or were you expecting me to do that?

> Support user-specific minimum user limit percentage in Capacity Scheduler
> -
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0-alpha3
>
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch, 
> YARN-5892.015.patch, YARN-5892.branch-2.015.patch, 
> YARN-5892.branch-2.016.patch, YARN-5892.branch-2.8.016.patch, 
> YARN-5892.branch-2.8.017.patch, YARN-5892.branch-2.8.018.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent
> 25
>   
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent
> 75
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2127) Move YarnUncaughtExceptionHandler into Hadoop common

2017-08-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129100#comment-16129100
 ] 

Eric Payne commented on YARN-2127:
--

[~ste...@apache.org], thanks for identifying and documenting this improvement.

I see that YARN-679 added 
{{o.a.h.service.launcher.HadoopUncaughtExceptionHandler}}. Is the goal to make 
{{YarnUncaughtExceptionHandler}} a subclass of that?
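
In other words, something like this hypothetical layering (a sketch, not a 
patch):
{code}
// Keep the YARN class for compatibility; inherit the behavior from the
// hadoop-common handler introduced alongside the YARN-679 launcher work.
public class YarnUncaughtExceptionHandler
    extends org.apache.hadoop.service.launcher.HadoopUncaughtExceptionHandler {
}
{code}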

> Move YarnUncaughtExceptionHandler into Hadoop common
> 
>
> Key: YARN-2127
> URL: https://issues.apache.org/jira/browse/YARN-2127
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Create a superclass of {{YarnUncaughtExceptionHandler}}  in the hadoop-common 
> code (retaining the original for compatibility).
> This would be available for any hadoop application to use, and the YARN-679 
> launcher could automatically set up the handler.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7051) FifoIntraQueuePreemptionPlugin can get concurrent modification exception

2017-08-18 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7051:
-
Component/s: yarn
 scheduler preemption
 capacity scheduler

> FifoIntraQueuePreemptionPlugin can get concurrent modification exception
> -
>
> Key: YARN-7051
> URL: https://issues.apache.org/jira/browse/YARN-7051
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, scheduler preemption, yarn
>Affects Versions: 2.9.0, 2.8.1, 3.0.0-alpha3
>Reporter: Eric Payne
>Priority: Critical
>
> {{FifoIntraQueuePreemptionPlugin#calculateUsedAMResourcesPerQueue}} has the 
> following code:
> {code}
> Collection<FiCaSchedulerApp> runningApps = leafQueue.getApplications();
> Resource amUsed = Resources.createResource(0, 0);
> for (FiCaSchedulerApp app : runningApps) {
> {code}
> {{runningApps}} is unmodifiable but not concurrent. This caused the 
> preemption monitor thread to crash in the RM in one of our clusters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7051) FifoIntraQueuePreemptionPlugin can get concurrent modification exception

2017-08-18 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133631#comment-16133631
 ] 

Eric Payne commented on YARN-7051:
--

In the cross-queue preemption policy, the code synchronizes on the leaf queue 
when it walks the applications list:
{code:title=FifoCandidatesSelector#selectCandidates}
...
  synchronized (leafQueue) {
...
Iterator<FiCaSchedulerApp> desc =
leafQueue.getOrderingPolicy().getPreemptionIterator();
while (desc.hasNext()) {
  FiCaSchedulerApp fc = desc.next();
{code}

> FifoIntraQueuePreemptionPlugin can get concurrent modification exception
> -
>
> Key: YARN-7051
> URL: https://issues.apache.org/jira/browse/YARN-7051
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, scheduler preemption, yarn
>Affects Versions: 2.9.0, 2.8.1, 3.0.0-alpha3
>Reporter: Eric Payne
>Priority: Critical
>
> {{FifoIntraQueuePreemptionPlugin#calculateUsedAMResourcesPerQueue}} has the 
> following code:
> {code}
> Collection<FiCaSchedulerApp> runningApps = leafQueue.getApplications();
> Resource amUsed = Resources.createResource(0, 0);
> for (FiCaSchedulerApp app : runningApps) {
> {code}
> {{runningApps}} is unmodifiable but not concurrent. This caused the 
> preemption monitor thread to crash in the RM in one of our clusters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7051) FifoIntraQueuePreemptionPlugin can get concurrent modification exception

2017-08-18 Thread Eric Payne (JIRA)
Eric Payne created YARN-7051:


 Summary: FifoIntraQueuePreemptionPlugin can get concurrent 
modification exception
 Key: YARN-7051
 URL: https://issues.apache.org/jira/browse/YARN-7051
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0-alpha3, 2.8.1, 2.9.0
Reporter: Eric Payne
Priority: Critical


{{FifoIntraQueuePreemptionPlugin#calculateUsedAMResourcesPerQueue}} has the 
following code:
{code}
Collection<FiCaSchedulerApp> runningApps = leafQueue.getApplications();
Resource amUsed = Resources.createResource(0, 0);

for (FiCaSchedulerApp app : runningApps) {
{code}
{{runningApps}} is unmodifiable but not concurrent. This caused the preemption 
monitor thread to crash in the RM in one of our clusters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7052) RM SchedulingMonitor should use HadoopExecutors when creating ScheduledExecutorService

2017-08-18 Thread Eric Payne (JIRA)
Eric Payne created YARN-7052:


 Summary: RM SchedulingMonitor should use HadoopExecutors when 
creating ScheduledExecutorService
 Key: YARN-7052
 URL: https://issues.apache.org/jira/browse/YARN-7052
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: Eric Payne


In YARN-7051, we ran into a case where the preemption monitor thread hung with 
no indication of why. This was because the preemption monitor is started by the 
{{SchedulingExecutorService}} from {{SchedulingMonigor#serviceStart}}, and then 
nothing ever gets the result of the future or allows it to throw an exception 
if needed.

At least with {{HadoopExecutor}}, it will provide a 
{{HadoopScheduledThreadPoolExecutor}} that logs the exception if one happens.
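
A sketch of what that could look like (assuming 
{{org.apache.hadoop.util.concurrent.HadoopExecutors}}; the policy and interval 
names are illustrative):
{code}
// The HadoopScheduledThreadPoolExecutor behind this logs any exception thrown
// by the task instead of trapping it in a Future that nothing ever reads.
ScheduledExecutorService ses = HadoopExecutors.newScheduledThreadPool(1);
ses.scheduleAtFixedRate(() -> policy.editSchedule(),
    0, monitorIntervalMs, TimeUnit.MILLISECONDS);
{code}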



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7051) FifoIntraQueuePreemptionPlugin can get concurrent modification exception

2017-08-19 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned YARN-7051:


Assignee: Eric Payne

> FifoIntraQueuePreemptionPlugin can get concurrent modification exception
> 
>
> Key: YARN-7051
> URL: https://issues.apache.org/jira/browse/YARN-7051
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, scheduler preemption, yarn
>Affects Versions: 2.9.0, 2.8.1, 3.0.0-alpha3
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
>
> {{FifoIntraQueuePreemptionPlugin#calculateUsedAMResourcesPerQueue}} has the 
> following code:
> {code}
> Collection<FiCaSchedulerApp> runningApps = leafQueue.getApplications();
> Resource amUsed = Resources.createResource(0, 0);
> for (FiCaSchedulerApp app : runningApps) {
> {code}
> {{runningApps}} is unmodifiable but not concurrent. This caused the 
> preemption monitor thread to crash in the RM in one of our clusters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7051) FifoIntraQueuePreemptionPlugin can get concurrent modification exception

2017-08-19 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7051:
-
Attachment: YARN-7051.001.patch

The YARN-7051.001.patch adds leaf-queue synchronization around the vulnerable 
code. I am still doing manual testing.

> FifoIntraQueuePreemptionPlugin can get concurrent modification exception
> 
>
> Key: YARN-7051
> URL: https://issues.apache.org/jira/browse/YARN-7051
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, scheduler preemption, yarn
>Affects Versions: 2.9.0, 2.8.1, 3.0.0-alpha3
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: YARN-7051.001.patch
>
>
> {{FifoIntraQueuePreemptionPlugin#calculateUsedAMResourcesPerQueue}} has the 
> following code:
> {code}
> Collection<FiCaSchedulerApp> runningApps = leafQueue.getApplications();
> Resource amUsed = Resources.createResource(0, 0);
> for (FiCaSchedulerApp app : runningApps) {
> {code}
> {{runningApps}} is unmodifiable but not concurrent. This caused the 
> preemption monitor thread to crash in the RM in one of our clusters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7052) RM SchedulingMonitor should use HadoopExecutors when creating ScheduledExecutorService

2017-08-22 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned YARN-7052:


Assignee: Eric Payne

> RM SchedulingMonitor should use HadoopExecutors when creating 
> ScheduledExecutorService
> --
>
> Key: YARN-7052
> URL: https://issues.apache.org/jira/browse/YARN-7052
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> In YARN-7051, we ran into a case where the preemption monitor thread hung 
> with no indication of why. This was because the preemption monitor is started 
> by the {{SchedulingExecutorService}} from {{SchedulingMonitor#serviceStart}}, 
> and then nothing ever gets the result of the future or allows it to throw an 
> exception if needed.
> At least with {{HadoopExecutor}}, it will provide a 
> {{HadoopScheduledThreadPoolExecutor}} that logs the exception if one happens.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7052) RM SchedulingMonitor should use HadoopExecutors when creating ScheduledExecutorService

2017-08-22 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7052:
-
Description: 
In YARN-7051, we ran into a case where the preemption monitor thread hung with 
no indication of why. This was because the preemption monitor is started by the 
{{SchedulingExecutorService}} from {{SchedulingMonitor#serviceStart}}, and then 
nothing ever gets the result of the future or allows it to throw an exception 
if needed.

At least with {{HadoopExecutors}}, it will provide a 
{{HadoopScheduledThreadPoolExecutor}} that logs the exception if one happens.

  was:
In YARN-7051, we ran into a case where the preemption monitor thread hung with 
no indication of why. This was because the preemption monitor is started by the 
{{SchedulingExecutorService}} from {{SchedulingMonigor#serviceStart}}, and then 
nothing ever gets the result of the future or allows it to throw an exception 
if needed.

At least with {{HadoopExecutor}}, it will provide a 
{{HadoopScheduledThreadPoolExecutor}} that logs the exception if one happens.


> RM SchedulingMonitor should use HadoopExecutors when creating 
> ScheduledExecutorService
> --
>
> Key: YARN-7052
> URL: https://issues.apache.org/jira/browse/YARN-7052
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Eric Payne
>
> In YARN-7051, we ran into a case where the preemption monitor thread hung 
> with no indication of why. This was because the preemption monitor is started 
> by the {{SchedulingExecutorService}} from {{SchedulingMonitor#serviceStart}}, 
> and then nothing ever gets the result of the future or allows it to throw an 
> exception if needed.
> At least with {{HadoopExecutors}}, it will provide a 
> {{HadoopScheduledThreadPoolExecutor}} that logs the exception if one happens.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2127) Move YarnUncaughtExceptionHandler into Hadoop common

2017-08-22 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137431#comment-16137431
 ] 

Eric Payne commented on YARN-2127:
--

The unit tests are not failing for me in my local environment.

[~ste...@apache.org], will you be able to review the patch?

> Move YarnUncaughtExceptionHandler into Hadoop common
> 
>
> Key: YARN-2127
> URL: https://issues.apache.org/jira/browse/YARN-2127
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Eric Payne
>Priority: Minor
> Attachments: YARN-2127.001.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Create a superclass of {{YarnUncaughtExceptionHandler}}  in the hadoop-common 
> code (retaining the original for compatibility).
> This would be available for any hadoop application to use, and the YARN-679 
> launcher could automatically set up the handler.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2127) Move YarnUncaughtExceptionHandler into Hadoop common

2017-08-17 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2127:
-
Attachment: YARN-2127.001.patch

Thanks [~ste...@apache.org]. I'm attaching a patch that makes 
{{YarnUncaughtExceptionHandler}} a subclass of {{HadoopUncaughtExceptionHandler}}. The 
next steps will be to have HDFS subclass it as well and add the handler in the 
HDFS daemons. However, after discussing with some of the HDFS crew here, it is 
a little complicated: there are cases where subthreads of the daemons can die 
and the daemon doesn't care, and there are cases where the daemon depends on 
that behavior. There are discussions about this in HDFS-11192, HDFS-2911, and 
HDFS-313
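
A rough sketch of the shape of the change (assuming only the standard {{Thread.UncaughtExceptionHandler}} interface; the logging body is a placeholder, not the patch):
{code}
// Common superclass, living in hadoop-common.
class HadoopUncaughtExceptionHandler
    implements Thread.UncaughtExceptionHandler {
  @Override
  public void uncaughtException(Thread t, Throwable e) {
    // The real handler does Hadoop-style logging and shutdown handling;
    // this body is illustrative only.
    System.err.println("Thread " + t.getName() + " died: " + e);
  }
}

// YARN keeps its existing class name for compatibility.
class YarnUncaughtExceptionHandler extends HadoopUncaughtExceptionHandler {
  // Retained so existing references keep working.
}
{code}
A daemon would opt in at startup with {{Thread.setDefaultUncaughtExceptionHandler(new YarnUncaughtExceptionHandler())}}.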

> Move YarnUncaughtExceptionHandler into Hadoop common
> 
>
> Key: YARN-2127
> URL: https://issues.apache.org/jira/browse/YARN-2127
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Eric Payne
>Priority: Minor
> Attachments: YARN-2127.001.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Create a superclass of {{YarnUncaughtExceptionHandler}}  in the hadoop-common 
> code (retaining the original for compatibility).
> This would be available for any hadoop application to use, and the YARN-679 
> launcher could automatically set up the handler.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-2127) Move YarnUncaughtExceptionHandler into Hadoop common

2017-08-17 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned YARN-2127:


Assignee: Eric Payne

> Move YarnUncaughtExceptionHandler into Hadoop common
> 
>
> Key: YARN-2127
> URL: https://issues.apache.org/jira/browse/YARN-2127
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Eric Payne
>Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Create a superclass of {{YarnUncaughtExceptionHandler}}  in the hadoop-common 
> code (retaining the original for compatibility).
> This would be available for any hadoop application to use, and the YARN-679 
> launcher could automatically set up the handler.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-05-03 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995184#comment-15995184
 ] 

Eric Payne commented on YARN-2113:
--

[~sunilg], would you mind, please, explaining why the check for 
{{usedMinusOneContainer > userLimit}} is necessary? If this code will skip if 
{{(used - currentContainer) > userLimit}}, why is it also necessary to check 
{{usedMinusOneContainer > userLimit}}? Won't that check happen when the 
youngest container _is_ the current container?
{code:title=FifoIntraQueuePreemptionPlugin#skipContainerBasedOnIntraQueuePolicy}
    return Resources.lessThanOrEqual(rc, clusterResource,
        Resources.subtract(usedResource, c.getAllocatedResource()), userLimit)
        && Resources.greaterThan(rc, clusterResource, usedMinusOneContainer,
            userLimit)
        && context.getIntraQueuePreemptionOrder()
            .equals(IntraQueuePreemptionOrder.USERLIMIT_FIRST);
  }
{code}

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent

2017-05-11 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-5892:
-
Attachment: YARN-5892.013.patch

Attaching patch YARN-5892.013.patch. The previous patch had a rounding error.

> Capacity Scheduler: Support user-specific minimum user limit percent
> 
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent
> 25
>   
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent
> 75
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6585) RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.

2017-05-11 Thread Eric Payne (JIRA)
Eric Payne created YARN-6585:


 Summary: RM fails to start when upgrading from 2.7 to 2.8 for 
clusters with node labels.
 Key: YARN-6585
 URL: https://issues.apache.org/jira/browse/YARN-6585
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Eric Payne


{noformat}
Caused by: java.io.IOException: Not all labels being replaced contained by 
known label collections, please check, new labels=[abc]
at 
org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
at 
org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
at 
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
at 
org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:181)
at 
org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:208)
at 
org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
at 
org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
... 13 more
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6585) RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.

2017-05-11 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-6585:
-
Target Version/s: 2.8.1
Priority: Blocker  (was: Major)

> RM fails to start when upgrading from 2.7 to 2.8 for clusters with node 
> labels.
> ---
>
> Key: YARN-6585
> URL: https://issues.apache.org/jira/browse/YARN-6585
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Payne
>Priority: Blocker
>
> {noformat}
> Caused by: java.io.IOException: Not all labels being replaced contained by 
> known label collections, please check, new labels=[abc]
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:181)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:208)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-05-09 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003254#comment-16003254
 ] 

Eric Payne commented on YARN-2113:
--

[~sunilg] / [~leftnoteasy]. It looks like the latest patch has all of the 
features we agreed on. Thanks [~sunilg] for all of your hard work on this 
feature and [~leftnoteasy] for all of your reviews and valuable feedback.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, 
> YARN-2113.apply.onto.0012.ericp.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-05-15 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011304#comment-16011304
 ] 

Eric Payne commented on YARN-2113:
--

I'm sorry [~sunilg], but the new patch doesn't fix the problem. I can still 
reproduce it with patch 0017.

Also, the new test 
{{TestProportionalCapacityPreemptionPolicyIntraQueueWithDRF}} doesn't actually 
test the problem. I applied patch 0016 to trunk and copied 
{{TestProportionalCapacityPreemptionPolicyIntraQueueWithDRF.java}} to 
{{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyIntraQueueWithDRF.java}},
 and the test still succeeded.

The failure depends on _both_ of the following (see the config sketch below):
# using {{DominantResourceCalculator}}
# setting {{yarn.nodemanager.resource.cpu-vcores}} to something other than 10.
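
For anyone trying to reproduce, the two settings look like this (the vcore value is just an example of "anything other than 10"):
{code}
<!-- capacity-scheduler.xml -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

<!-- yarn-site.xml on the node managers -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>16</value>
</property>
{code}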


> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, 
> YARN-2113.apply.onto.0012.ericp.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-05-15 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011384#comment-16011384
 ] 

Eric Payne commented on YARN-2113:
--

[~sunilg], the unit test was not failing because it was still using the 
{{DefaultResourceCalculator}}, even though 
{{TestProportionalCapacityPreemptionPolicyIntraQueueWithDRF#setup}} was setting 
{{rc}} to {{DominantResourceCalculator}}. Two changes are needed to make it 
exercise the problem.

First, I think {{when(cs.getResourceCalculator()).thenReturn(rc);}} should be 
added:
{code:title=TestProportionalCapacityPreemptionPolicyIntraQueueWithDRF#setup}
  public void setup() {
    super.setup();
    conf.setBoolean(
        CapacitySchedulerConfiguration.INTRAQUEUE_PREEMPTION_ENABLED, true);
    rc = new DominantResourceCalculator();
    when(cs.getResourceCalculator()).thenReturn(rc);
    policy = new ProportionalCapacityPreemptionPolicy(rmContext, cs, mClock);
  }
{code}

Second, in each of the tests, {{String labelsConfig = "=100:50,true;";}} should 
be changed to {{String labelsConfig = "=100:200,true;";}}

By making these changes, the tests will fail as expected. If I comment out the 
{{when...thenReturn}} line, the test succeeds again because it goes back to 
using the {{DefaultResourceCalculator}}.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, 
> YARN-2113.apply.onto.0012.ericp.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-05-12 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16008402#comment-16008402
 ] 

Eric Payne commented on YARN-2113:
--

Hey [~sunilg], I'm still digging into this, but this happens for me only if I'm 
using the {{DominantResourceCalculator}}, not if I use the 
{{DefaultResourceCalculator}}. I'm wondering now if this is something strange 
in my environment.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.apply.onto.0012.ericp.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-05-12 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16008658#comment-16008658
 ] 

Eric Payne commented on YARN-2113:
--

The oscillation is not related to the number of node managers. Rather, it 
occurs if 1) the cluster is using the {{DominantResourceCalculator}} and 2) 
{{yarn.nodemanager.resource.cpu-vcores}} is set to something other than 10.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.apply.onto.0012.ericp.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-05-11 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007294#comment-16007294
 ] 

Eric Payne commented on YARN-2113:
--

Hi [~sunilg]. There's something strange about the way preemption works when 
PRIORITY_FIRST is set.

If I have 2 or more node managers in my cluster, a large amount of oscillation 
occurs. The number of preempted containers keeps increasing and never stops. 
This does not occur when USERLIMIT_FIRST is set. Before today, I had not tested 
any of these patches with 2 or more node managers, so I wondered if this was a 
recent development. I recompiled with 0013 and 0014, and they have the same 
behavior, so this is not unique to 0016.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.apply.onto.0012.ericp.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6585) RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.

2017-06-09 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045063#comment-16045063
 ] 

Eric Payne commented on YARN-6585:
--

BTW, the findbugs warning for 
{{org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat}} is not 
relevant to this patch.

> RM fails to start when upgrading from 2.7 to 2.8 for clusters with node 
> labels.
> ---
>
> Key: YARN-6585
> URL: https://issues.apache.org/jira/browse/YARN-6585
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-6585.0001.patch, YARN-6585.0002.patch, 
> YARN-6585.0003.patch
>
>
> {noformat}
> Caused by: java.io.IOException: Not all labels being replaced contained by 
> known label collections, please check, new labels=[abc]
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:181)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:208)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6585) RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.

2017-06-09 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045017#comment-16045017
 ] 

Eric Payne commented on YARN-6585:
--

Thanks [~sunilg]. The patch looks good.
+1. Will commit soon.

> RM fails to start when upgrading from 2.7 to 2.8 for clusters with node 
> labels.
> ---
>
> Key: YARN-6585
> URL: https://issues.apache.org/jira/browse/YARN-6585
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-6585.0001.patch, YARN-6585.0002.patch, 
> YARN-6585.0003.patch
>
>
> {noformat}
> Caused by: java.io.IOException: Not all labels being replaced contained by 
> known label collections, please check, new labels=[abc]
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:181)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:208)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5892) Support user-specific minimum user limit percentage in Capacity Scheduler

2017-06-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062167#comment-16062167
 ] 

Eric Payne commented on YARN-5892:
--

Thanks for everything, [~sunilg]. 
bq. Committed to trunk. However it is not getting applied to branch-2. Eric 
Payne, could you please help to get branch-2 patch.
I have a backport for branch-2 for this JIRA. However, it depends on the 
backport patches for branch-2 from YARN-2113. Can you please take a look at the 
backport patches that I uploaded for YARN-2113?

> Support user-specific minimum user limit percentage in Capacity Scheduler
> -
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch, 
> YARN-5892.015.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent
> 25
>   
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent
> 75
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-06-20 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2113:
-
Attachment: YARN-2113.branch-2.0019.patch

Hi [~sunilg] and [~leftnoteasy]. I am also attaching the branch-2 patch. It is 
slightly different from the branch-2.8 patch, mostly in the area of locking in 
LeafQueue.

The backports for YARN-5892 depend on the backports I have uploaded here for 
YARN-2113, so if you could please take a look at these branch-2 and branch-2.8 
backports for YARN-2113, I would really appreciate it.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha4
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.8.0019.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent

2017-05-23 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021209#comment-16021209
 ] 

Eric Payne commented on YARN-5892:
--

[~jlowe], [~leftnoteasy], [~sunilg], just checking to see if you have had any 
progress reviewing this patch. Thanks.

> Capacity Scheduler: Support user-specific minimum user limit percent
> 
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent
> 25
>   
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent
> 75
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-05-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023111#comment-16023111
 ] 

Eric Payne commented on YARN-2113:
--

bq. I might need to backport YARN-5889 first and then this ticket. Thoughts?
Yes, YARN-2113 is coupled with YARN-5889, so YARN-5889 should also be 
backported to branch-2.

However, it gets complicated if we were to try to backport YARN-5889 to 
branch-2.8. I tried to do that, but I ran into some deadlocks in the unit 
tests. I believe that YARN-5889 will have other dependencies such as YARN-3091, 
YARN-3140, etc.

Rather than backporting YARN-5889 and all of its prereqs to 2.8, an alternative 
would be to separate out from YARN-5889 only the parts needed to support 
YARN-2113. I think the meat of it is in:
{code:title=LeafQueue#computeUserLimit}
int usersCount = getNumActiveUsers();
Resource resourceUsed = totalResUsageForActiveUsers.getUsed(nodePartition);

// For non-activeUser calculation, consider all users count.
if (!activeUser) {
  resourceUsed = currentCapacity;
  usersCount = users.size();
}
{code}

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha3
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-05-26 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026253#comment-16026253
 ] 

Eric Payne commented on YARN-2113:
--

I don't think the docker failure was due to the branch-2.8 patch:
{code}
Connecting to download.oracle.com (download.oracle.com)|184.25.56.53|:80... 
connected.
HTTP request sent, awaiting response... 404 Not Found
2017-05-25 22:24:32 ERROR 404: Not Found.

download failed
Oracle JDK 7 is NOT installed.
dpkg: error processing package oracle-java7-installer (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 oracle-java7-installer
E: Sub-process /usr/bin/dpkg returned an error code (1)
The command '/bin/sh -c apt-get -q install --no-install-recommends -y 
oracle-java7-installer' returned a non-zero code: 100

Total Elapsed time:   4m 26s

ERROR: Docker failed to build image.
{code}

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha3
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent

2017-05-25 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025425#comment-16025425
 ] 

Eric Payne commented on YARN-5892:
--

Thank you for the reviews.
bq. allUsersTimesWeights will be less than 1. I think in this case UL value is 
higher

[~sunilg], I think this is similar to the question I answered 
[above|https://issues.apache.org/jira/browse/YARN-5892?focusedCommentId=15972782=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15972782],
 but I'll restate it here for the sake of clarity.

Having a combined sum of weights < 1 does not cause UL to be too large. This is 
because {{userLimitResource}} (the return value of {{computeUserLimit}}) is 
only ever used by {{getComputedResourceLimitFor\[Active|All\]Users}}, which 
then multiplies the value of {{userLimitResource}} by the appropriate user's 
weight before returning it. This will result in the correct value of userLimit 
for each specific user. When the sum of the active users' weights is < 1, it 
is true that {{userLimitResource}} is larger than the amount of resources 
actually used. However, {{userLimitResource}} is just an intermediate value.
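
Toy arithmetic to illustrate (made-up numbers; the real computation in {{LeafQueue}} involves more terms, but the scaling cancels the same way):
{code}
float used = 100f;             // resources in use by the queue's active users
float activeWeightSum = 0.5f;  // one active user, jane, with weight 0.5

// Intermediate value: larger than the resources actually in use...
float userLimitResource = used / activeWeightSum;   // 200

// ...but the per-user getters scale it back by the user's own weight,
// so jane's limit comes out right.
float janeLimit = userLimitResource * 0.5f;         // 100
{code}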

bq. In UserManager, do we also need to lock while updating 
"activeUsersTimesWeights". 
Can you please clarify where you see it being read or written outside the lock? 
I think the code is within locks everywhere it is used.


bq. 1) Could you move CapacitySchedulerQueueManager#updateUserWeights to 
LeafQueue#setupQueueConfigs.

[~leftnoteasy], good optimization. I will make this change, do testing and 
await [~sunilg]'s response before submitting a new patch.


> Capacity Scheduler: Support user-specific minimum user limit percent
> 
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent
> 25
>   
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent
> 75
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-05-25 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2113:
-
Attachment: YARN-2113.branch-2.8.0019.patch

[~sunilg], I took a shot at backporting YARN-2113.0019.patch to branch-2.8, 
decoupling it from YARN-5889. I am still running through the tests, but this 
seems to work fairly well so far.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha3
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6618) TestNMLeveldbStateStoreService#testCompactionCycle can fail if compaction occurs more than once

2017-05-19 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017604#comment-16017604
 ] 

Eric Payne commented on YARN-6618:
--

LGTM. +1. Thanks [~jlowe].

> TestNMLeveldbStateStoreService#testCompactionCycle can fail if compaction 
> occurs more than once
> ---
>
> Key: YARN-6618
> URL: https://issues.apache.org/jira/browse/YARN-6618
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Minor
> Attachments: YARN-6618.001.patch
>
>
> The testCompactionCycle unit test is verifying that the compaction cycle 
> occurs after startup, but rarely the compaction cycle can occur more than 
> once which fails the test.  The unit test needs to account for this.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent

2017-05-30 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-5892:
-
Attachment: YARN-5892.014.patch

Uploading YARN-5892.014.patch.

bq. 1) Ideally we allow atleast one container (extra) for a user given UL is 
lesser. So it might be fine to use multiplyAndNormalizeDown 
bq. I feel we do not need to award a user with 2 containers, correct.?
[~sunilg], thanks. That makes sense. This patch uses 
{{multiplyAndNormalizeDown}} (toy illustration of the rounding below).
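
Plain arithmetic only, not the real {{Resources#multiplyAndNormalizeDown}} (the numbers are made up): normalizing down snaps the scaled limit to the nearest lower multiple of the minimum allocation, so rounding alone never hands a user a whole extra container:
{code}
long minAllocMb  = 1024;   // minimum container allocation
long userLimitMb = 10000;  // unscaled user limit
double weight    = 0.75;   // this user's weight

long scaled = (long) (userLimitMb * weight);                         // 7500
long down   = (scaled / minAllocMb) * minAllocMb;                    // 7168
long up     = ((scaled + minAllocMb - 1) / minAllocMb) * minAllocMb; // 8192
// down = 7 containers' worth; up = 8. Rounding up would award the user
// a full extra minimum-size container.
{code}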
bq. 1) Could you move CapacitySchedulerQueueManager#updateUserWeights to 
LeafQueue#setupQueueConfigs.
[~leftnoteasy], thanks. This patch contains that change.

> Capacity Scheduler: Support user-specific minimum user limit percent
> 
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent
> 25
>   
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent
> 75
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent

2017-05-31 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031352#comment-16031352
 ] 

Eric Payne commented on YARN-5892:
--

The findbugs warning was in code that was not changed by this patch. The 
javadoc warnings are unavoidable due to the naming conventions of the rendering 
code: {{'_' used as an identifier}}.

> Capacity Scheduler: Support user-specific minimum user limit percent
> 
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent
> 25
>   
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent
> 75
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6585) RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.

2017-06-02 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035055#comment-16035055
 ] 

Eric Payne commented on YARN-6585:
--

bq. I don't understand why we needed to add a deprecated newInstance method
I'm also confused about the need for {{void setNodeLabelsList(Set<String> 
labels)}} and {{Set<String> getNodeLabelsList()}} in 
{{AddToClusterNodeLabelsRequest\[PBImpl\]}}. AFAICT, the code added to 
{{initLocalNodeLabels}} should be sufficient:
{code:title=AddToClusterNodeLabelsRequestPBImpl#initLocalNodeLabels}
if (this.updatedNodeLabels.isEmpty()) {
  List<String> deprecatedLabelsList = p.getDeprecatedNodeLabelsList();
  for (String l : deprecatedLabelsList) {
    this.updatedNodeLabels.add(NodeLabel.newInstance(l));
  }
}
{code}

Also, I don't think the unit test is actually testing the above code. I took 
out the above lines, ran the test, and it still succeeded.

In fact, I think this will be difficult to test, since 
{{AddToClusterNodeLabelsRequestPBImpl#initLocalNodeLabels}} is called by 
{{FileSystemNodeLabelsStore#loadFromMirror}}, which reads the 
nodelabel.mirror and nodelabel.editlog files from HDFS, and the current test 
doesn't seem to be mocking any part of that. I'm still thinking about this one 
;-/

> RM fails to start when upgrading from 2.7 to 2.8 for clusters with node 
> labels.
> ---
>
> Key: YARN-6585
> URL: https://issues.apache.org/jira/browse/YARN-6585
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-6585.0001.patch, YARN-6585.0002.patch
>
>
> {noformat}
> Caused by: java.io.IOException: Not all labels being replaced contained by 
> known label collections, please check, new labels=[abc]
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:181)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:208)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-06-02 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2113:
-
Attachment: (was: YARN-2113.branch-2.8.0019.patch)

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha4
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, YARN-2113 
> Intra-QueuePreemption Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-06-02 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2113:
-
Attachment: YARN-2113.branch-2.8.0019.patch

Bad 2.8 patch. Sorry about that. Uploading a fixed one that actually compiles.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha4
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6585) RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.

2017-06-02 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035315#comment-16035315
 ] 

Eric Payne commented on YARN-6585:
--

bq. In similar to GetClusterNodeLabelsResponse, it was also support labels as 
string earlier
If I understand correctly, I think you are saying the following:
# since {{GetClusterNodeLabelsResponse}} has similar APIs, 
{{AddToClusterNodeLabelsRequest}} should also have them
# YARN-3413 removed the APIs for {{newInstance(Set<String> labels)}}, 
{{setNodeLabels(Set<String> labels)}}, and {{Set<String> getNodeLabels()}} and 
replaced them with APIs for {{newInstance(List<NodeLabel> nodeLabels)}}, 
{{void setNodeLabels(List<NodeLabel> nodeLabels)}}, and {{List<NodeLabel> 
getNodeLabels()}}. Changing APIs between minor releases is an incompatible 
change, and this should be corrected.

Is that correct? While both of those are valid points, I feel that these issues 
should be fixed in a separate JIRA, since they confuse things here and are not 
related to fixing this problem.

Also, if the concern is compatibility with 2.7, the APIs in 2.7 that operated 
on {{Set<String>}} were named {{\[get|set\]NodeLabels}} rather than 
{{\[get|set\]NodeLabelsList}}.

bq. i was making sure that we have stored labels as string (like in 2.7 
version), then read it back to ensure labels are still loaded an readable from 
RM
I still need to think more about the unit test, but the deprecated API 
{{AddToClusterNodeLabelsRequest newInstance(Set<String> nodeLabels)}} doesn't 
write the labels in the 2.7 format. It just turns around and calls the 2.8 API.
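
To spell out what "turns around and calls the 2.8 API" means, the deprecated 
factory presumably has roughly this shape (a sketch assuming the usual 
deprecated-factory pattern, not the patch verbatim):
{code}
// Sketch: the 2.7-style factory never serializes anything in the old
// string-only format; it converts and delegates to the 2.8-style API.
@Deprecated
public static AddToClusterNodeLabelsRequest newInstance(Set<String> labels) {
  List<NodeLabel> nodeLabels = new ArrayList<>();
  for (String label : labels) {
    nodeLabels.add(NodeLabel.newInstance(label));
  }
  return AddToClusterNodeLabelsRequest.newInstance(nodeLabels);
}
{code}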

Also, as I said above, I don't think the code in 
{{AddToClusterNodeLabelsRequestPBImpl#initLocalNodeLabels}} is getting 
exercised by the unit test. I took out the code and the test still passes.



> RM fails to start when upgrading from 2.7 to 2.8 for clusters with node 
> labels.
> ---
>
> Key: YARN-6585
> URL: https://issues.apache.org/jira/browse/YARN-6585
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-6585.0001.patch, YARN-6585.0002.patch
>
>
> {noformat}
> Caused by: java.io.IOException: Not all labels being replaced contained by 
> known label collections, please check, new labels=[abc]
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:181)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:208)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6585) RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.

2017-06-01 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033720#comment-16033720
 ] 

Eric Payne commented on YARN-6585:
--

bq. i also had to a newInstance method in addToClusterNodeLabelsRequest to 
accept labels as string.
I'm sorry, maybe I'm missing something [~sunilg], but I don't understand why we 
needed to add a deprecated {{newInstance}} method that is only called by the 
new unit test.

> RM fails to start when upgrading from 2.7 to 2.8 for clusters with node 
> labels.
> ---
>
> Key: YARN-6585
> URL: https://issues.apache.org/jira/browse/YARN-6585
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-6585.0001.patch, YARN-6585.0002.patch
>
>
> {noformat}
> Caused by: java.io.IOException: Not all labels being replaced contained by 
> known label collections, please check, new labels=[abc]
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:181)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:208)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6585) RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.

2017-06-01 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-6585:
-
Target Version/s: 2.8.2  (was: 2.8.1)

> RM fails to start when upgrading from 2.7 to 2.8 for clusters with node 
> labels.
> ---
>
> Key: YARN-6585
> URL: https://issues.apache.org/jira/browse/YARN-6585
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-6585.0001.patch, YARN-6585.0002.patch
>
>
> {noformat}
> Caused by: java.io.IOException: Not all labels being replaced contained by 
> known label collections, please check, new labels=[abc]
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:181)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:208)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-06-01 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033429#comment-16033429
 ] 

Eric Payne commented on YARN-2113:
--

Hmm. Still getting the 404:
{noformat}
[0m[91mConnecting to download.oracle.com 
(download.oracle.com)|23.61.194.27|:80... [0m[91mconnected.
HTTP request sent, awaiting response... [0m[91m404 Not Found
[0m[91m2017-06-01 17:35:54 ERROR 404: Not Found.
{noformat}
I wonder what's up with the {{[0m[91m}} everywhere; they look like stray ANSI 
color escape codes in the captured console log.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha4
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent

2017-06-12 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-5892:
-
Attachment: YARN-5892.015.patch

[~sunilg] / [~leftnoteasy] / [~jlowe], I'm uploading a new patch. I forgot to 
make the NormalizeDown modifications to the calculations for all users. In the 
previous patch, they applied only to the active users' calculations.

> Capacity Scheduler: Support user-specific minimum user limit percent
> 
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch, 
> YARN-5892.015.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent
> 25
>   
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent
> 75
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-05-01 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991126#comment-15991126
 ] 

Eric Payne commented on YARN-2113:
--

Thanks [~leftnoteasy].

{quote}
1) Priority first:
- App1 can preempt App2 if:
-- When App1.priority > App2.priority
--- User of app1 not reached user-limit
{quote}
I believe that the above statement is too broad. It sounds like preemption will 
bring App2.user down below its user limit even when other users are above 
theirs. I would like to ensure that App2.user is not brought below its user 
limit unless all other users are already at or below their user limit.
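
Stated as a guard condition (purely illustrative; {{queue.getUsers()}}, 
{{getUserLimit()}}, and friends are invented names, not the preemption 
monitor's API):
{code}
// App2 may be taken below its user limit only when every other user in
// the queue is already at or below its own limit.
boolean othersAtOrBelowLimit = true;
for (User u : queue.getUsers()) {
  if (!u.getUserName().equals(app2.getUser())
      && Resources.greaterThan(rc, clusterResource,
          u.getUsed(), u.getUserLimit())) {
    othersAtOrBelowLimit = false;
    break;
  }
}
boolean mayPreemptApp2BelowUserLimit = othersAtOrBelowLimit;
{code}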

{quote}
2) User limit first:
- App1 can preempt app2 if:
-- app1.user != app2.user
{quote}
Even in USERLIMIT_FIRST mode, shouldn't App1 preempt from App2 if App1.priority 
> App2.priority AND App1.user == App2.user?

bq. No matter if app1.user is outside of #100/MULP or not, no change to 
preemption behavior
You are right.

{quote}
If I understand Jason's comment correctly: above behavior should support both 
scenarios via different policies:
{quote}

I would like to hear [~jlowe]'s comments, but IIUC, he would have the same 
concern as above. I would also like his analysis of the following.

I see preemption requirements as follows (and maybe from the other direction):
For both USERLIMIT_FIRST and PRIORITY_FIRST:
- preemption for App1 should happen while App1.user is below its user limit AND 
in the following order of importance:
-- Preempt from apps owned by over-user-limit users:
--- Preempt lowest-priority containers in youngest-to-oldest order from 
over-user-limit users (maybe even preempt containers first from the 
most-over-user-limit users).
-- Preempt from lower-priority apps owned by App1.user
--- When there are no more users over their user limit AND lower priority apps 
owned by App1.user exist, preempt lowest-priority containers in 
youngest-to-oldest order

For PRIORITY_FIRST
- Continue to preempt lower-priority non-AM containers in lowest-to-highest 
priority and youngest-to-oldest order until App1.user has reached its user limit.

For USERLIMIT_FIRST
- no further preemption occurs
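
Condensed into a rough sketch (hypothetical helper names; this only restates 
the ordering above, it is not the preemption monitor's actual code):
{code}
// Preempt on behalf of app1 only while its user is below its user limit.
while (isBelowUserLimit(app1.getUser())) {
  // 1. Prefer victims from users that are over their user limit:
  //    lowest-priority containers first, youngest to oldest (perhaps
  //    starting with the most over-limit users).
  Container victim = pickFromOverUserLimitUsers();
  // 2. Only when no user is over its limit: victims from lower-priority
  //    apps owned by app1's own user, again lowest priority and
  //    youngest first.
  if (victim == null) {
    victim = pickFromOwnLowerPriorityApps(app1);
  }
  if (victim == null) {
    break;
  }
  preempt(victim);
}
// PRIORITY_FIRST keeps taking lower-priority non-AM containers until
// app1's user reaches its limit; USERLIMIT_FIRST stops here.
{code}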

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent

2017-05-05 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998847#comment-15998847
 ] 

Eric Payne commented on YARN-5892:
--

Sorry [~sunilg]. I missed your comment.

{quote}
I have few doubts related to YARN-2113 and this patch together.
   1. Give user-weight is 0, we can get user-limit for certain users as 0. 
Hence intra-queue preemption module could preempt all containers of user with 0 
weight leaving its AM containers behind.
{quote}
If a user starts out with a weight of 0, their app will only get the AM and 
then never be given any more containers, because the weight is also multiplied 
by the user limit factor. If a user with a weight of >0 has apps with running 
containers, and a cluster admin then "pauses" the user on a preemptable queue, 
all non-AM containers will be preempted. I think this is consistent behavior.
{quote}
   2. Similar behavior could occur if there are few users in the cluster whose 
weights are below 1 (and together the sum of weights is less than 1). 
{quote}
The fact that a user's weight is < 1.0 (or even the sum of multiple users' 
weights < 1.0) won't cause a problem. If user1 has a weight of 0.5 and its app 
is consuming the whole queue, and user2 with weight 2.0 submits an app, 
preemption occurs until user2's app has 4 times the resources of user1. It's 
the same as if user1 had weight of 1 and user2 had weight of 4.
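
A quick numeric check (plain arithmetic, not scheduler code): only each user's 
share of the summed weights matters, so scaling every weight by the same 
constant changes nothing:
{noformat}
weights {user1 = 0.5, user2 = 2.0}: sum = 2.5 -> shares {0.5/2.5 = 20%, 2.0/2.5 = 80%}
weights {user1 = 1.0, user2 = 4.0}: sum = 5.0 -> shares {1.0/5.0 = 20%, 4.0/5.0 = 80%}
{noformat}
In both cases user2 ends up with 4 times user1's resources.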

> Capacity Scheduler: Support user-specific minimum user limit percent
> 
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent
> 25
>   
>   
> 
> yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent
> 75
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-05-01 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991671#comment-15991671
 ] 

Eric Payne commented on YARN-2113:
--

{quote}
bq. 2) User limit first:
bq. - App1 can preempt app2 if: ◾app1.user != app2.user

Even in USERLIMIT_FIRST mode, shouldn't App1 preempt from App2 if App1.priority 
> App2.priority AND App1.user == App2.user?
{quote}
In an offline conversation with [~jlowe], we concluded that no, App1 would not 
preempt from App2 in this case.

bq. In patch 11/12, we tried to identify container which could bring user's 
usage under UL.
Given that when USERLIMIT_FIRST is set a user won't preempt from itself, I 
believe that patch 12 fits most closely with my vision of intra-queue 
preemption.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-05-19 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018125#comment-16018125
 ] 

Eric Payne commented on YARN-2113:
--

The latest patch is better, but I can still cause oscillation when using 
DominantResourceCalculator under some circumstances.

Also, it oscillates if the AM container is a different size than the map/reduce 
containers. This oscillation was not introduced by this JIRA. It was there 
since YARN-2009.

Both oscillation issues only happen when PRIORITY_FIRST is set. Since this 
doesn't happen with the USERLIMIT_FIRST mode, I am willing to address the 
oscillation issues as part of a different JIRA.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.apply.onto.0012.ericp.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6585) RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.

2017-06-05 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037860#comment-16037860
 ] 

Eric Payne commented on YARN-6585:
--

Hi [~sunilg],

I have come up with a unit test for this that I think will work well. It 
creates a 2.7-formatted levelDB file and starts the 2.8 RM while referencing 
it. I have tested it before and after the fix. Before the fix, it throws an 
exception similar to the one we experienced when we discovered this error:
{noformat}
java.io.IOException: Not all labels being replaced contained by known label 
collections, please check, new labels=[a]
at 
org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
at 
org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
at 
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
...
{noformat}

{code:title=TestRMNodeLabelsManager#testBackwardsCompatibleMirror}
  @Test(timeout = 60000)
  public void testBackwardsCompatibleMirror() throws Exception {
lmgr = new RMNodeLabelsManager();
Configuration conf = new Configuration();
File tempDir = File.createTempFile("nlb", ".tmp");
tempDir.delete();
tempDir.mkdirs();
tempDir.deleteOnExit();
String tempDirName = tempDir.getAbsolutePath();
conf.set(YarnConfiguration.FS_NODE_LABELS_STORE_ROOT_DIR, tempDirName);

// The following are the contents of a 2.7-formatted levelDB file to be
// placed in nodelabel.mirror. There are 3 labels: 'a', 'b', and 'c'.
// host1 is labeled with 'a', host2 is labeled with 'b', and 'c' is not
// associated with a node.
byte[] contents =
  {
  0x09, 0x0A, 0x01, 0x61, 0x0A, 0x01, 0x62, 0x0A, 0x01, 0x63, 0x20, 
  0x0A, 0x0E, 0x0A, 0x09, 0x0A, 0x05, 0x68, 0x6F, 0x73, 0x74, 0x32, 
  0x10, 0x00, 0x12, 0x01, 0x62, 0x0A, 0x0E, 0x0A, 0x09, 0x0A, 0x05, 
  0x68, 0x6F, 0x73, 0x74, 0x31, 0x10, 0x00, 0x12, 0x01, 0x61
  };
File file = new File(tempDirName + "/nodelabel.mirror");
file.createNewFile();
FileOutputStream stream = new FileOutputStream(file);
stream.write(contents);
stream.close();

conf.setBoolean(YarnConfiguration.NODE_LABELS_ENABLED, true);
conf.set(YarnConfiguration.RM_SCHEDULER,
    "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler");
Configuration withQueueLabels = getConfigurationWithQueueLabels(conf);

MockRM rm = initRM(withQueueLabels);
Set<String> labelNames = lmgr.getClusterNodeLabelNames();
Map<String, Set<NodeId>> labeledNodes = lmgr.getLabelsToNodes();

Assert.assertTrue(labelNames.contains("a"));
Assert.assertTrue(labelNames.contains("b"));
Assert.assertTrue(labelNames.contains("c"));
Assert.assertTrue(labeledNodes.get("a")
.contains(NodeId.newInstance("host1", 0)));
Assert.assertTrue(labeledNodes.get("b")
.contains(NodeId.newInstance("host2", 0)));

rm.stop();
  }
{code}


> RM fails to start when upgrading from 2.7 to 2.8 for clusters with node 
> labels.
> ---
>
> Key: YARN-6585
> URL: https://issues.apache.org/jira/browse/YARN-6585
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-6585.0001.patch, YARN-6585.0002.patch
>
>
> {noformat}
> Caused by: java.io.IOException: Not all labels being replaced contained by 
> known label collections, please check, new labels=[abc]
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:181)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:208)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7149) Cross-queue preemption sometimes starves an underserved queue

2017-09-15 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7149:
-
Attachment: YARN-7149.002.patch

bq. Do you think does it make sense to merge the 
{{YARN-7149.demo.unit-test.patch}} to your patch?
Thanks [~leftnoteasy]. I spent some time looking through the test patch to make 
sure I understand its purpose. I think it makes sense to merge it with this 
change. Attaching an updated patch.

> Cross-queue preemption sometimes starves an underserved queue
> -
>
> Key: YARN-7149
> URL: https://issues.apache.org/jira/browse/YARN-7149
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-7149.001.patch, YARN-7149.002.patch, 
> YARN-7149.demo.unit-test.patch
>
>
> In branch 2 and trunk, I am consistently seeing some use cases where 
> cross-queue preemption does not happen when it should. I do not see this in 
> branch-2.8.
> Use Case:
> | | *Size* | *Minimum Container Size* |
> |MyCluster | 20 GB | 0.5 GB |
> | *Queue Name* | *Capacity* | *Absolute Capacity* | *Minimum User Limit 
> Percent (MULP)* | *User Limit Factor (ULF)* |
> |Q1 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> |Q2 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> - {{User1}} launches {{App1}} in {{Q1}} and consumes all resources (20 GB)
> - {{User2}} launches {{App2}} in {{Q2}} and requests 10 GB
> - _Note: containers are 0.5 GB._
> - Preemption monitor kills 2 containers (equals 1 GB) from {{App1}} in {{Q1}}.
> - Capacity Scheduler assigns 2 containers (equals 1 GB) to {{App2}} in {{Q2}}.
> - _No more containers are ever preempted, even though {{Q2}} is far 
> underserved_



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7149) Cross-queue preemption sometimes starves an underserved queue

2017-09-18 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170157#comment-16170157
 ] 

Eric Payne commented on YARN-7149:
--

Thanks a lot [~leftnoteasy].

Also, this needs to be pulled back into branch-2. I will do that if there are 
no objections.

> Cross-queue preemption sometimes starves an underserved queue
> -
>
> Key: YARN-7149
> URL: https://issues.apache.org/jira/browse/YARN-7149
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-7149.001.patch, YARN-7149.002.patch, 
> YARN-7149.demo.unit-test.patch
>
>
> In branch 2 and trunk, I am consistently seeing some use cases where 
> cross-queue preemption does not happen when it should. I do not see this in 
> branch-2.8.
> Use Case:
> | | *Size* | *Minimum Container Size* |
> |MyCluster | 20 GB | 0.5 GB |
> | *Queue Name* | *Capacity* | *Absolute Capacity* | *Minimum User Limit 
> Percent (MULP)* | *User Limit Factor (ULF)* |
> |Q1 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> |Q2 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> - {{User1}} launches {{App1}} in {{Q1}} and consumes all resources (20 GB)
> - {{User2}} launches {{App2}} in {{Q2}} and requests 10 GB
> - _Note: containers are 0.5 GB._
> - Preemption monitor kills 2 containers (equals 1 GB) from {{App1}} in {{Q1}}.
> - Capacity Scheduler assigns 2 containers (equals 1 GB) to {{App2}} in {{Q2}}.
> - _No more containers are ever preempted, even though {{Q2}} is far 
> underserved_



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7149) Cross-queue preemption sometimes starves an underserved queue

2017-09-18 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7149:
-
Fix Version/s: 3.1.0
   2.9.0

> Cross-queue preemption sometimes starves an underserved queue
> -
>
> Key: YARN-7149
> URL: https://issues.apache.org/jira/browse/YARN-7149
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 2.9.0, 3.0.0-beta1, 3.1.0
>
> Attachments: YARN-7149.001.patch, YARN-7149.002.patch, 
> YARN-7149.demo.unit-test.patch
>
>
> In branch 2 and trunk, I am consistently seeing some use cases where 
> cross-queue preemption does not happen when it should. I do not see this in 
> branch-2.8.
> Use Case:
> | | *Size* | *Minimum Container Size* |
> |MyCluster | 20 GB | 0.5 GB |
> | *Queue Name* | *Capacity* | *Absolute Capacity* | *Minimum User Limit 
> Percent (MULP)* | *User Limit Factor (ULF)* |
> |Q1 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> |Q2 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> - {{User1}} launches {{App1}} in {{Q1}} and consumes all resources (20 GB)
> - {{User2}} launches {{App2}} in {{Q2}} and requests 10 GB
> - _Note: containers are 0.5 GB._
> - Preemption monitor kills 2 containers (equals 1 GB) from {{App1}} in {{Q1}}.
> - Capacity Scheduler assigns 2 containers (equals 1 GB) to {{App2}} in {{Q2}}.
> - _No more containers are ever preempted, even though {{Q2}} is far 
> underserved_



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7245) In Cap Sched UI, Max AM Resource column in Active Users Info section should be per-user

2017-09-22 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7245:
-
Attachment: CapSched UI Showing Inaccurate Per User Max AM Resource.png

The value in the {{Max AM Resource}} column in the {{Active Users Info}} 
section of the Capacity scheduler UI contains the value for {{Max Application 
Master Resources}}, which is the max for the whole queue. It should be the 
{{Max Application Master Resources Per User}} value, which is the max AM 
resources that a single user can use.

See the attached screenshot.

> In Cap Sched UI, Max AM Resource column in Active Users Info section should 
> be per-user
> ---
>
> Key: YARN-7245
> URL: https://issues.apache.org/jira/browse/YARN-7245
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.9.0, 2.8.1, 3.0.0-alpha4
>Reporter: Eric Payne
> Attachments: CapSched UI Showing Inaccurate Per User Max AM 
> Resource.png
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7245) In Cap Sched UI, Max AM Resource column in Active Users Info section should be per-user

2017-09-22 Thread Eric Payne (JIRA)
Eric Payne created YARN-7245:


 Summary: In Cap Sched UI, Max AM Resource column in Active Users 
Info section should be per-user
 Key: YARN-7245
 URL: https://issues.apache.org/jira/browse/YARN-7245
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, yarn
Affects Versions: 3.0.0-alpha4, 2.8.1, 2.9.0
Reporter: Eric Payne






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7149) Cross-queue preemption sometimes starves an underserved queue

2017-09-15 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168499#comment-16168499
 ] 

Eric Payne commented on YARN-7149:
--

The following unit tests are succeeding for me in my environment:
{code}
TestOpportunisticContainerAllocatorAMService
TestZKRMStateStore
TestSubmitApplicationWithRMHA 
{code}

{{TestContainerAllocation}} was modified by this patch, and the new test is 
succeeding. The failure in 
{{TestContainerAllocation#testAMContainerAllocationWhenDNSUnavailable}} is a 
pre-existing issue: YARN-7044.

> Cross-queue preemption sometimes starves an underserved queue
> -
>
> Key: YARN-7149
> URL: https://issues.apache.org/jira/browse/YARN-7149
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-7149.001.patch, YARN-7149.002.patch, 
> YARN-7149.demo.unit-test.patch
>
>
> In branch 2 and trunk, I am consistently seeing some use cases where 
> cross-queue preemption does not happen when it should. I do not see this in 
> branch-2.8.
> Use Case:
> | | *Size* | *Minimum Container Size* |
> |MyCluster | 20 GB | 0.5 GB |
> | *Queue Name* | *Capacity* | *Absolute Capacity* | *Minimum User Limit 
> Percent (MULP)* | *User Limit Factor (ULF)* |
> |Q1 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> |Q2 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> - {{User1}} launches {{App1}} in {{Q1}} and consumes all resources (20 GB)
> - {{User2}} launches {{App2}} in {{Q2}} and requests 10 GB
> - _Note: containers are 0.5 GB._
> - Preemption monitor kills 2 containers (equals 1 GB) from {{App1}} in {{Q1}}.
> - Capacity Scheduler assigns 2 containers (equals 1 GB) to {{App2}} in {{Q2}}.
> - _No more containers are ever preempted, even though {{Q2}} is far 
> underserved_



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7149) Cross-queue preemption sometimes starves an underserved queue

2017-09-14 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166286#comment-16166286
 ] 

Eric Payne commented on YARN-7149:
--

Unit test failures are not related to this patch:
{{TestAbstractYarnScheduler}}: Succeeds for me locally
{{TestContainerAllocation}}: YARN-7044

> Cross-queue preemption sometimes starves an underserved queue
> -
>
> Key: YARN-7149
> URL: https://issues.apache.org/jira/browse/YARN-7149
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-7149.001.patch, YARN-7149.demo.unit-test.patch
>
>
> In branch 2 and trunk, I am consistently seeing some use cases where 
> cross-queue preemption does not happen when it should. I do not see this in 
> branch-2.8.
> Use Case:
> | | *Size* | *Minimum Container Size* |
> |MyCluster | 20 GB | 0.5 GB |
> | *Queue Name* | *Capacity* | *Absolute Capacity* | *Minimum User Limit 
> Percent (MULP)* | *User Limit Factor (ULF)* |
> |Q1 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> |Q2 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> - {{User1}} launches {{App1}} in {{Q1}} and consumes all resources (20 GB)
> - {{User2}} launches {{App2}} in {{Q2}} and requests 10 GB
> - _Note: containers are 0.5 GB._
> - Preemption monitor kills 2 containers (equals 1 GB) from {{App1}} in {{Q1}}.
> - Capacity Scheduler assigns 2 containers (equals 1 GB) to {{App2}} in {{Q2}}.
> - _No more containers are ever preempted, even though {{Q2}} is far 
> underserved_



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4727) Unable to override the $HADOOP_CONF_DIR env variable for container

2017-09-13 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165194#comment-16165194
 ] 

Eric Payne commented on YARN-4727:
--

+1 Thanks [~jlowe]

> Unable to override the $HADOOP_CONF_DIR env variable for container
> --
>
> Key: YARN-4727
> URL: https://issues.apache.org/jira/browse/YARN-4727
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.4.1, 2.5.2, 2.7.2, 2.6.4, 2.8.1
>Reporter: Terence Yim
>Assignee: Jason Lowe
> Attachments: YARN-4727.001.patch, YARN-4727.002.patch
>
>
> Given the default config of "yarn.nodemanager.env-whitelist", application 
> should be able to set the env variable $HADOOP_CONF_DIR to value other than 
> the one in the NodeManager system environment. However, I believe due to a 
> bug in the 
> {{org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch}}
>  class, it is not possible so.
> From the {{sanitizeEnv()}} method in the ContainerLaunch class 
> (https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java#L977)
> {noformat}
> putEnvIfNotNull(environment, 
> Environment.HADOOP_CONF_DIR.name(), 
> System.getenv(Environment.HADOOP_CONF_DIR.name())
> );
> if (!Shell.WINDOWS) {
>   environment.put("JVM_PID", "$$");
> }
> String[] whitelist = conf.get(YarnConfiguration.NM_ENV_WHITELIST, 
> YarnConfiguration.DEFAULT_NM_ENV_WHITELIST).split(",");
> 
> for(String whitelistEnvVariable : whitelist) {
>   putEnvIfAbsent(environment, whitelistEnvVariable.trim());
> }
> ...
> private static void putEnvIfAbsent(
> Map<String, String> environment, String variable) {
>   if (environment.get(variable) == null) {
> putEnvIfNotNull(environment, variable, System.getenv(variable));
>   }
> }
> {noformat}
> So there are two issues here.
> 1. the environment is already set with the system environment of the NM in 
> the {{putEnvIfNotNull}} call, hence the {{putEnvIfAbsent}} call will never 
> set it to some new value
> 2. Inside the {{putEnvIfAbsent}} call, it uses the system environment of the 
> NM, whereas it should be using the one from the {{launchContext}} instead.
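
A minimal sketch of one direction a fix could take (hypothetical, assuming 
{{environment}} at this point holds the values from the application's launch 
context; the committed fix may look quite different): stop pre-seeding 
{{HADOOP_CONF_DIR}} from the NM's environment, and let the whitelist loop fall 
back to the NM value only when the application did not supply one:
{code}
// Do NOT pre-populate HADOOP_CONF_DIR from the NM's own environment here.
// For whitelisted variables, keep whatever the application supplied via
// its launch context; fall back to the NM environment only when absent.
String[] whitelist = conf.get(YarnConfiguration.NM_ENV_WHITELIST,
    YarnConfiguration.DEFAULT_NM_ENV_WHITELIST).split(",");
for (String whitelistEnvVariable : whitelist) {
  String var = whitelistEnvVariable.trim();
  if (!environment.containsKey(var)) {
    putEnvIfNotNull(environment, var, System.getenv(var));
  }
}
{code}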



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7149) Cross-queue preemption sometimes starves an underserved queue

2017-09-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7149:
-
Attachment: YARN-7149.001.patch

Rather than use this JIRA to revert the {{computeUserLimit}} behavior to 
pre-YARN-5889, patch {{YARN-7149.001.patch}} just adds {{minimumAllocation (min 
container size)}} to {{resourceUsed}}. I see this as a compromise between the 
old and the new behavior. Please let me know your thoughts.
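
The idea in miniature (a sketch with invented local names; see the patch for 
the real context inside {{computeUserLimit}}):
{code}
// Pad current consumption by one minimum allocation so the computed user
// limit sits one container above what is already used, rather than
// collapsing onto the exact current usage and shutting off preemption.
Resource usedForUserLimit = Resources.add(resourceUsed, minimumAllocation);
// ... the user limit is then derived from usedForUserLimit instead of
// resourceUsed ...
{code}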

> Cross-queue preemption sometimes starves an underserved queue
> -
>
> Key: YARN-7149
> URL: https://issues.apache.org/jira/browse/YARN-7149
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-7149.001.patch, YARN-7149.demo.unit-test.patch
>
>
> In branch 2 and trunk, I am consistently seeing some use cases where 
> cross-queue preemption does not happen when it should. I do not see this in 
> branch-2.8.
> Use Case:
> | | *Size* | *Minimum Container Size* |
> |MyCluster | 20 GB | 0.5 GB |
> | *Queue Name* | *Capacity* | *Absolute Capacity* | *Minimum User Limit 
> Percent (MULP)* | *User Limit Factor (ULF)* |
> |Q1 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> |Q2 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> - {{User1}} launches {{App1}} in {{Q1}} and consumes all resources (20 GB)
> - {{User2}} launches {{App2}} in {{Q2}} and requests 10 GB
> - _Note: containers are 0.5 GB._
> - Preemption monitor kills 2 containers (equals 1 GB) from {{App1}} in {{Q1}}.
> - Capacity Scheduler assigns 2 containers (equals 1 GB) to {{App2}} in {{Q2}}.
> - _No more containers are ever preempted, even though {{Q2}} is far 
> underserved_



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


