[jira] [Commented] (YARN-4459) container-executor might kill process wrongly
[ https://issues.apache.org/jira/browse/YARN-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300266#comment-15300266 ] Eric Payne commented on YARN-4459: -- Thanks [~hex108] for the fix and [~jlowe] for the update and review. Patch LGTM. +1 > container-executor might kill process wrongly > - > > Key: YARN-4459 > URL: https://issues.apache.org/jira/browse/YARN-4459 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-4459.01.patch, YARN-4459.02.patch, > YARN-4459.03.patch > > > When calling 'signal_container_as_user' in container-executor, it first > checks whether the process group exists; if not, it kills the process > itself (if the process exists). This is not reasonable: the absence of the > process group means the corresponding container has finished, so if we kill > the process itself, we kill the wrong process. > We saw this happen in our cluster many times. We used the same account for > starting the NM and the submitted app, and container-executor sometimes > killed the NM (the wrongly killed process might just be a newly started > thread that was the NM's child process). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
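The corrected behavior described in this issue can be sketched as a small decision function. This is an illustrative Java model, not the real container-executor code (which is C); the type and method names are hypothetical:

```java
// Illustrative Java model of the corrected signal_container_as_user logic.
// The real code is C inside container-executor; these names are hypothetical.
public class SignalDecision {
    public enum Action { SIGNAL_PROCESS_GROUP, DO_NOTHING }

    // If the container's process group is gone, the container has finished.
    // The old behavior fell back to killing the bare pid, but by then the pid
    // may belong to an unrelated process (e.g. a thread of the NM itself).
    public static Action decide(boolean processGroupExists) {
        if (processGroupExists) {
            return Action.SIGNAL_PROCESS_GROUP; // container alive: signal the whole group
        }
        return Action.DO_NOTHING; // container finished: never touch the lone pid
    }

    public static void main(String[] args) {
        System.out.println(decide(true));
        System.out.println(decide(false));
    }
}
```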
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389828#comment-15389828 ] Eric Payne commented on YARN-4945: -- Hi [~leftnoteasy]. Sorry for the long delay. I am now starting to have more time to focus on putting a design together and getting a POC working for in-queue preemption. {quote} - When intra-queue preemption can happen: in some cases, we need intra-queue preemption happen when queue is under its guaranteed resource, {quote} Just for clarification, I think that if resources are available in the cluster and a queue can get more resources by growing the queue's usage, then in-queue preemption shouldn't happen. However, if something in the queue's hierarchy has reached its absolute max capacity, or if the cluster itself is full, then in-queue preemption should happen, even if the queue is under its guaranteed resource max. For {{queue X}}, this should happen when all of the following occur: # some set of resources (memory, vcores, labelled, locality, etc.) is fully used, either by other queues or by apps in {{queue X}} # some user in {{queue X}} is over its minimum user limit percent # another user in {{queue X}} is under its minimum user limit percent and asking for resources Having said that, the question of whether a queue can grow its usage by allocating available resources is complicated by the same issues that plague cross-queue preemption, such as labelled resources, locality, fragmented memory, and so forth. > [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > > This is an umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781.
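The three conditions above can be sketched as a single predicate. This is a hedged illustration only; none of the class or method names below exist in the CapacityScheduler:

```java
import java.util.List;

// Hypothetical sketch of the three intra-queue preemption conditions listed
// above; these types do not exist in the CapacityScheduler.
public class IntraQueuePreemptionCheck {
    public static class UserUsage {
        public final double usedPct;     // user's usage as a percent of the queue
        public final boolean pendingAsk; // user has outstanding resource requests
        public UserUsage(double usedPct, boolean pendingAsk) {
            this.usedPct = usedPct;
            this.pendingAsk = pendingAsk;
        }
    }

    // clusterOrHierarchyFull models condition 1: the relevant resources are all
    // used, so the queue cannot grow by simply allocating free space.
    public static boolean shouldPreempt(boolean clusterOrHierarchyFull,
                                        double minimumUserLimitPct,
                                        List<UserUsage> users) {
        if (!clusterOrHierarchyFull) {
            return false; // the queue can still grow without preempting
        }
        boolean someoneOverLimit = users.stream()          // condition 2
            .anyMatch(u -> u.usedPct > minimumUserLimitPct);
        boolean someoneUnderAndAsking = users.stream()     // condition 3
            .anyMatch(u -> u.usedPct < minimumUserLimitPct && u.pendingAsk);
        return someoneOverLimit && someoneUnderAndAsking;
    }
}
```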
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392184#comment-15392184 ] Eric Payne commented on YARN-4945: -- Thanks [~sunilg]. I very much appreciate any design help and POC help that you can provide. {quote} queue preemption may need to consider multiple factors such as: - Priority - Fairness - User limit {quote} In general, I would like to separate out the sub-feature pieces of in-queue preemption as much as possible. I am a fan of making small, simple improvements in increments. I believe this makes it easier to understand, test, and review. bq. For initial POC, I was planning priority as I have done an independent POC for priority preemption alone. One thing I don't understand is the use case for a priority policy that is separate from a user limit policy. For my users' use case, the most important of these is user limit. However, I think that in-queue preemption based on user limit needs to be very dependent on app priority, and vice versa. Can you please elaborate? > [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > > This is an umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392303#comment-15392303 ] Eric Payne commented on YARN-4945: -- {quote} This is considering priority of apps alone. Yes, we have to have user-limit etc. As we progress, we can add that to strengthen the intra-queue preemption for more accurate results. {quote} Thanks, [~sunilg]. With the proposed design, is it possible for an app that is below its user limit to be preempted? I think that should not happen, even if a higher priority app is asking for resources. > [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > > This is an umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781.
[jira] [Commented] (YARN-4091) Add REST API to retrieve scheduler activity
[ https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402492#comment-15402492 ] Eric Payne commented on YARN-4091: -- [~ChenGe], thank you for your work on this feature. I am sorry for the delay in my response. {quote} If running the previous patch without changes, one node heartbeat costs approximately 0.2 ms. If we only record application activities, the difference in running time is unnoticeable, less than 0.01 ms. But if we record complete node heartbeat activities, the running time for each node heartbeat is 0.6 ms, which is about 3x the baseline. However, in practice, only a few nodes' activities will be recorded at the same time. For example, suppose there are 30 nodes' activities being recorded at the same time (which is already a huge number to me). Compared to the time cost of 2000 node heartbeats, the time to record activities is small (around 3% more overhead), so it is negligible and acceptable. {quote} I would be interested to know how you gathered this information. Also, how are you limiting the number of nodes whose state is being logged? I am concerned about the performance load this feature will add to the resource manager. I have analyzed the code and experimented with the feature on a 3-node cluster. It appears that the state is being recorded for every node on every heartbeat:
{code}
case NODE_UPDATE:
{
  ...
  if (!scheduleAsynchronously) {
    ActivitiesLogger.NODE.startNodeUpdateRecording(activitiesManager,
        node.getNodeID());
    allocateContainersToNode(getNode(node.getNodeID()));
    ActivitiesLogger.NODE.finishNodeUpdateRecording(activitiesManager,
        node.getNodeID());
  ...
{code}
And, from my experimentation, {{ActivitiesLogger.NODE.startNodeUpdateRecording}} is always called, and it is almost always followed by a call to one of the {{ActivitiesLogger.NODE.finish*}} methods. If this happens on every heartbeat, I am afraid it will put a great strain on the resource manager.
Can you please comment? > Add REST API to retrieve scheduler activity > --- > > Key: YARN-4091 > URL: https://issues.apache.org/jira/browse/YARN-4091 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.0 >Reporter: Sunil G >Assignee: Chen Ge > Attachments: Improvement on debugdiagnostic information - YARN.pdf, > SchedulerActivityManager-TestReport v2.pdf, > SchedulerActivityManager-TestReport.pdf, YARN-4091-design-doc-v1.pdf, > YARN-4091.1.patch, YARN-4091.2.patch, YARN-4091.3.patch, YARN-4091.4.patch, > YARN-4091.5.patch, YARN-4091.5.patch, YARN-4091.preliminary.1.patch, > app_activities.json, node_activities.json > > > As schedulers are improved with various new capabilities, more configurations > that tune the schedulers start to take actions such as limiting container > assignment to an application or introducing a delay before allocating a > container. > There is no clear information passed down from the scheduler to the outside > world under these various scenarios. This makes debugging much tougher. > This ticket is an effort to introduce more defined states at the various points > in the scheduler where it skips/rejects container assignment, activates an > application, etc. Such information will help users know what is happening in the > scheduler. > Attaching a short proposal for initial discussion. We would like to improve > on this as we discuss.
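To illustrate the performance concern, here is a generic sketch (not YARN's actual ActivitiesLogger/ActivitiesManager code) of how a per-heartbeat recording hook can be kept cheap: the hot path pays only a volatile read unless recording was explicitly requested for a node. All names are hypothetical:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Generic sketch of keeping a per-heartbeat recording hook cheap; this is NOT
// YARN's ActivitiesLogger/ActivitiesManager, just the guard pattern at issue.
public class ActivityRecorder {
    // Nodes whose next heartbeat should be recorded; empty almost all the time.
    private final Set<String> nodesToRecord = ConcurrentHashMap.newKeySet();
    private volatile boolean anyRecordingRequested = false;

    public void requestRecording(String nodeId) {
        nodesToRecord.add(nodeId);
        anyRecordingRequested = true;
    }

    // Called on every node heartbeat: in the common case this costs only one
    // volatile read, so the hot allocation path stays essentially unchanged.
    public boolean startNodeUpdateRecording(String nodeId) {
        if (!anyRecordingRequested) {
            return false;
        }
        return nodesToRecord.remove(nodeId); // record this node once, then stop
    }
}
```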
[jira] [Commented] (YARN-4091) Add REST API to retrieve scheduler activity
[ https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409976#comment-15409976 ] Eric Payne commented on YARN-4091: -- bq. Any other suggestions? Sunil G / Eric Payne. Latest patch LGTM. +1. Thanks [~ChenGe] and [~leftnoteasy]. > Add REST API to retrieve scheduler activity > --- > > Key: YARN-4091 > URL: https://issues.apache.org/jira/browse/YARN-4091 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.0 >Reporter: Sunil G >Assignee: Chen Ge > Fix For: 3.0.0-alpha2 > > Attachments: Improvement on debugdiagnostic information - YARN.pdf, > SchedulerActivityManager-TestReport v2.pdf, > SchedulerActivityManager-TestReport.pdf, YARN-4091-design-doc-v1.pdf, > YARN-4091.1.patch, YARN-4091.2.patch, YARN-4091.3.patch, YARN-4091.4.patch, > YARN-4091.5.patch, YARN-4091.5.patch, YARN-4091.6.patch, YARN-4091.7.patch, > YARN-4091.8.patch, YARN-4091.preliminary.1.patch, app_activities v2.json, > app_activities.json, node_activities v2.json, node_activities.json > > > As schedulers are improved with various new capabilities, more configurations > that tune the schedulers start to take actions such as limiting container > assignment to an application or introducing a delay before allocating a > container. > There is no clear information passed down from the scheduler to the outside > world under these various scenarios. This makes debugging much tougher. > This ticket is an effort to introduce more defined states at the various points > in the scheduler where it skips/rejects container assignment, activates an > application, etc. Such information will help users know what is happening in the > scheduler. > Attaching a short proposal for initial discussion. We would like to improve > on this as we discuss.
[jira] [Commented] (YARN-4091) Add REST API to retrieve scheduler activity
[ https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402932#comment-15402932 ] Eric Payne commented on YARN-4091: -- Thanks, [~leftnoteasy], for the helpful explanation. I see now that the expensive parts of the code are done inside {{ActivitiesManager}}, and that those are protected by the {{shouldRecordThis*}} checks. This feature still adds several new calls per second to the node heartbeat and container allocation paths regardless. Even though these calls are normally very fast, they sit on critical paths, which is why I am concerned about performance. It sounds like you have performed due diligence in the area surrounding these calls, so it will probably not have much impact (I hope). [~ChenGe], I do have one other comment. I notice that the "{{priority}}" key in the output is somewhat ambiguous. It may be difficult for some to differentiate between app priority and container priority. For example:
{code}
{
  "nodeId":"hostname.company.com:45454",
  "queueName":"default",
  "priority":"0",
  ...
  "allocationAttempt": [
    {
      "priority":"0",
      "allocationState":"SKIPPED",
      "diagnostic":"priority skipped"
    },
    {
      "name":"container_e03_1470083952204_0001_01_000103",
      "priority":"20",
      "allocationState":"ALLOCATED"
    }
  ]
},
{code}
Thanks!
> Add REST API to retrieve scheduler activity > --- > > Key: YARN-4091 > URL: https://issues.apache.org/jira/browse/YARN-4091 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.0 >Reporter: Sunil G >Assignee: Chen Ge > Attachments: Improvement on debugdiagnostic information - YARN.pdf, > SchedulerActivityManager-TestReport v2.pdf, > SchedulerActivityManager-TestReport.pdf, YARN-4091-design-doc-v1.pdf, > YARN-4091.1.patch, YARN-4091.2.patch, YARN-4091.3.patch, YARN-4091.4.patch, > YARN-4091.5.patch, YARN-4091.5.patch, YARN-4091.preliminary.1.patch, > app_activities.json, node_activities.json > > > As schedulers are improved with various new capabilities, more configurations > that tune the schedulers start to take actions such as limiting container > assignment to an application or introducing a delay before allocating a > container. > There is no clear information passed down from the scheduler to the outside > world under these various scenarios. This makes debugging much tougher. > This ticket is an effort to introduce more defined states at the various points > in the scheduler where it skips/rejects container assignment, activates an > application, etc. Such information will help users know what is happening in the > scheduler. > Attaching a short proposal for initial discussion. We would like to improve > on this as we discuss.
[jira] [Moved] (YARN-5469) Increase timeout of TestAmFilter.testFilter
[ https://issues.apache.org/jira/browse/YARN-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne moved HADOOP-13462 to YARN-5469: --- Key: YARN-5469 (was: HADOOP-13462) Project: Hadoop YARN (was: Hadoop Common) > Increase timeout of TestAmFilter.testFilter > --- > > Key: YARN-5469 > URL: https://issues.apache.org/jira/browse/YARN-5469 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Priority: Minor > > Timeout is currently only 1 second. Saw a timeout failure > {noformat} > java.lang.Exception: test timed out after 1000 milliseconds > at java.util.zip.ZipFile.getEntry(Native Method) > at java.util.zip.ZipFile.getEntry(ZipFile.java:311) > at java.util.jar.JarFile.getEntry(JarFile.java:240) > at java.util.jar.JarFile.getJarEntry(JarFile.java:223) > at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:841) > at sun.misc.URLClassPath.getResource(URLClassPath.java:199) > at java.net.URLClassLoader$1.run(URLClassLoader.java:364) > at java.net.URLClassLoader$1.run(URLClassLoader.java:361) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:360) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:760) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:455) > at java.net.URLClassLoader.access$100(URLClassLoader.java:73) > at java.net.URLClassLoader$1.run(URLClassLoader.java:367) > at java.net.URLClassLoader$1.run(URLClassLoader.java:361) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:360) > at 
java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:760) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:455) > at java.net.URLClassLoader.access$100(URLClassLoader.java:73) > at java.net.URLClassLoader$1.run(URLClassLoader.java:367) > at java.net.URLClassLoader$1.run(URLClassLoader.java:361) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:360) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:760) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:455) > at java.net.URLClassLoader.access$100(URLClassLoader.java:73) > at java.net.URLClassLoader$1.run(URLClassLoader.java:367) > at java.net.URLClassLoader$1.run(URLClassLoader.java:361) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:360) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:760) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:455) > at 
java.net.URLClassLoader.access$100(URLClassLoader.java:73) > at java.net.URLClassLoader$1.run(URLClassLoader.java:367) > at java.net.URLClassLoader$1.run(URLClassLoader.java:361) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:360) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852243#comment-15852243 ] Eric Payne commented on YARN-5889: -- Thanks [~sunilg] and [~leftnoteasy] for your work on this feature. I do have one concern. I think there is a race condition where, if a container fails, the freed resources are not recorded for that user about half the time. Use Case: - Queue is 50% of the cluster - MULP = 50% - One app fills the cluster - Some containers fail -- I simulate this by using {{yarn container -signal container_1486159534159_0004_01_29 FORCEFUL_SHUTDOWN}} - The app is only given new containers about half the time. -- That is to say, the app is asking for resources, and the cluster has free space, but the app is not being given those resources. I'm sorry I can't go into more detail at this time. I just discovered this issue and have not had time to investigate further. However, since you are about to complete the work on this JIRA, I felt I should provide the information I have so far. > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, > YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, > YARN-5889.0006.patch, YARN-5889.0007.patch, YARN-5889.0008.patch, > YARN-5889.0009.patch, YARN-5889.0010.patch, YARN-5889.v0.patch, > YARN-5889.v1.patch, YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket focuses on moving the > user-limit calculation out of the heartbeat allocation flow.
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855123#comment-15855123 ] Eric Payne commented on YARN-5889: -- OK. The latest patch looks good. [~sunilg], [~leftnoteasy], are you planning on backporting to branch-2 / branch-2.8? > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, > YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, > YARN-5889.0006.patch, YARN-5889.0007.patch, YARN-5889.0008.patch, > YARN-5889.0009.patch, YARN-5889.0010.patch, YARN-5889.v0.patch, > YARN-5889.v1.patch, YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket focuses on moving the > user-limit calculation out of the heartbeat allocation flow.
[jira] [Commented] (YARN-6152) Used queue percentage not accurate in UI for 2.7 and below when using DominantResourceCalculator
[ https://issues.apache.org/jira/browse/YARN-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856203#comment-15856203 ] Eric Payne commented on YARN-6152: -- Thanks [~jhung], I will review today. > Used queue percentage not accurate in UI for 2.7 and below when using > DominantResourceCalculator > > > Key: YARN-6152 > URL: https://issues.apache.org/jira/browse/YARN-6152 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3 >Reporter: Jonathan Hung >Assignee: Jonathan Hung > Attachments: dominantRC.png, YARN-6152-branch-2.7.001.patch, > YARN-6152-branch-2.7.002.patch > > > YARN-4751 adds the {{getUsedCapacity}} and {{getAbsoluteUsedCapacity}} > methods to {{AbstractCSQueue}}, which are used to display queue usage in the UI > for branch-2.7 and below. However, if there is more than one partition in the > cluster, with different dominant resources, then queue usage may not be > displayed as expected. > Contrived example: The default partition has <90GB, 10vcores>, and the "test" > partition has <10GB, 90vcores>. The {{root}} queue in the default partition uses > <30GB, 10vcores>. Here we expect queue usage to be 100% since it is using all > vcores in the default partition. But the displayed usage will be > (30GB/100GB)/(90GB/100GB) = 33%.
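The percentages in the contrived example above can be checked with a few lines of arithmetic. This is an illustrative sketch; the helper names are hypothetical, not actual AbstractCSQueue methods:

```java
// Sketch of the arithmetic in the contrived example above. These helpers are
// hypothetical, not actual AbstractCSQueue code. Default partition: <90GB,
// 10vcores> of a 100GB cluster; the queue uses <30GB, 10vcores>.
public class UsedCapacityExample {
    // Dominant-resource usage within the partition: the max share across resources.
    public static double dominantUsedPct(double usedMem, double usedVcores,
                                         double partMem, double partVcores) {
        return 100.0 * Math.max(usedMem / partMem, usedVcores / partVcores);
    }

    // The buggy display divides cluster-wide memory shares instead:
    // (used/cluster) / (partition/cluster).
    public static double buggyUsedPct(double usedMem, double clusterMem,
                                      double partMem) {
        return 100.0 * (usedMem / clusterMem) / (partMem / clusterMem);
    }

    public static void main(String[] args) {
        System.out.println(dominantUsedPct(30, 10, 90, 10)); // expected: 100.0
        System.out.println(buggyUsedPct(30, 100, 90));       // displayed: ~33.3
    }
}
```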
[jira] [Updated] (YARN-6152) Used queue percentage not accurate in UI for 2.7 and below when using DominantResourceCalculator
[ https://issues.apache.org/jira/browse/YARN-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-6152: - Fix Version/s: 2.7.4 > Used queue percentage not accurate in UI for 2.7 and below when using > DominantResourceCalculator > > > Key: YARN-6152 > URL: https://issues.apache.org/jira/browse/YARN-6152 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3 >Reporter: Jonathan Hung >Assignee: Jonathan Hung > Fix For: 2.7.4 > > Attachments: dominantRC.png, YARN-6152-branch-2.7.001.patch, > YARN-6152-branch-2.7.002.patch > > > YARN-4751 adds the {{getUsedCapacity}} and {{getAbsoluteUsedCapacity}} > methods to {{AbstractCSQueue}}, which are used to display queue usage in the UI > for branch-2.7 and below. However, if there is more than one partition in the > cluster, with different dominant resources, then queue usage may not be > displayed as expected. > Contrived example: The default partition has <90GB, 10vcores>, and the "test" > partition has <10GB, 90vcores>. The {{root}} queue in the default partition uses > <30GB, 10vcores>. Here we expect queue usage to be 100% since it is using all > vcores in the default partition. But the displayed usage will be > (30GB/100GB)/(90GB/100GB) = 33%.
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856400#comment-15856400 ] Eric Payne commented on YARN-5889: -- [~leftnoteasy] bq. Since we're close to 2.8 release now, let's try to see if this patch can go to 2.8.1 or not after 2.8.0 release. Since the branch-2.8.0 branch has already been created, wouldn't it be safe for this to go into branch-2.8(.1)? Or are you concerned that, if more things need to be pulled into the 2.8.0 branch before the RC, this patch may conflict? > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, > YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, > YARN-5889.0006.patch, YARN-5889.0007.patch, YARN-5889.0008.patch, > YARN-5889.0009.patch, YARN-5889.0010.patch, YARN-5889.v0.patch, > YARN-5889.v1.patch, YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket focuses on moving the > user-limit calculation out of the heartbeat allocation flow.
[jira] [Commented] (YARN-6152) Used queue percentage not accurate in UI for 2.7 and below when using DominantResourceCalculator
[ https://issues.apache.org/jira/browse/YARN-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856365#comment-15856365 ] Eric Payne commented on YARN-6152: -- The patch LGTM. +1. The unit test failures are unrelated. > Used queue percentage not accurate in UI for 2.7 and below when using > DominantResourceCalculator > > > Key: YARN-6152 > URL: https://issues.apache.org/jira/browse/YARN-6152 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3 >Reporter: Jonathan Hung >Assignee: Jonathan Hung > Attachments: dominantRC.png, YARN-6152-branch-2.7.001.patch, > YARN-6152-branch-2.7.002.patch > > > YARN-4751 adds the {{getUsedCapacity}} and {{getAbsoluteUsedCapacity}} > methods to {{AbstractCSQueue}}, which are used to display queue usage in the UI > for branch-2.7 and below. However, if there is more than one partition in the > cluster, with different dominant resources, then queue usage may not be > displayed as expected. > Contrived example: The default partition has <90GB, 10vcores>, and the "test" > partition has <10GB, 90vcores>. The {{root}} queue in the default partition uses > <30GB, 10vcores>. Here we expect queue usage to be 100% since it is using all > vcores in the default partition. But the displayed usage will be > (30GB/100GB)/(90GB/100GB) = 33%.
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854640#comment-15854640 ] Eric Payne commented on YARN-5889: -- [~sunilg], Thanks for running those manual tests. Those were the same tests that I was running. I discovered that this is not something introduced by this patch; it happens in trunk as well. Also, I discovered that it doesn't happen on a cluster with 1, 2, or 3 nodemanagers, but when the 4th nodemanager is added, it starts happening. At any rate, the good news is that this is not a problem with your patch. I have a couple more things I want to check on the patch before my review is complete. I'll get back to you soon. > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, > YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, > YARN-5889.0006.patch, YARN-5889.0007.patch, YARN-5889.0008.patch, > YARN-5889.0009.patch, YARN-5889.0010.patch, YARN-5889.v0.patch, > YARN-5889.v1.patch, YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket focuses on moving the > user-limit calculation out of the heartbeat allocation flow.
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852845#comment-15852845 ] Eric Payne commented on YARN-5889: -- bq. resources are not recorded for that user about half the time. Another symptom is that the app never completes. It stays hung in the RUNNING state. The app attempts that failed are retried, but they stay in the 'STARTING NEW' state. > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, > YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, > YARN-5889.0006.patch, YARN-5889.0007.patch, YARN-5889.0008.patch, > YARN-5889.0009.patch, YARN-5889.0010.patch, YARN-5889.v0.patch, > YARN-5889.v1.patch, YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket focuses on moving the > user-limit calculation out of the heartbeat allocation flow.
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836712#comment-15836712 ] Eric Payne commented on YARN-5889: -- bq. LQ#allocateResource will invoke UM#User#assignContainer. So all containers including AM will hit here. [~sunilg], I'm sorry I wasn't clear. As you say, {{UM#User#assignContainer}} is called for the AM. However, since {{userName}} is not active at that point, {{incUsed}} is not called and the AM's resources are not incremented:
{code}
if (activeUsersManager.isAnActiveUser(userName)) {
  activeUsersManager.getTotalResUsedByActiveUsers().incUsed(nodePartition, resource);
}
{code}
> Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, > YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, > YARN-5889.v0.patch, YARN-5889.v1.patch, YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket is focusing on moving > user-limit calculation out of the heartbeat allocation flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
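The effect of the guarded increment quoted above can be sketched in isolation. The class, field, and method names below are hypothetical stand-ins (not the real {{ActiveUsersManager}} API), but the control flow mirrors the quoted fragment: usage is only recorded once the user is active, so the AM container allocated before activation is never counted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for ActiveUsersManager and its per-partition usage map.
public class ActiveUserUsageSketch {
    static Set<String> activeUsers = new HashSet<>();
    static Map<String, Long> usedByActiveUsers = new HashMap<>();

    // Mirrors the guarded increment: usage is only recorded for active users.
    static void assignContainer(String user, String partition, long resource) {
        if (activeUsers.contains(user)) {
            usedByActiveUsers.merge(partition, resource, Long::sum);
        }
        // else: the allocation (e.g. the AM container) is dropped from the tally
    }

    public static void main(String[] args) {
        // First allocation (the AM) happens before the user is marked active.
        assignContainer("jane", "default", 1024);
        System.out.println(usedByActiveUsers.getOrDefault("default", 0L)); // 0: AM not counted

        activeUsers.add("jane");
        assignContainer("jane", "default", 2048);
        System.out.println(usedByActiveUsers.get("default")); // 2048, still missing the AM's 1024
    }
}
```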
[jira] [Assigned] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent
[ https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne reassigned YARN-5892: Assignee: Eric Payne > Capacity Scheduler: Support user-specific minimum user limit percent > > > Key: YARN-5892 > URL: https://issues.apache.org/jira/browse/YARN-5892 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Eric Payne >Assignee: Eric Payne > > Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} > property is per queue. A cluster admin should be able to set the minimum user > limit percent on a per-user basis within the queue. > This functionality is needed so that when intra-queue preemption is enabled > (YARN-4945 / YARN-2113), some users can be deemed as more important than > other users, and resources from VIP users won't be as likely to be preempted. > For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user > {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed > 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like > this:
> {code}
> <property>
>   <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>   <value>25</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>   <value>75</value>
> </property>
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent
[ https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-5892: - Attachment: YARN-5892.002.patch Uploading new patch to address the javadoc and findbugs warnings. I also modified this patch to be refreshable. The unit test ({{TestRMRestart}}) is failing intermittently on trunk with and without this patch. > Capacity Scheduler: Support user-specific minimum user limit percent > > > Key: YARN-5892 > URL: https://issues.apache.org/jira/browse/YARN-5892 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: YARN-5892.001.patch, YARN-5892.002.patch > > > Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} > property is per queue. A cluster admin should be able to set the minimum user > limit percent on a per-user basis within the queue. > This functionality is needed so that when intra-queue preemption is enabled > (YARN-4945 / YARN-2113), some users can be deemed as more important than > other users, and resources from VIP users won't be as likely to be preempted. > For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user > {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed > 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like > this:
> {code}
> <property>
>   <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>   <value>25</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>   <value>75</value>
> </property>
> {code}
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent
[ https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882720#comment-15882720 ] Eric Payne commented on YARN-5892: -- Thanks, [~leftnoteasy], for your feedback. I really value your input. {quote} in my mind there're some alternative solutions: a. Create queue just for such vip users {quote} In our multi-tenant clusters, we have several users (sometimes dozens) needing to use the same queue. Setting up separate queues for each of them based on weighted importance is more complicated than giving each user their own weight. bq. #1, if there're N (N <= 100 / MULP) users are consuming resource in a queue, each of them can get at least MULP / 100 * queue-configured-capacity. Even today, we can have N > 100/MULP. If I think of these _VIP_ users as being weighted as multiple users, then we have a similar situation. In your example above, Jack and Alice would each be weighted as 1 user, but Admin would be 2.5 users. > Capacity Scheduler: Support user-specific minimum user limit percent > > > Key: YARN-5892 > URL: https://issues.apache.org/jira/browse/YARN-5892 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: YARN-5892.001.patch, YARN-5892.002.patch > > > Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} > property is per queue. A cluster admin should be able to set the minimum user > limit percent on a per-user basis within the queue. > This functionality is needed so that when intra-queue preemption is enabled > (YARN-4945 / YARN-2113), some users can be deemed as more important than > other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user > {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed > 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like > this:
> {code}
> <property>
>   <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>   <value>25</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>   <value>75</value>
> </property>
> {code}
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
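The "unit users" idea discussed above can be sketched numerically. This is a rough illustration under stated assumptions — a user of weight w counts as w unit users, and each unit user is guaranteed at least the larger of MULP and an even split — not the actual CapacityScheduler computation:

```java
// Hypothetical sketch of weighting users when dividing a queue by MULP.
public class WeightedUserLimitSketch {
    // Effective number of "unit users" given each active user's weight.
    static double unitUsers(double[] weights) {
        double sum = 0;
        for (double w : weights) sum += w;
        return sum;
    }

    // Guaranteed fraction of the queue for a user of the given weight:
    // each unit user gets at least max(even split, MULP); shares scale with weight.
    static double userShare(double weight, double[] weights, double mulp) {
        double units = unitUsers(weights);
        return weight * Math.max(1.0 / units, mulp);
    }

    public static void main(String[] args) {
        // Jack and Alice weigh 1 each, Admin weighs 2.5 (4.5 unit users), MULP = 25%.
        double[] weights = {1.0, 1.0, 2.5};
        System.out.printf("jack  = %.3f%n", userShare(1.0, weights, 0.25)); // 0.250
        System.out.printf("admin = %.3f%n", userShare(2.5, weights, 0.25)); // 0.625
    }
}
```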
[jira] [Updated] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent
[ https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-5892: - Attachment: YARN-5892.001.patch Attaching first draft of the patch. I still need to make the user-specific-minimum-user-limit-percent refreshable, but I wanted to get this out there for review sooner rather than later. > Capacity Scheduler: Support user-specific minimum user limit percent > > > Key: YARN-5892 > URL: https://issues.apache.org/jira/browse/YARN-5892 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: YARN-5892.001.patch > > > Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} > property is per queue. A cluster admin should be able to set the minimum user > limit percent on a per-user basis within the queue. > This functionality is needed so that when intra-queue preemption is enabled > (YARN-4945 / YARN-2113), some users can be deemed as more important than > other users, and resources from VIP users won't be as likely to be preempted. > For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user > {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed > 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like > this:
> {code}
> <property>
>   <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>   <value>25</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>   <value>75</value>
> </property>
> {code}
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6165) Intra-queue preemption occurs even when preemption is turned off for a specific queue.
Eric Payne created YARN-6165: Summary: Intra-queue preemption occurs even when preemption is turned off for a specific queue. Key: YARN-6165 URL: https://issues.apache.org/jira/browse/YARN-6165 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler, scheduler preemption Affects Versions: 3.0.0-alpha2 Reporter: Eric Payne Intra-queue preemption occurs even when preemption is turned on for the whole cluster ({{yarn.resourcemanager.scheduler.monitor.enable == true}}) but turned off for a specific queue ({{yarn.scheduler.capacity.root.queue1.disable_preemption == true}}). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6165) Intra-queue preemption occurs even when preemption is turned off for a specific queue.
[ https://issues.apache.org/jira/browse/YARN-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne reassigned YARN-6165: Assignee: Eric Payne > Intra-queue preemption occurs even when preemption is turned off for a > specific queue. > -- > > Key: YARN-6165 > URL: https://issues.apache.org/jira/browse/YARN-6165 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, scheduler preemption >Affects Versions: 3.0.0-alpha2 >Reporter: Eric Payne >Assignee: Eric Payne > > Intra-queue preemption occurs even when preemption is turned on for the whole > cluster ({{yarn.resourcemanager.scheduler.monitor.enable == true}}) but > turned off for a specific queue > ({{yarn.scheduler.capacity.root.queue1.disable_preemption == true}}). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6165) Intra-queue preemption occurs even when preemption is turned off for a specific queue.
[ https://issues.apache.org/jira/browse/YARN-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859987#comment-15859987 ] Eric Payne commented on YARN-6165: -- Use Case: - Configure queues with cluster-wide preemption on, but a specific queue's preemption off (see above). - Submit a job at priority 1 that fills the entire queue - Submit a job as the same user at priority 2 Containers from the first job will be preempted when they shouldn't be. > Intra-queue preemption occurs even when preemption is turned off for a > specific queue. > -- > > Key: YARN-6165 > URL: https://issues.apache.org/jira/browse/YARN-6165 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, scheduler preemption >Affects Versions: 3.0.0-alpha2 >Reporter: Eric Payne > > Intra-queue preemption occurs even when preemption is turned on for the whole > cluster ({{yarn.resourcemanager.scheduler.monitor.enable == true}}) but > turned off for a specific queue > ({{yarn.scheduler.capacity.root.queue1.disable_preemption == true}}). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
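For reference, the configuration described in this use case would look roughly like the following fragment (the two property names are taken from the issue description; the queue name {{queue1}} is the example queue from that description):

{code}
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.queue1.disable_preemption</name>
  <value>true</value>
</property>
{code}

With this configuration, containers in {{root.queue1}} should be exempt from preemption, which is the expectation the reported bug violates.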
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829005#comment-15829005 ] Eric Payne commented on YARN-5889: -- [~leftnoteasy] and [~sunilg], I'm sorry for coming back to this point, but I just now realized the full consequences.
{code:title=UsersManager#computeUserLimit}
active-user-limit = max(
    resource-used-by-active-users / #active-users,
    queue-capacity * MULP
)
{code}
With the above algorithm, {{active-user-limit}} never goes above {{resource-used-by-active-users / #active-users}} if MULP is less than 100%. I think this is because {{consumed}} is never greater than {{queue-capacity}} in that case. That is to say:
- {{queue-capacity}} = {{partitionResource * queueAbsCapPerPartition}}
- {{queue-capacity}} = {{(consumed < queue-capacity) ? queue-capacity : (consumed + minAllocation)}}
Since {{consumed}} never gets over {{queue-capacity}} when MULP is less than 100%, {{queue-capacity}} will never equal {{consumed + minAllocation}}. I have tested this to verify. > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, > YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, > YARN-5889.v0.patch, YARN-5889.v1.patch, YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket is focusing on moving > user-limit calculation out of the heartbeat allocation flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
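The quoted formula can be exercised with concrete numbers to show the symptom. A minimal sketch, assuming simple long arithmetic and ignoring partitions and the minimum-allocation adjustment (names are illustrative, not the real {{UsersManager}} fields):

```java
// Hypothetical sketch of the user-limit formula quoted above, showing that the
// limit is pinned at queue-capacity * MULP whenever the tracked usage of active
// users stays below that value (e.g. because the AM's usage is never added).
public class UserLimitSketch {
    static long computeUserLimit(long usedByActiveUsers, int activeUsers,
                                 long queueCapacity, double mulp) {
        return Math.max(usedByActiveUsers / activeUsers,
                        (long) (queueCapacity * mulp));
    }

    public static void main(String[] args) {
        long queueCapacity = 100_000; // MB
        double mulp = 0.20;           // 20% minimum user limit

        // If tracked usage lags real consumption, the first term never
        // overtakes queue-capacity * MULP.
        long limit = computeUserLimit(15_000, 1, queueCapacity, mulp);
        System.out.println(limit); // 20000: stuck at queue-capacity * MULP
    }
}
```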
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830384#comment-15830384 ] Eric Payne commented on YARN-5889: -- bq. the user limit resource is never calculated to be more than resource-used-by-active-users / #active-users Sorry, I meant to say that it never gets above {{queue-capacity * MULP}} > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, > YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, > YARN-5889.v0.patch, YARN-5889.v1.patch, YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket is focusing on moving > user-limit calculation out of the heartbeat allocation flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832068#comment-15832068 ] Eric Payne commented on YARN-5889: -- bq. it never gets above {{queue-capacity * MULP}} [~sunilg] and [~leftnoteasy], although this statement is true and I correctly stated the symptoms, I misdiagnosed the root cause in my [comments above|https://issues.apache.org/jira/secure/EditComment!default.jspa?id=13021186=15829005]. Sorry for the confusion. It appears that the root cause is that {{UM#User#assignContainer}} is not incrementing {{TotalResUsedByActiveUsers}} for the AM. The first time through {{assignContainer}} for a new app, the user isn't active yet, so the used resources count is not incremented. Consequently, {{resource-used-by-active-users}} is always smaller than the actual value, and never gets bigger than {{queue-capacity * MULP}}:
{code:title=UsersManager#computeUserLimit}
active-user-limit = max(
    resource-used-by-active-users / #active-users,
    queue-capacity * MULP
)
{code}
[~sunilg], do we need the {{isAnActiveUser}} checks in {{assignContainer}} and {{releaseContainer}}? I removed these checks in my local build and the application is able to use all of the queue and cluster. > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, > YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, > YARN-5889.v0.patch, YARN-5889.v1.patch, YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket is focusing on moving > user-limit calculation out of the heartbeat allocation flow.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830149#comment-15830149 ] Eric Payne commented on YARN-5889: -- [~leftnoteasy], the crux of the problem is that if MULP is less than 100% (for example 20%), the user limit resource is never calculated to be more than {{resource-used-by-active-users / #active-users}} > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, > YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, > YARN-5889.v0.patch, YARN-5889.v1.patch, YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket is focusing on moving > user-limit calculation out of the heartbeat allocation flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428904#comment-15428904 ] Eric Payne commented on YARN-4945: -- [~sunilg], thank you so much for providing this design doc and POC. I have not yet looked at the patch, but I have a few comments on the design doc. - {quote} Additional Requirement specs ... - Over subscribed queue ... -- Selected containers will completely serve resource need from starving apps. ... -- Selected containers only partially serves the need ... By scanning through each partition and its associated queues (TempQueuePerPartition), we can understand how much resources are offered from each queue for preemption and also the selected container list. This can be used as a reference to avoid double calculations in intra-queue preemption round. {quote} I'm pretty sure that the containers already in the {{selectedCandidates}} list will _not_ be re-assigned to anything in the current queue. The containers are in that list because some other queue is asking for them. Even if containers that are already in the inter-queue preemption list would also help resolve an intra-queue preemption problem, those containers will go to the more underserved queue before coming back to the current queue. My assertion is that regardless of what containers are already in the {{selectedCandidates}} list, the intra-queue preemption policy would always need to select more. - {quote} Configurations and considerations - Provide a configuration to turn on/off intra-queue preemption along with the type of policy it is going to handle (priority, fairness, user-limit etc) {quote} Additionally, we may want to consider intra-queue preemption configs for dead zone, natural completion, etc. This may even need to be per queue. - {quote} Select ideal candidates for intra-queue preemption per priority. ... 3. 
‘pending’ resource per partition will be calculated for all the apps and together store in a consolidated map (resourceToObtain) of pending resource to be collected per partition in one queue. {quote} The use of the word "pending" in conjunction with the reference to {{resourceToObtain}} is confusing to me. It sounds like "pending" is talking about "preemptable resources," but "pending" means "resources requested but not yet allocated." (See {{LeafQueue#getTotalPendingResourcesConsideringUserLimit}}). For instance, the {{resToObtainByPartition}} variable in {{FifoCandidatesSelector}} is used for holding the amount of extra (and therefore preemptable) resources being used by a queue. Is this step calculating the total of preemptable resources for apps in this queue, per partition? - {quote} 4. While doing this, we will ensure that certain apps will be skipped if it is already equal to or more than its user-limit quota. This map will be the entry point to select candidates from lower priority apps in next step. {quote} Is this saying that, when marking containers for preemption, if an app is under its user limit percent, its containers will not be marked? Or, is it saying that if an app is asking for more containers and it is already over its user limit percent, other apps' containers won't be preempted on its behalf? Not only do we need to avoid preempting resources _for_ users that are over their user limit percent, we also need to avoid preempting containers _from_ users that are under their user limit percent. Even today in the capacity scheduler, if I have a queue with a 50% user limit percent, and app1 from user1 is priority1 and app2 from user2 is priority2, and they are both asking for more resources, user2 will not get more containers until user1 has reached 50% of the queue. In other words, user limit percent trumps application priority. 
- I am concerned that priority-based intra-queue preemption has a different set of goals than user limit percent-based intra-queue preemption. For instance: - requirements for user limit percent-based preemption are calculated at the user level, while priority-based preemption requirements go down to the app level. - User limit percent-based preemption only makes sense if multiple users are in a queue, and priority-based preemption only makes sense if a priority inversion can happen between apps of the same user in a queue. Perhaps these should be totally separate policies. Anyway, for us, user limit percent-based preemption is much more important. > [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > Attachments: IntraQueuepreemption-CapacityScheduler (Design).pdf, > YARN-2009-wip.patch > > > This is umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781.
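The invariant stated in the comment above — user limit percent trumps application priority — can be sketched as a toy allocation rule. This is an assumption-laden illustration of the described behavior (a user still under its minimum share is served first, regardless of competing app priority), not the CapacityScheduler's actual ordering logic:

```java
// Hypothetical simulation of the described invariant: with a 50% MULP,
// the priority-2 app from user2 cannot allocate while user1 is under 50%.
public class UserLimitVsPrioritySketch {
    static String nextAllocation(long user1Used, long queueCap) {
        long minShare = queueCap / 2; // 50% minimum-user-limit-percent
        // User limit is checked before priority: user1 keeps allocating
        // until it reaches its minimum share, despite app2's higher priority.
        if (user1Used < minShare) return "user1/app1(priority1)";
        return "user2/app2(priority2)";
    }

    public static void main(String[] args) {
        System.out.println(nextAllocation(10_000, 100_000)); // user1 until 50%
        System.out.println(nextAllocation(50_000, 100_000)); // then user2's turn
    }
}
```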
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431513#comment-15431513 ] Eric Payne commented on YARN-4945: -- [~sunilg], Thanks a lot for your reply! {quote} - user-limit: may be partial or full pending resource request will become resource to obtain for this app. This is depending on user-limit_headroom - current_used. This much can be considered as demand from this app. {quote} I want to make sure we are talking about the same thing, so I would like to expressly clarify what I mean by {{user-limit}} because I feel that it is ambiguous and may be causing confusion. In the statement above, I think you are referring to {{yarn.scheduler.capacity.root.QUEUE1.user-limit-factor}}, which plays a role in determining each user's headroom in a queue. {{user-limit-factor}} is important to consider when calculating how much of an app's pending resources should be preempted from other apps. Failure to consider this caused us problems and resulted in YARN-3769. However, in the context of intra-queue preemption, {{yarn.scheduler.capacity.root.QUEUE1.minimum-user-limit-percent}} is the property I want to focus on. My goal is to ensure that each queue is evenly divided between the appropriate number of users, as defined by {{minimum-user-limit-percent}}. bq. with this poc, i am coming with framework and priority preemption. Thank you very much for doing that! bq. However for doc, it will be good if we could have it common for priority and user-limit. Agreed. Also, I think it would be helpful to define use cases so that everyone is clear about what problems we are trying to solve. I will make an attempt at that and post a doc here. 
> [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > Attachments: IntraQueuepreemption-CapacityScheduler (Design).pdf, > YARN-2009-wip.patch > > > This is umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent
[ https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891256#comment-15891256 ] Eric Payne commented on YARN-5892: -- {quote} So I preferred to keep the semantic more similar to existing one, I propose to introduce a weight of users instead of overriding the MULP: scheduler will continue assign MULP% shares to each "unit users", but different user can have different weight to adjust quota based on share of "unit users". Also, the weights of users can be used independent from MULP: because in the future we may want to replace concept of user limit by different ones. (Like setting quota for each user, give weighted fair share to users, etc.) {quote} Thanks [~leftnoteasy] for your review. In my mind, overriding queue's MULP with user-specific MULP is equivalent to adding weights to special users, and would be implemented in a similar way. If I understand correctly, you are saying that the weighted approach gives more flexibility for future features like user quota, weighted user fair share, etc. Is that correct? > Capacity Scheduler: Support user-specific minimum user limit percent > > > Key: YARN-5892 > URL: https://issues.apache.org/jira/browse/YARN-5892 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: YARN-5892.001.patch, YARN-5892.002.patch > > > Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} > property is per queue. A cluster admin should be able to set the minimum user > limit percent on a per-user basis within the queue. > This functionality is needed so that when intra-queue preemption is enabled > (YARN-4945 / YARN-2113), some users can be deemed as more important than > other users, and resources from VIP users won't be as likely to be preempted. 
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user > {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed > 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like > this:
> {code}
> <property>
>   <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>   <value>25</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>   <value>75</value>
> </property>
> {code}
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6248) Killing an app with pending container requests leaves the user in UsersManager
Eric Payne created YARN-6248: Summary: Killing an app with pending container requests leaves the user in UsersManager Key: YARN-6248 URL: https://issues.apache.org/jira/browse/YARN-6248 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0-alpha3 Reporter: Eric Payne Assignee: Eric Payne If an app is still asking for resources when it is killed, the user is left in the UsersManager structure and shows up on the GUI. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6248) Killing an app with pending container requests leaves the user in UsersManager
[ https://issues.apache.org/jira/browse/YARN-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-6248: - Attachment: User Left Over.jpg > Killing an app with pending container requests leaves the user in UsersManager > -- > > Key: YARN-6248 > URL: https://issues.apache.org/jira/browse/YARN-6248 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha3 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: User Left Over.jpg > > > If an app is still asking for resources when it is killed, the user is left > in the UsersManager structure and shows up on the GUI. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6248) Killing an app with pending container requests leaves the user in UsersManager
[ https://issues.apache.org/jira/browse/YARN-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890386#comment-15890386 ] Eric Payne commented on YARN-6248: -- The above unit tests ({{TestLeaderElectorService}}, {{TestDelegationTokenRenewer}}, and {{TestFairSchedulerPreemption}}) are passing for me. I also ran all of the unit tests under the capacity scheduler. > Killing an app with pending container requests leaves the user in UsersManager > -- > > Key: YARN-6248 > URL: https://issues.apache.org/jira/browse/YARN-6248 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha3 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: User Left Over.jpg, YARN-6248.001.patch > > > If an app is still asking for resources when it is killed, the user is left > in the UsersManager structure and shows up on the GUI. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15458695#comment-15458695 ] Eric Payne commented on YARN-4945: -- [~sunilg], just one quick note: I am getting an {{UnsupportedOperationException}} at runtime in {{IntraQueuePreemptableResourceCalculator#computeIntraQueuePreemptionDemand}}:
{code}
Collection<FiCaSchedulerApp> apps = leafQueue.getApplications();
apps.addAll(leafQueue.getPendingApplications());
{code}
{{LeafQueue#getApplications}} returns an unmodifiable Collection. > [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > Attachments: Intra-Queue Preemption Use Cases.pdf, > IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, > YARN-2009-wip.patch > > > This is umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781.
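The exception above is standard JDK behavior for views produced by {{Collections.unmodifiableCollection}}. A minimal stand-alone sketch (plain strings instead of {{FiCaSchedulerApp}}; method names are illustrative, not YARN's) of both the failure and the copy-first fix:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class UnmodifiableCollectionDemo {
    // Mirrors the shape of LeafQueue#getApplications: callers get a read-only view.
    static Collection<String> getApplications(List<String> runningApps) {
        return Collections.unmodifiableCollection(runningApps);
    }

    // Copying into a mutable collection first avoids the exception.
    static Collection<String> runningPlusPending(List<String> running, List<String> pending) {
        Collection<String> apps = new ArrayList<>(getApplications(running));
        apps.addAll(pending);
        return apps;
    }

    public static void main(String[] args) {
        List<String> running = new ArrayList<>(List.of("app-1"));
        List<String> pending = List.of("app-2");
        try {
            // The pattern from the comment above: addAll on the unmodifiable view.
            getApplications(running).addAll(pending);
        } catch (UnsupportedOperationException e) {
            System.out.println("addAll on the unmodifiable view throws");
        }
        System.out.println(runningPlusPending(running, pending).size()); // 2
    }
}
```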
[jira] [Commented] (YARN-5555) Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically nested.
[ https://issues.apache.org/jira/browse/YARN-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15461859#comment-15461859 ] Eric Payne commented on YARN-: -- Thank you very much, [~vvasudev], for the review and the commit. > Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically > nested. > > > Key: YARN- > URL: https://issues.apache.org/jira/browse/YARN- > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Minor > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: PctOfQueueIsInaccurate.jpg, YARN-.001.patch > > > If a leaf queue is hierarchically nested (e.g., {{root.a.a1}}, > {{root.a.a2}}), the values in the "*% of Queue*" column in the apps section > of the Scheduler UI is calculated as if the leaf queue ({{a1}}) were a direct > child of {{root}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465592#comment-15465592 ] Eric Payne commented on YARN-4945: -- [~leftnoteasy] and [~sunilg], bq. Using logic similar to {{deductPreemptableResourcesBasedSelectedCandidates}} should be able to achieve this, and I think it doesn't bring too many complexities to the implementation. I'm sorry, but I'm still not understanding how this can work. In {{PriorityCandidatesSelector#preemptFromLeastStarvedApp}}:
{code}
if (CapacitySchedulerPreemptionUtils.isContainerAlreadySelected(c,
    selectedCandidates)) {
  Resources.subtractFrom(toObtainByPartition, c.getAllocatedResource());
  Resources.subtractFrom(toObtainByPartition, c.getAllocatedResource());
  continue;
}
{code}
This code seems to indicate that if a container is already in {{selectedCandidates}}, it will be preempted and then given back to apps in this queue. But if it's already in {{selectedCandidates}}, it's because an inter-queue preemption policy put it there, so it's not likely to end up back in this queue. Please help me understand what I'm missing. Also, why is it subtracting the container's resources twice from {{toObtainByPartition}}? Should one of those be {{totalPreemptedResourceAllowed}}? > [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > Attachments: Intra-Queue Preemption Use Cases.pdf, > IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, > YARN-2009-wip.patch > > > This is umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468165#comment-15468165 ] Eric Payne commented on YARN-4945: -- Thanks [~sunilg].
{quote}
bq. if it's already in selectedCandidates, it's because an inter-queue preemption policy put it there
I think I must give some more clarity for what I am trying to do here. Its possible that there can be some containers which were selected by priority/user-limit policy may already be selected from inter-queue policies. In that case, we need not have to mark them again. Rather we can deduct the resource directly as its container marked for preemption.
{quote}
OK. I think I see what you are saying. In {{IntraQueueCandidatesSelector#preemptFromLeastStarvedApp}}:
{code}
if (CapacitySchedulerPreemptionUtils.isContainerAlreadySelected(c,
    selectedCandidates)) {
  Resources.subtractFrom(toObtainByPartition, c.getAllocatedResource());
  continue;
}
{code}
IIUC, you are saying that at this point, {{toObtainByPartition}} contains requested resources from _both_ inter-queue _and_ intra-queue preemption policies. So, since this container has already been selected by the inter-queue policies, skip it, stop tracking its resources in {{toObtainByPartition}} (by subtracting out the container's size), and keep looking for another container to mark as preemptable. Is that correct?
- Also, I think that priority and user-limit-percent preemption policies should be separate policies. Do you agree? If so, can we please rename {{IntraQueueCandidatesSelector}} to something like {{IntraQueuePriorityCandidatesSelector}}? > [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > Attachments: Intra-Queue Preemption Use Cases.pdf, > IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, > YARN-2009-wip.patch, YARN-2009-wip.v3.patch > > > This is umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781.
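As a sanity check of the bookkeeping being discussed, here is a hedged, self-contained sketch of the "skip but deduct once" rule: a container already chosen by an earlier (inter-queue) selector is not marked a second time, but its size still comes off the remaining demand. The toy types (int arrays for containers, ints for resources) are stand-ins, not YARN's {{TempQueuePerPartition}} machinery:

```java
import java.util.List;
import java.util.Set;

public class DeductOnceDemo {
    // Each container is {id, size}; returns the demand still unmet after the pass.
    static int selectCandidates(List<int[]> containers, Set<Integer> alreadySelected,
                                int toObtain, List<Integer> newlySelected) {
        for (int[] c : containers) {
            if (toObtain <= 0) {
                break; // demand already met
            }
            if (alreadySelected.contains(c[0])) {
                toObtain -= c[1]; // deduct once for the earlier selector's pick
                continue;         // but do not mark it again
            }
            newlySelected.add(c[0]);
            toObtain -= c[1];
        }
        return Math.max(0, toObtain);
    }
}
```

With containers of sizes {1: 3, 2: 4, 3: 5}, container 1 already selected, and a demand of 7: container 1 deducts the demand to 4 without being re-marked, container 2 is newly selected and deducts it to 0, and container 3 is left alone.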
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15455570#comment-15455570 ] Eric Payne commented on YARN-4945: -- Thanks very much [~sunilg] and [~leftnoteasy].
{quote}
1. I think we might need to come with a limit on how much resource can be preempted from over-utilizing users's apps. WE do have max-preemption-per-round. But sometimes it may be more as it may be configured for inter-queue. Since we are sharing this config, i think we can have a config to limit the preemption for user-limit. For priority, i have considered a certain limit to control this scenario. Thoughts?
{quote}
I think we do need several intra-queue configs that are separate from the existing (inter-queue) ones. For inter-queue vs. intra-queue, I think we need a separate one at least for {{total_preemption_per_round}} and {{max_ignored_over_capacity}}, and maybe even for {{natural_termination_factor}} and {{max_wait_before_kill}}. Are you also suggesting that we need these configs to be separate between user-limit-percent preemption and priority preemption within intra-queue? I don't have a strong opinion either way, but if we can keep all configs the same between intra-queue preemption policies, I would like to do that, just to avoid confusion and complication. bq. I will not consider preemption demand from a high priority if that app is already crossing the user-limit. I just want to make sure we are talking about the same thing. In the case I am worried about, the high priority app is _*not*_ over any limit. There is an inversion happening because the lower priority app has containers and the high priority app wants them. But, if the low priority app is from a user that is at or below its {{minimum-user-limit-percent}}, the higher priority app must not continue to preempt from the lower priority app. This only can happen when the two apps are from different users.
{quote} I think normalization for inter-queue / intra-queue preemption is one of the top priority goal for this feature. If you take a look at existing preemption code, it normalizes preempt-able resource for reserved-container-candidate-selector and fifo-candidate-selector. We can do the similar normalization for inter/intra-queue preemption. {quote} Trying to do this coordination seems to me to be quite complicated. Would it be sufficient to just avoid preempting during the intra-queue policies if there are already containers in the {{selectedContainers}} list? > [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > Attachments: Intra-Queue Preemption Use Cases.pdf, > IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, > YARN-2009-wip.patch > > > This is umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-5555) Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically nested.
[ https://issues.apache.org/jira/browse/YARN-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne reassigned YARN-: Assignee: Eric Payne > Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically > nested. > > > Key: YARN- > URL: https://issues.apache.org/jira/browse/YARN- > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Minor > Attachments: PctOfQueueIsInaccurate.jpg > > > If a leaf queue is hierarchically nested (e.g., {{root.a.a1}}, > {{root.a.a2}}), the values in the "*% of Queue*" column in the apps section > of the Scheduler UI is calculated as if the leaf queue ({{a1}}) were a direct > child of {{root}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472191#comment-15472191 ] Eric Payne commented on YARN-4945: -- Thank you [~leftnoteasy]. I see now that {{IntraQueueCandidatesSelector#tryPreemptContainerAndDeductResToObtain}} is checking {{totalPreemptionAllowed}} before selecting each container. > [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > Attachments: Intra-Queue Preemption Use Cases.pdf, > IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, > YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch > > > This is umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472235#comment-15472235 ] Eric Payne commented on YARN-4945: -- [~sunilg], thanks again for all of the great work you are doing on this issue.
- Separate switches for priority and user-limit-percent preemption? {{ProportionalCapacityPreemptionPolicy#init}} uses {{SELECT_CANDIDATES_FOR_INTRAQUEUE_PREEMPTION}} to turn on all intra-queue preemption policies, but the config property name for {{SELECT_CANDIDATES_FOR_INTRAQUEUE_PREEMPTION}} is {{select_based_on_priority_of_applications}}. I actually would like to have the priority policy and the minimum-user-limit-percent policy be turned on separately. I'm not sure of the best way to do that, but our users don't use application priority very much. Perhaps {{CapacitySchedulerConfiguration}} could have something like:
{code}
/**
 * For intra-queue preemption, priority based selector can help to preempt
 * containers of lowest priority apps to find resources for high priority
 * apps.
 */
public static final String PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_APP_PRIORITY =
    PREEMPTION_CONFIG_PREFIX + "select_based_on_priority_of_applications";
public static final boolean DEFAULT_PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_APP_PRIORITY =
    false;

/**
 * For intra-queue preemption, minimum-user-limit-percent based selector can
 * help to preempt containers to ensure users are not starved of their
 * guaranteed percentage of a queue.
 */
public static final String PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_USER_PERCENT_GUARANTEE =
    PREEMPTION_CONFIG_PREFIX + "select_based_on_user_percentage_guarantee";
public static final boolean DEFAULT_SELECT_INTRAQUEUE_CANDIDATES_BY_USER_PERCENT_GUARANTEE =
    false;
{code}
And then {{ProportionalCapacityPreemptionPolicy#init}} can turn on intra-queue preemption if either one is set:
{code}
boolean selectIntraQueuePreemptCandidatesByPriority = csConfig.getBoolean(
    CapacitySchedulerConfiguration.PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_APP_PRIORITY,
    CapacitySchedulerConfiguration.DEFAULT_PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_APP_PRIORITY);
boolean selectIntraQueuePreemptCandidatesByUserPercentGuarantee = csConfig.getBoolean(
    CapacitySchedulerConfiguration.PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_USER_PERCENT_GUARANTEE,
    CapacitySchedulerConfiguration.DEFAULT_SELECT_INTRAQUEUE_CANDIDATES_BY_USER_PERCENT_GUARANTEE);
if (selectIntraQueuePreemptCandidatesByPriority
    || selectIntraQueuePreemptCandidatesByUserPercentGuarantee) {
  candidatesSelectionPolicies.add(new IntraQueueCandidatesSelector(this));
}
{code}
Then, in {{IntraQueueCandidatesSelector}}, logic could be added to do either one or both intra-queue preemption policies. What do you think?
- Could headroom check allow priority inversion? {{PriorityIntraQueuePreemptionPolicy#getResourceDemandFromAppsPerQueue}}:
{code}
// Can skip apps which are already crossing user-limit.
// For this, Get the userlimit from scheduler and ensure that app is
// not crossing userlimit here. Such apps can be skipped.
Resource userHeadroom = leafQueue.getUserLimitHeadRoomPerApp(
    a1.getFiCaSchedulerApp(), context.getPartitionResource(partition),
    partition);
if (Resources.lessThanOrEqual(rc, context.getPartitionResource(partition),
    userHeadroom, Resources.none())) {
  continue;
}
{code}
I think this code will allow a priority inversion when a user has apps of different priorities. For example, in a situation like the following, {{App1}} from {{User1}} is already taking up all of the resources, so its headroom is 0. But, since {{App2}} is also from {{User1}}, the above code will never allow preemption to occur. Is that correct?
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending Resources||
|QUEUE1|User1|App1|1|200|0|
|QUEUE1|User1|App2|10|0|50|
> [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > Attachments: Intra-Queue Preemption Use Cases.pdf, > IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, > YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch > > > This is umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781.
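The inversion in the table above can be checked with plain arithmetic. This is a toy sketch of the shape of the quoted gate (a single long for resources, not YARN's {{Resource}}; names are illustrative): when the user's headroom is zero, the high-priority app's demand is dropped before preemption is ever considered.

```java
public class HeadroomGateDemo {
    // A per-app headroom gate of the shape quoted above: skip the app's
    // demand entirely when its user's headroom is zero or negative.
    static boolean demandConsidered(long userUsed, long userLimit, long appPending) {
        long userHeadroom = userLimit - userUsed;
        if (userHeadroom <= 0) {
            return false; // gate fires: demand is never counted
        }
        return appPending > 0;
    }

    public static void main(String[] args) {
        // Table above: User1 runs App1 (uses 200 of a 200 limit) and App2
        // (pending 50, higher priority). App2's demand is dropped, so the
        // inversion is never repaired by preemption.
        System.out.println(demandConsidered(200, 200, 50)); // false
    }
}
```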
[jira] [Updated] (YARN-5555) Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically nested.
[ https://issues.apache.org/jira/browse/YARN-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-: - Attachment: YARN-.001.patch > Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically > nested. > > > Key: YARN- > URL: https://issues.apache.org/jira/browse/YARN- > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Minor > Attachments: PctOfQueueIsInaccurate.jpg, YARN-.001.patch > > > If a leaf queue is hierarchically nested (e.g., {{root.a.a1}}, > {{root.a.a2}}), the values in the "*% of Queue*" column in the apps section > of the Scheduler UI is calculated as if the leaf queue ({{a1}}) were a direct > child of {{root}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478391#comment-15478391 ] Eric Payne commented on YARN-4945: -- Thanks [~sunilg]. I have a few review comments for patch v0:
- {{IntraQueuePreemptableResourceCalculator#computeIntraQueuePreemptionDemand}}:
-- Neither of the parameters ({{clusterResource}}, {{totalPreemptedResourceAllowed}}) is used
-- {{queueNames}} can be null, which causes an NPE in {{for (String queueName : queueNames)}}
-- {{leafQueue}} will be null if {{tq}} represents a parent queue, which causes an NPE when dereferenced later.
-- {{CapacitySchedulerConfiguration.USED_CAPACITY_THRESHOLD_FOR_PREEMPTION}}: [~leftnoteasy] indicated above that this property is similar to {{MAX_IGNORED_OVER_CAPACITY}}, but I'm not sure I understand how that applies to intra-queue preemption. The comparison is not between queues at this point, it's between apps or users. In patch v0, the following code has the effect of only allowing preemption if the queue's used resources are at or above {{USED_CAPACITY_THRESHOLD_FOR_PREEMPTION}}, which defaults to 30%. It doesn't make sense to me to limit intra-queue preemption based on how much of the queue's guaranteed resources are used.
{code}
if (leafQueue.getUsedCapacity() < context
    .getUsedCapThresholdForPreemptionPerQueue()) {
  continue;
}
{code}
- {{IntraQueueCandidatesSelector#selectCandidates}}:
-- If {{queueName}} is not a leaf queue, {{leafQueue}} will be null and cause an NPE when dereferenced later:
{code}
// 4. Iterate from most under-served queue in order.
for (String queueName : queueNames) {
  LeafQueue leafQueue = preemptionContext.getQueueByPartition(queueName,
      RMNodeLabelsManager.NO_LABEL).leafQueue;
{code}
-- Very tiny nit: Remove the word {{get}} from the following:
{code}
// 3. Loop through all partitions to get calculate demand
{code}
- {{AbstractPreemptableResourceCalculator}}:
-- Since {{TAComparator}} is specifically comparing app priority, can it be renamed to something like {{TAPriorityComparator}}?
- {{TempAppPerQueue#toString}}:
-- Small nit: Can the {{toString}} method print the ApplicationID and rename {{NAME}} to {{QUEUENAME}}?
> [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > Attachments: Intra-Queue Preemption Use Cases.pdf, > IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, > YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch > > > This is umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781.
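The two NPE paths called out in the review above follow a common defensive pattern. A hedged, self-contained sketch (a plain {{Map}} standing in for the queue lookup; names are illustrative, not YARN's): guard the possibly-null name collection, and skip entries whose leaf reference is null (parent queues).

```java
import java.util.Collection;
import java.util.Map;

public class NullGuardDemo {
    // Count queues that actually have a leaf to work on, guarding both NPE
    // paths: a null name collection, and a parent queue whose leaf is null.
    static int preemptableLeafCount(Collection<String> queueNames,
                                    Map<String, Object> leafByName) {
        if (queueNames == null) {
            return 0; // avoid NPE in the for-each below
        }
        int count = 0;
        for (String name : queueNames) {
            Object leafQueue = leafByName.get(name);
            if (leafQueue == null) {
                continue; // parent queue: nothing to dereference
            }
            count++;
        }
        return count;
    }
}
```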
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488403#comment-15488403 ] Eric Payne commented on YARN-4945: -- bq. Does this make sense? [~sunilg], Thanks for the reply. Yes. I was misreading the code. Sorry about that. > [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > Attachments: Intra-Queue Preemption Use Cases.pdf, > IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, > YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch > > > This is umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5555) Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically nested.
[ https://issues.apache.org/jira/browse/YARN-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-: - Fix Version/s: (was: 3.0.0-alpha2) (was: 2.9.0) 2.8.0 Thanks [~varun_saxena]. I have backported this to 2.8.0 > Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically > nested. > > > Key: YARN- > URL: https://issues.apache.org/jira/browse/YARN- > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Minor > Fix For: 2.8.0 > > Attachments: PctOfQueueIsInaccurate.jpg, YARN-.001.patch > > > If a leaf queue is hierarchically nested (e.g., {{root.a.a1}}, > {{root.a.a2}}), the values in the "*% of Queue*" column in the apps section > of the Scheduler UI is calculated as if the leaf queue ({{a1}}) were a direct > child of {{root}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493893#comment-15493893 ] Eric Payne commented on YARN-4945: -- Thanks again, [~sunilg]. I will look closely at the patch, but one thing I wanted to bring out before too much time passes is that some of the IntraQueue classes seem priority-centric and do not lend themselves to adding multiple intra-queue policies.
- The constructor for {{IntraQueueCandidatesSelector}} passes {{priorityBasedPolicy}} as a parameter directly to the constructor for {{IntraQueuePreemptableResourceCalculator}}
- {{IntraQueueCandidatesSelector#selectCandidates}} passes {{priorityBasedPolicy}} as a parameter directly to {{CapacitySchedulerPreemptionUtils.getResToObtainByPartitionForApps}}.
I think that the objects that implement the {{IntraQueuePreemptionPolicy}} interface should be in a {{List}}, and then {{IntraQueueCandidatesSelector#selectCandidates}} should loop over the list to process the different policies. Please change the name of variables in classes that need to be independent of the specific intra-queue policy:
- {{CapacitySchedulerPreemptionUtils#getResToObtainByPartitionForApps}} has a parameter named {{priorityBasedPolicy}}, but this should be generic, like {{intraQueuePolicy}}
- {{IntraQueuePreemptableResourceCalculator}} also has a variable named {{priorityBasedPolicy}}, which I think should be more generic.
- {{CapacitySchedulerConfiguration#SELECT_CANDIDATES_FOR_INTRAQUEUE_PREEMPTION}}: since the value for this property is the switch to turn on intra-queue preemption, the name should be something more generic. Currently, it is {{yarn.resourcemanager.monitor.capacity.preemption.select_based_on_priority_of_applications}}, but it should be something like {{enable_intra_queue_preemption}}.
> [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > Attachments: Intra-Queue Preemption Use Cases.pdf, > IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, > YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch, > YARN-2009.v1.patch > > > This is umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495060#comment-15495060 ] Eric Payne commented on YARN-4945: -- [~sunilg], I noticed in the resourcemanager log that the metrics were not as I would expect after running applications. For example, after 1 application has completed running, the {{#queue-active-applications}} metric remains 1 instead of 0:
{code}
2016-09-16 01:11:10,189 [SchedulerEventDispatcher:Event Processor] INFO capacity.LeafQueue: Application removed - appId: application_1473988192446_0001 user: hadoop1 queue: glamdring #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 1
{code}
After 3 applications have run, the metrics are even more unexpected:
{code}
2016-09-16 01:12:34,622 [SchedulerEventDispatcher:Event Processor] INFO capacity.LeafQueue: Application removed - appId: application_1473988192446_0003 user: hadoop1 queue: glamdring #user-pending-applications: -4 #user-active-applications: 4 #queue-pending-applications: 0 #queue-active-applications: 3
{code}
I believe the cause of this is in {{LeafQueue#getAllApplications}}:
{code}
public Collection<FiCaSchedulerApp> getAllApplications() {
  Collection<FiCaSchedulerApp> apps = pendingOrderingPolicy.getSchedulableEntities();
  apps.addAll(orderingPolicy.getSchedulableEntities());
  return Collections.unmodifiableCollection(apps);
}
{code}
The call to {{pendingOrderingPolicy.getSchedulableEntities()}} returns the {{AbstractComparatorOrderingPolicy#schedulableEntities}} object, and then the call to {{apps.addAll(orderingPolicy.getSchedulableEntities())}} adds additional {{FiCaSchedulerApp}}'s to {{schedulableEntities}}. By creating a copy of the return value of {{pendingOrderingPolicy.getSchedulableEntities()}}, I have been able to verify that {{schedulableEntities}} does not have extra entries. For example:
{code}
public Collection<FiCaSchedulerApp> getAllApplications() {
  Collection<FiCaSchedulerApp> apps = new TreeSet<>(
      pendingOrderingPolicy.getSchedulableEntities());
  apps.addAll(orderingPolicy.getSchedulableEntities());
  return Collections.unmodifiableCollection(apps);
}
{code}
> [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > Attachments: Intra-Queue Preemption Use Cases.pdf, > IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, > YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch, > YARN-2009.v1.patch, YARN-2009.v2.patch > > > This is umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781.
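The aliasing bug described above is easy to reproduce with a minimal stand-in (plain lists instead of YARN's {{OrderingPolicy}}; class and method names are illustrative): handing out the internal collection lets {{addAll}} mutate it, while the copy-first version leaves internal state intact.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class SharedStateDemo {
    private final List<String> pendingApps = new ArrayList<>(List.of("pending-1"));
    private final List<String> activeApps = new ArrayList<>(List.of("active-1"));

    // Buggy shape: the internal pending list escapes, and addAll mutates it.
    Collection<String> allAppsBuggy() {
        Collection<String> apps = pendingApps;      // reference, not a copy
        apps.addAll(activeApps);                    // corrupts pendingApps
        return Collections.unmodifiableCollection(apps);
    }

    // Fixed shape, per the comment above: copy before merging.
    Collection<String> allAppsFixed() {
        Collection<String> apps = new ArrayList<>(pendingApps);
        apps.addAll(activeApps);
        return Collections.unmodifiableCollection(apps);
    }

    int pendingCount() {
        return pendingApps.size();
    }
}
```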
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550216#comment-15550216 ] Eric Payne commented on YARN-2009: -- Thanks, [~sunilg], for the new patch. Preemption for the purposes of preventing priority inversion seems to work now without unneeded preemption. However, in this new patch, user-limit-percent preemption doesn't seem to be working. If: # {{user1}} starts {{app1}} at {{priority1}} on {{Queue1}} and consumes the entire queue # {{user2}} starts {{app2}} at {{priority1}} on {{Queue1}} preemption does not happen. I will continue to investigate, but I thought I would let you know. > Priority support for preemption in ProportionalCapacityPreemptionPolicy > --- > > Key: YARN-2009 > URL: https://issues.apache.org/jira/browse/YARN-2009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Devaraj K >Assignee: Sunil G > Attachments: YARN-2009.0001.patch, YARN-2009.0002.patch, > YARN-2009.0003.patch, YARN-2009.0004.patch, YARN-2009.0005.patch > > > While preempting containers based on the queue ideal assignment, we may need > to consider preempting the low priority application containers first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
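The expected outcome for the two-user scenario above can be sketched with toy arithmetic. This is only an illustration of what user-limit preemption should do under the stated assumptions (equal priorities, minimum-user-limit-percent of 50, a full queue); the formula and names are mine, not YARN's:

```java
public class UserLimitDemo {
    // Scenario above: user1's app fills the queue, then user2's equal-priority
    // app asks for resources. Computes how much should be preempted from user1
    // so user2 can reach its minimum-user-limit-percent share.
    static long toPreempt(long queueCapacity, long user1Used, long user2Pending,
                          double minUserLimitPercent) {
        long perUserGuarantee = (long) (queueCapacity * minUserLimitPercent / 100.0);
        long user2Entitled = Math.min(user2Pending, perUserGuarantee);
        long user1Excess = Math.max(0, user1Used - (queueCapacity - user2Entitled));
        return Math.min(user1Excess, user2Entitled);
    }
}
```

With a queue of 100, user1 using all 100, user2 pending 100, and a 50% user limit, this yields 50: half the queue should move to user2 rather than preemption not happening at all.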
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553303#comment-15553303 ] Eric Payne commented on YARN-2009: -- Thanks, [~sunilg], for your reply. - - {{FifoIntraQueuePreemptionPlugin#calculateIdealAssignedResourcePerApp}} -- The assignment to {{tmpApp.idealAssigned}} should be cloned: {code} tmpApp.idealAssigned = Resources.min(rc, clusterResource, queueTotalUnassigned, appIdealAssigned); ... Resources.subtractFrom(queueTotalUnassigned, tmpApp.idealAssigned); {code} -- In the above code, if {{queueTotalUnassigned}} is less than {{appIdealAssigned}}, then {{tmpApp.idealAssigned}} is assigned a reference to {{queueTotalUnassigned}}. Then, later, {{tmpApp.idealAssigned}} is actually subtracted from itself. - bq. This current patch will still handle priority and priority + user-limit. Thoughts? I am not comfortable with fixing this in another patch. Our main use case is the one where multiple users need to use the same queue with apps at the same priority. - I still need to think through all of the effects, but I was thinking that something like the following could work: -- I think my use case is failing because {{FifoIntraQueuePreemptionPlugin#calculateIdealAssignedResourcePerApp}} orders the apps by priority. I think that instead, it should order the apps by how much they are underserved. I think that it should be ordering the apps by {{tmpApp.toBePreemptedByOther}} instead of priority. -- Then, if {{calculateIdealAssignedResourcePerApp}} orders the apps by {{toBePreemptedByOther}}, {{validateOutSameAppPriorityFromDemand}} would also need to not compare priorities but the app's requirements. 
-- I think it should be something like the following, maybe:
{code}
while (lowAppIndex < highAppIndex
    && !apps[lowAppIndex].equals(apps[highAppIndex])
    //&& apps[lowAppIndex].getPriority() < apps[highAppIndex].getPriority()) {
    && Resources.lessThan(rc, clusterResource,
        apps[lowAppIndex].getToBePreemptFromOther(),
        apps[highAppIndex].getToBePreemptFromOther())) {
{code}
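The {{Resources.min}} / {{Resources.subtractFrom}} aliasing concern raised earlier in this review can be reproduced in isolation. The sketch below uses a simplified stand-in for YARN's mutable {{Resource}} (names {{Res}}, {{demo}} are illustrative, not YARN's classes): a {{min}} that returns one of its arguments, followed by an in-place subtraction, ends up subtracting the value from itself.

```java
// Standalone illustration of the aliasing bug: min() returns a *reference*
// to one of its arguments (like Resources.min), so a later in-place
// subtractFrom() mutates the value through both names.
public class AliasingSketch {
    static class Res {
        long mem;
        Res(long mem) { this.mem = mem; }
    }

    // Returns a reference to one of its arguments, not a copy.
    static Res min(Res a, Res b) { return a.mem <= b.mem ? a : b; }

    static void subtractFrom(Res lhs, Res rhs) { lhs.mem -= rhs.mem; }

    // idealAssigned aliases queueTotalUnassigned, so the subtraction
    // zeroes both; returns idealAssigned.mem afterwards.
    static long demo() {
        Res queueTotalUnassigned = new Res(30);
        Res appIdealAssigned = new Res(50);
        Res idealAssigned = min(queueTotalUnassigned, appIdealAssigned);
        subtractFrom(queueTotalUnassigned, idealAssigned);
        return idealAssigned.mem;
    }

    public static void main(String[] args) {
        // Without cloning the result of min(), the app's ideal allocation
        // collapses to 0 instead of staying at 30.
        System.out.println(demo()); // 0
    }
}
```

Cloning the result of {{min}} before the subtraction (the fix suggested in the review) breaks the alias and preserves the 30.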
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562302#comment-15562302 ] Eric Payne commented on YARN-2009:
--
Thanks for all the work [~sunilg] and [~leftnoteasy] have put into this feature.
bq. can we move it to a separate JIRA and start to work on it right after the JIRA is get committed
As long as we don't let the user-limit-percent preemption JIRA grow cold, I'm okay with it.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494394#comment-15494394 ] Eric Payne commented on YARN-4945:
--
bq. I would say it may not be necessarily to have two separate policies to consider priority and user-limit.
[~leftnoteasy], I'm not sure how I feel about that yet. I need to understand what it would mean to combine all intra-queue priority policies into one. Whatever the design, I want to make sure it is not cumbersome to solve the user-limit-percent inversion that we often see. If they are combined, then is it still necessary to make {{IntraQueuePreemptionPolicy}} an interface? Wouldn't this just be the implementation class, and then there would be no need for {{PriorityIntraQueuePreemptionPolicy}} or other derivative classes?
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15497037#comment-15497037 ] Eric Payne commented on YARN-4945: -- [~leftnoteasy] bq. Have a Map of username to headroom inside the method can compute user limit at most once for different user. And this logic can be reused to compute user-limit preemption Maybe we are talking about the same thing, but I just want to clarify that I am not advocating preemption based on headroom (user-limit-factor). I am advocating based on minimum user limit percent (MULP), which is the minimum guaranteed resource amount per user per queue. {quote} To be honest, I haven't thought a good way that a list of policies can better solve the priority + user-limit preemption problem. Could you share some ideas about it. For example, how to better consider both in the final decision {quote} I believe that the two preemption policies (priority and minimum-user-limit-percent) are _mostly_ (but not completely) separate. I would say that priority preemption only considers apps from the same user, and MULP preemption only considers apps from different users. If you look at the behavior of the capacity scheduler, I was surprised to find that it mostly ignores priority when assigning resources between apps of different users. I conducted the following experiment, without turning on preemption: # The cluster has only 1 queue, and it takes up all of the resources. 
||Queue Name||Total Containers||{{user-limit-factor}}||{{minimum-user-limit-percent}}||Priority Enabled||
|default|24|1.0 (each user can take up the whole queue if no other users are present)|0.25 (if other users are present, each user is guaranteed at least 25% of the queue's resources; at most 4 users can have apps in the queue at once; if fewer than 4 users, the scheduler tries to balance resources evenly between users)|false|
# {{user1}} starts {{app1}} in {{default}} at {{priority1}} and consumes all resources
# {{user2}} starts {{app2}} in {{default}} at {{priority2}}.
||User Name||App Name||App Priority||Used Containers||Pending Containers||
|user1|app1|1|24|76|
|user2|app2|2|0|100|
# I kill 12 containers from {{app1}} and the capacity scheduler assigns them to {{app2}}. Not because {{app2}} has a higher priority than {{app1}}, but because {{user2}} is using fewer resources than {{user1}} (the capacity scheduler tries to balance resources between users).
||User Name||App Name||App Priority||Used Containers||Pending Containers||
|user1|app1|1|12|76|
|user2|app2|2|12|76|
# At this point, what should happen if I kill another container from {{app1}}? Since {{app2}} is higher priority than {{app1}}, and since MULP is 25% (so {{user2}}'s minimum guarantee is only 6), you might think that the capacity scheduler will give it to {{app2}} (that's what I thought it would do). _But it doesn't!_ The capacity scheduler gives the container back to {{app1}} because it wants to balance the resources between all users. And the table remains the same:
||User Name||App Name||App Priority||Used Containers||Pending Containers||
|user1|app1|1|12|76|
|user2|app2|2|12|76|
Once the users are balanced, no matter how many times I kill a container from {{app1}}, it always goes back to {{app1}}. From a priority perspective, this could be considered an inversion, since {{app2}} is asking for more resources and {{app1}} is well above its MULP.
But the capacity scheduler does not consider priority in this case. If I try the same experiment, but with both apps owned by {{user1}}, then I can kill all of {{app1}}'s containers (except the AM) and they all get assigned to {{app2}}. Because the capacity scheduler behaves this way, I would recommend that the MULP preemption policy run first and try to balance each user's ideal assigned. The MULP policy would preempt from lowest priority first, so it would consider the priority of apps owned by other, over-served users when deciding what to preempt, and consider the priority of apps owned by the current, under-served user when deciding ideal-assigned values. Then, I would run the priority policy but only consider apps within each user. As shown above, once the users are balanced with regard to MULP, trying to kill containers from higher-priority apps of other users will only cause preemption churn. [~leftnoteasy], as you said, we may be able to combine the two into one policy, and you may be right that this can be done without being too complicated. The thing I want to ensure is that the priority preemption policy doesn't try to kill high-priority containers from different users that will only be reassigned back to the original user and cause preemption churn.
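The balancing behavior the experiment shows follows from how the Capacity Scheduler computes each user's limit. Below is a deliberately simplified sketch of that computation (integer resources only; the real logic in {{LeafQueue#computeUserLimit}} also accounts for user-limit-factor, resource granularity, and node labels, so treat this as an approximation, not YARN's actual code):

```java
// Simplified model of the per-user limit: the larger of an even split among
// active users and the minimum-user-limit-percent floor.
public class UserLimitSketch {
    static int userLimit(int queueCapacity, int activeUsers, int minUserLimitPercent) {
        // Even split among the users currently asking for resources.
        int evenShare = (int) Math.ceil((double) queueCapacity / activeUsers);
        // MULP guarantees each user at least this floor.
        int mulpFloor = queueCapacity * minUserLimitPercent / 100;
        return Math.max(evenShare, mulpFloor);
    }

    public static void main(String[] args) {
        // The 24-container queue from the experiment, MULP = 25%:
        // with 2 active users each user's limit is 12, which is why the
        // scheduler keeps handing killed containers back until both sit at 12.
        System.out.println(userLimit(24, 2, 25)); // 12
        // Only with 4 active users does the MULP floor (25% of 24 = 6) bind.
        System.out.println(userLimit(24, 4, 25)); // 6
    }
}
```

This also explains the earlier 6/6/0 puzzle in this thread: with 12 resources, MULP 33%, and only 2 of the 3 users active, the limit is max(12/2, 12*33/100) = 6, not 4.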
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503934#comment-15503934 ] Eric Payne commented on YARN-2009:
--
[~sunilg], Thanks for providing YARN-2009.0001.patch. Unfortunately, {{FiCaSchedulerApp.java}} didn't apply cleanly to the latest trunk. Also, I get compilation errors. Still investigating:
{noformat}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hadoop-yarn-server-resourcemanager: Compilation failure: Compilation failure:
[ERROR] /hadoop/source/YARN-4945/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/FifoIntraQueuePreemptionPolicy.java:[249,14] cannot find symbol
[ERROR] symbol: method getTotalPendingRequests()
[ERROR] location: variable app of type org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp
[ERROR] /hadoop/source/YARN-4945/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/FifoIntraQueuePreemptionPolicy.java:[258,14] cannot find symbol
[ERROR] symbol: method getTotalPendingRequests()
[ERROR] location: variable app of type org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp
{noformat}
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504007#comment-15504007 ] Eric Payne commented on YARN-2009:
--
[~sunilg], please note that your patch depends on {{FiCaSchedulerApp#getTotalPendingRequests}}, but that was removed today by YARN-3141. CC-ing [~leftnoteasy]
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516923#comment-15516923 ] Eric Payne commented on YARN-2009:
--
{quote}
I also thought about this in similar lines. But there was one point which i thought it may make more sense if we use guaranteed. If queue is under used than its capacity, our current calculation may consider more resource for idealAssigned per app. This may yield a lesser value toBePreempted per app (from lower priority apps). And it may be fine because there some more resource which is available in queue for other high priority apps. So we may not need to preempt immediately. Does this make sense?
{quote}
[~sunilg], I'm sorry, but I don't understand. Can you provide a step-by-step use case to demonstrate your concern?
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514480#comment-15514480 ] Eric Payne commented on YARN-2009:
--
[~leftnoteasy], Thanks. I missed that.
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511394#comment-15511394 ] Eric Payne commented on YARN-2009:
--
- {{IntraQueuePreemptableResourceCalculator#computeIntraQueuePreemptionDemand}}:
-- Shouldn't the following be {{tq.getUsed() - tq.getActuallyToBePreempted()}}? {{tq.getGuaranteed()}} only returns the queue's guaranteed capacity, but if apps in the queue are using extra resources, then you want to subtract from the total usage.
{code}
tq.setUnAllocated(Resources.subtract(tq.getGuaranteed(),
    tq.getActuallyToBePreempted()));
{code}
-- {{MaxIgnoredOverCapacity}}
{code}
if (leafQueue.getUsedCapacity() < context
    .getMaxIgnoredOverCapacityForIntraQueue()) {
  continue;
}
{code}
--- Shouldn't this also take the used capacity of all parent queues into consideration?
--- In any case, can we change the name of the config property and its getters? {{CapacitySchedulerConfiguration#MAX_IGNORED_OVER_CAPACITY_FOR_INTRA_QUEUE}} / {{yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity_for_intra_queue}} / {{ProportionalCapacityPreemptionPolicy#maxIgnoredOverCapacityForIntraQueue}} This is not really an "over capacity" thing. It's more of an "only start to preempt when the queue is over this amount" thing. Maybe we could name it something like {{yarn.resourcemanager.monitor.capacity.preemption.ignore_below_percent_of_queue}}
- {{FifoIntraQueuePreemptionPolicy#createTempAppForResourceCalculation}}
-- In the following code, instead of calling the {{containsKey}} or the {{get*}} method twice, you could just call the get method once, assign its output to a tmp var, and then, if the tmp var is not null, assign tmp to the resource var. That would be a little more efficient.
{code}
if (app.getTotalPendingRequestsPerPartition().containsKey(partition)) {
...
if (null != app.getAppAttemptResourceUsage().getUsed(partition)) {
...
if (null != app.getCurrentReservation(partition)) {
{code}
-- Should {{pending}} also be cloned?
{code}
TempAppPerPartition tmpApp = new TempAppPerPartition(app.getQueueName(),
    app.getApplicationId(), Resources.clone(used), Resources.clone(reserved),
    pending, app.getPriority().getPriority(), app, partitions);
{code}
- {{FifoIntraQueuePreemptionPolicy#computeAppsIdealAllocation}}:
-- Can you please change the following comment:
{code}
// Apps ordered from highest to lower priority.
{code}
--- to be something like this?
{code}
// Remove the app at the next highest remaining priority and process it.
{code}
-- Can you please change the word "size" to "resources"? When I first saw "container size", I thought it was calculating the size of each container.
{code}
// Calculate total selected container size from current app.
{code}
-- I don't think we want to do the following:
{code}
if (Resources.lessThanOrEqual(rc, partitionBasedResource, userHeadroom,
    Resources.none())) {
  continue;
}
{code}
--- If {{user1}} has used the entire queue with a low-priority app, {{user1}}'s headroom will be 0. But if that same user starts a higher-priority app, that higher-priority app needs to preempt from the lower-priority app, doesn't it?
-- I assume that you will rework the {{idealAssigned}} logic to match [~leftnoteasy]'s algorithm [that he provided above|https://issues.apache.org/jira/browse/YARN-2009?focusedCommentId=15504978=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15504978]. That is, the algorithm that takes into account {{user-limit-resource}}.
- {{FifoIntraQueuePreemptionPolicy#getHighPriorityApps}}:
-- The comments in this method are the same as those in {{AbstractPreemptableResourceCalculator#getMostUnderservedQueues}}, but they don't apply to {{getHighPriorityApps}}.
-- {{getHighPriorityApps}} doesn't need to return an {{ArrayList}}. It will only be retrieving one app at a time.
In {{AbstractPreemptableResourceCalculator#getMostUnderservedQueues}}, it's possible for 2 or more queues to be underserved by exactly the same amount, so all of the most underserved queues must be processed together. However, {{getHighPriorityApps}} is using the {{taComparator}} comparator. Even if apps are the same priority, one will have a lower app ID, so there will never be 2 apps that compare equally.
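The single-lookup refactor suggested in the review above (one {{get}} with a null check instead of {{containsKey}} followed by {{get}}) can be sketched in isolation. The names below ({{SingleLookupSketch}}, {{pendingFor}}, the "label-x" partition) are illustrative, not the patch's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// One hash lookup instead of two: get() once, then null-check the result.
public class SingleLookupSketch {
    static int pendingFor(Map<String, Integer> pendingByPartition, String partition) {
        Integer tmp = pendingByPartition.get(partition); // single lookup
        return (tmp != null) ? tmp : 0;                  // default when absent
    }

    public static void main(String[] args) {
        Map<String, Integer> pendingByPartition = new HashMap<>();
        pendingByPartition.put("label-x", 7);
        System.out.println(pendingFor(pendingByPartition, "label-x")); // 7
        System.out.println(pendingFor(pendingByPartition, "label-y")); // 0
    }
}
```

The behavior is identical to the containsKey-then-get pattern; it just avoids hashing the key twice per access.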
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513725#comment-15513725 ] Eric Payne commented on YARN-2009:
--
[~leftnoteasy], I have some concerns about this algorithm from above:
{code}
for app in sort-by-fifo-or-priority(apps) {
  if (user-to-allocated.get(app.user) < user-limit-resource) {
    app.allocated = min(app.used + pending,
        user-limit-resource - user-to-allocated.get(app.user));
    user-to-allocated.get(app.user) += app.allocated;
  } else {
    // skip this app because user-limit reached
  }
}
{code}
If {{Queue1}} has 100 resources, and if {{user1}} starts {{app1}} at priority 1 that consumes the whole queue, won't {{user1}}'s {{user-limit-resource}} be 100? Then, if {{user1}} starts another app ({{app2}}) at priority 2, won't the above algorithm skip over {{app2}} because {{user1}} has already achieved its {{user-limit-resource}}?
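The concern above can be made concrete by transcribing the pseudocode into runnable form. The sketch below (class and method names are illustrative) runs the same loop with the two possible visit orders for the single-user scenario: app1 (priority 1) using 100, app2 (priority 2) pending 50, user-limit-resource 100:

```java
// Direct transcription of the quoted pseudocode, for one user and integer
// resources. 'order' is the sequence of app indices the loop visits.
public class IdealAssignSketch {
    static final int[] USED    = {100, 0};  // app1, app2 (same user)
    static final int[] PENDING = {  0, 50};

    static int[] allocate(int[] order, int userLimitResource) {
        int[] idealAssigned = new int[USED.length];
        int userAllocated = 0; // single user in this example
        for (int i : order) {
            if (userAllocated < userLimitResource) {
                idealAssigned[i] = Math.min(USED[i] + PENDING[i],
                                            userLimitResource - userAllocated);
                userAllocated += idealAssigned[i];
            } // else: skipped, user-limit reached
        }
        return idealAssigned;
    }

    public static void main(String[] args) {
        // FIFO order (app1 first): app1 soaks up the whole user limit and
        // app2 is skipped -- the starvation Eric describes.
        int[] fifo = allocate(new int[]{0, 1}, 100);
        System.out.println(fifo[0] + " " + fifo[1]); // 100 0

        // Priority order (app2 first): app2 is served before app1, so both
        // end up with 50 -- the behavior Wangda's sort is meant to produce.
        int[] prio = allocate(new int[]{1, 0}, 100);
        System.out.println(prio[0] + " " + prio[1]); // 50 50
    }
}
```

So whether app2 starves hinges entirely on the sort order fed into the loop, which is the point under discussion.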
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514393#comment-15514393 ] Eric Payne commented on YARN-2009:
--
bq. So we could take a max(guaranteed, used). Will this be fine?
I don't think so. If {{tq.getActuallyToBePreempted}} is non-zero, it represents the amount that will be preempted from what {{tq}} is currently using, not {{tq}}'s guaranteed resources. The purpose of this line of code is to set {{tq}}'s unallocated resources. But even if {{tq}} is below its guarantee, the amount of resources that intra-queue preemption should consider when balancing is not the queue's guarantee; it's what the queue is already using. If {{tq}} is below its guarantee, inter-queue preemption should be handling that.
bq. app1 of user1 used entire queue. app2 of user2 asks more resource
The use case I'm referencing for this code does not involve 2 different users. It involves the same user submitting jobs of different priorities. If {{user1}} submits a low-priority job that consumes the whole queue, {{user1}}'s headroom will be 0. Then, when {{user1}} submits a second app at a higher priority, this code will cause the second app to starve because {{user1}} has already used up its allocation.
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507326#comment-15507326 ] Eric Payne commented on YARN-2009:
--
[~leftnoteasy], I am confused by your above example:
{quote}
Queue's user-limit-percent = 33
Queue's used=guaranteed=max=12.
There're 3 users (A,B,C) in the queue, order of applications are A/B/C ... So the computed user-limit-resource will be 6. ... The actual user-ideal-assignment when doing scheduling is 6/6/0 !
{quote}
If {{minimum-user-limit-percent == 33}}, why is the {{user-limit-resource == 6}}? Shouldn't {{idealAssigned}} be 4/4/4, not 6/6/0?
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507832#comment-15507832 ] Eric Payne commented on YARN-2009:
--
{quote}
In my above example, the #active_users is 2 instead of 3 (because B has no more pending resource). The reason why it uses #active-user is: existing user-limit is used to balance available resource to active users, it doesn't consider the needs to re-balance (via preemption) usages of users. To make intra-queue user limit preemption can correctly balance usages between users, we need to fix the scheduling logic as well.
{quote}
I see. I wasn't suggesting that preemption should balance all users, only those that are asking.
{quote}
{code}
...
for app in sort-by-fifo-or-priority(apps) {
  if (user-to-allocated.get(app.user) < user-limit-resource) {
    app.allocated = min(app.used + pending,
        user-limit-resource - user-to-allocated.get(app.user));
    user-to-allocated.get(app.user) += app.allocated;
...
{code}
{quote}
Yes, that would work. Thanks.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15499359#comment-15499359 ] Eric Payne commented on YARN-4945:
--
[~sunilg], I'm afraid I gave you bad advice [in my comment above|https://issues.apache.org/jira/browse/YARN-4945?focusedCommentId=15495060=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15495060] regarding the fix for {{LeafQueue#getAllApplications()}}. My original suggestion was to create a new {{TreeSet}} object for {{apps}}:
{code}
Collection<FiCaSchedulerApp> apps = new TreeSet<FiCaSchedulerApp>(
    pendingOrderingPolicy.getSchedulableEntities());
{code}
But that causes the {{SchedulingMonitor}} thread to crash with the following exception:
{noformat}
2016-09-17 17:07:31,156 [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] ERROR yarn.YarnUncaughtExceptionHandler: Thread Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw an Exception.
java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp cannot be cast to java.lang.Comparable
        at java.util.TreeMap.compare(TreeMap.java:1290)
        at java.util.TreeMap.put(TreeMap.java:538)
        at java.util.TreeSet.add(TreeSet.java:255)
        at java.util.AbstractCollection.addAll(AbstractCollection.java:344)
        at java.util.TreeSet.addAll(TreeSet.java:312)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getAllApplications(LeafQueue.java:1859)
{noformat}
I originally suggested using {{TreeSet}} because that is what is returned by {{getSchedulableEntities()}}. But, since that causes an exception, I tried using {{HashSet}} instead.
That seems to work (but I'm not sure if that's the best solution):
{code}
Collection<FiCaSchedulerApp> apps = new HashSet<FiCaSchedulerApp>(
    pendingOrderingPolicy.getSchedulableEntities());
{code}
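The crash can be reproduced without YARN using a stand-in class. The likely mechanism: {{TreeSet}}'s {{Collection}} copy constructor does not inherit a comparator from the source set (only the {{TreeSet(SortedSet)}} overload does), so it falls back to natural ordering and casts each element to {{Comparable}} on insert, exactly the {{ClassCastException}} in the stack trace. A {{HashSet}} copy only needs {{hashCode}}/{{equals}}. The {{App}} class below is a hypothetical stand-in for {{FiCaSchedulerApp}}:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.TreeSet;

public class CopyAppsSketch {
    // Stand-in for FiCaSchedulerApp: does NOT implement Comparable.
    static class App {
        final int id;
        App(int id) { this.id = id; }
        @Override public int hashCode() { return id; }
        @Override public boolean equals(Object o) {
            return (o instanceof App) && ((App) o).id == id;
        }
    }

    static Collection<App> schedulableEntities() {
        return Arrays.asList(new App(1), new App(2));
    }

    // TreeSet(Collection) sorts by natural ordering: casting App to
    // Comparable throws, as in the SchedulingMonitor stack trace.
    static boolean treeSetCopyThrows() {
        try {
            new TreeSet<>(schedulableEntities());
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // HashSet relies only on hashCode/equals, so the copy succeeds.
    static int hashSetCopySize() {
        return new HashSet<>(schedulableEntities()).size();
    }

    public static void main(String[] args) {
        System.out.println(treeSetCopyThrows()); // true
        System.out.println(hashSetCopySize());   // 2
    }
}
```

Note the HashSet copy gives up the ordering policy's iteration order; if order matters to the caller, passing the source set's comparator explicitly to {{new TreeSet<>(comparator)}} and then {{addAll}} would preserve it.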
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504876#comment-15504876 ] Eric Payne commented on YARN-2009:
--
[~sunilg] / [~leftnoteasy], I am still in the middle of reviewing the patch, but I have a couple of overall concerns about the design of {{FifoIntraQueuePreemptionPolicy#computeAppsIdealAllocation}}:
- If we will be combining FIFO priority and FIFO MULP preemption, then I don't think {{idealAssigned}} can be calculated independently for each:
-- I think that all apps in a queue should be grouped according to user ({{Map}})
-- I think there should be a separate {{TAMinUserLimitPctComparator}} that calculates underserved users based on min user limit percent.
--- The comparator would try to balance MULP across all users like the Capacity Scheduler does
-- I think {{TAPriorityComparator}} should then only be given apps from the same user.
- Once we have {{idealAssigned}} per user, then we can divide that up among apps belonging to that user.
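The grouping proposed above can be sketched as follows. The representation is deliberately minimal (users and priorities as parallel arrays; {{GroupByUserSketch}} and {{groupAndSort}} are hypothetical names, not YARN classes): bucket apps per user first, then order only within each user's bucket, so priorities never compete across users.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByUserSketch {
    // Group each app's priority under its user, then sort each user's
    // bucket highest-priority-first.
    static Map<String, List<Integer>> groupAndSort(String[] users, int[] priorities) {
        Map<String, List<Integer>> byUser = new HashMap<>();
        for (int i = 0; i < users.length; i++) {
            byUser.computeIfAbsent(users[i], u -> new ArrayList<>()).add(priorities[i]);
        }
        // Priority comparisons happen only *within* one user's list.
        for (List<Integer> bucket : byUser.values()) {
            bucket.sort(Collections.reverseOrder());
        }
        return byUser;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> byUser = groupAndSort(
            new String[]{"user1", "user1", "user2"}, new int[]{1, 2, 1});
        // user1's bucket is [2, 1]; user2's is [1]. user2's priority-1 app is
        // never ranked against user1's priority-2 app, matching the scheduler
        // behavior observed earlier in this thread.
        System.out.println(byUser.get("user1")); // [2, 1]
    }
}
```

A per-user {{idealAssigned}} could then be split across each bucket in this order, which is the second bullet above.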
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471536#comment-15471536 ] Eric Payne commented on YARN-4945:
--
[~leftnoteasy] and [~sunilg], I'm still concerned about having the intra-queue preemption policies adding containers to the {{selectedCandidates}} list if the inter-queue policies have already added containers. In that case, the containers selected by the intra-queue policies may not go back to the correct queue. Consider this use case:
Queues (all are preemptable):
||Queue Name||Guaranteed Resources||Max Resources||{{total_preemption_per_round}}||
|root|200|200|0.1|
|QUEUE1|100|200|0.1|
|QUEUE2|100|200|0.1|
# {{User1}} starts {{App1}} on {{QUEUE1}} and uses all 200 resources. These containers are long-running and will not be released any time soon:
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|200|0|0|
# {{User2}} starts {{App2}} on {{QUEUE2}} and requests 100 resources:
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|200|0|0|
|QUEUE2|User2|App2|1|0|100|0|
# {{User1}} starts {{App3}} at a high priority on {{QUEUE1}} and requests 50 resources:
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|200|0|0|
|QUEUE1|User1|App3|10|0|50|0|
|QUEUE2|User2|App2|1|0|100|0|
# Since {{total_preemption_per_round}} is 0.1, only 10% of the needed resources will be selected per round. So, the inter-queue preemption policies select 10 resources to be preempted from {{App1}}.
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|200|0|10|
|QUEUE1|User1|App3|10|0|50|0|
|QUEUE2|User2|App2|1|0|100|0|
# Then, the priority-intra-queue preemption policy selects 5 more resources to be preempted from {{App1}}.
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|200|0|15|
|QUEUE1|User1|App3|10|0|50|0|
|QUEUE2|User2|App2|1|0|100|0|
# At this point, 15 resources are preempted from {{App1}}.
# Since {{QUEUE2}} is asking for 100 resources, and is extremely underserved (from an inter-queue point of view), the capacity scheduler gives all 15 resources to {{QUEUE2}}, and the priority inversion remains in {{QUEUE1}}.
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|185|15|0|
|QUEUE1|User1|App3|10|0|50|0|
|QUEUE2|User2|App2|1|15|85|0|
This is why I am concerned that when containers are already selected by the inter-queue preemption policies, it may not be beneficial to have the intra-queue policies preempt containers as well.
> [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > Attachments: Intra-Queue Preemption Use Cases.pdf, > IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, > YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch > > > This is umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
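The guard this comment argues for can be sketched in a few lines. This is a hypothetical illustration (the method name and integer resource model are invented for the example): before an intra-queue policy picks more containers, it deducts what the inter-queue policies already put on the selected-candidates list for the same queue.

```java
// Sketch: compute how much intra-queue preemption demand remains after
// subtracting resources already selected by the inter-queue policies.
public class IntraQueueDemand {

  static int remainingIntraQueueNeed(int intraQueueNeed, int alreadySelectedInQueue) {
    // Never go negative: if inter-queue selection already covers the
    // demand, the intra-queue policy should select nothing more.
    return Math.max(0, intraQueueNeed - alreadySelectedInQueue);
  }

  public static void main(String[] args) {
    // In the scenario above, 10 resources were already selected from App1;
    // a priority-policy need of 5 would then select nothing extra.
    System.out.println(remainingIntraQueueNeed(5, 10)); // 0
    // If the intra-queue need exceeds what was already selected, only
    // the difference is selected.
    System.out.println(remainingIntraQueueNeed(15, 10)); // 5
  }
}
```

This only prevents double-counting; it does not by itself route the freed resources back to the correct queue, which is the harder problem raised in the comment.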
[jira] [Commented] (YARN-5555) Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically nested.
[ https://issues.apache.org/jira/browse/YARN-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471662#comment-15471662 ] Eric Payne commented on YARN-: -- Any objections if I backport this to branch-2.8? > Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically > nested. > > > Key: YARN- > URL: https://issues.apache.org/jira/browse/YARN- > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Minor > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: PctOfQueueIsInaccurate.jpg, YARN-.001.patch > > > If a leaf queue is hierarchically nested (e.g., {{root.a.a1}}, > {{root.a.a2}}), the values in the "*% of Queue*" column in the apps section > of the Scheduler UI is calculated as if the leaf queue ({{a1}}) were a direct > child of {{root}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439148#comment-15439148 ] Eric Payne commented on YARN-4945: -- [~sunilg] and [~leftnoteasy], While I was writing the use cases, I thought about the interaction between the inter-queue and intra-queue preemption policies, and I want to get your thoughts. The following use case is not in the doc, because I'm not sure how to handle it. Consider this use case: The cluster has 2 queues, both preemptable, and both have a configured max capacity that can use the whole cluster.
||Queue Name||Queue Guaranteed Resources||Queue Max Resources||Queue {{minimum-user-limit-percent}}||Queue Preemptable||
|root|200|200|N/A|N/A|
|{{QUEUE1}}|100|200|100%|Yes|
|{{QUEUE2}}|100|200|50%|Yes|
---
||Queue Name||User Name||App Name||Resources Used||Resources Guaranteed Per User by {{minimum-user-limit-percent}}||Pending Resources||
|{{QUEUE1}}|{{User1}}|{{App1}}|120|100|0|
|{{QUEUE2}}|{{User2}}|{{App2}}|80|50|0|
|{{QUEUE2}}|{{User3}}|{{App3}}|0|50|20|
# The inter-queue preemption policy sees that {{QUEUE2}} is underserved and is asking for 20 resources, and that {{QUEUE1}} is over-served by 20 resources, so it preempts 20 resources from {{App1}}.
# The intra-queue preemption policy sees that {{User3}} is under its {{minimum-user-limit-percent}} and is asking for 20 resources, and that {{User2}} is over its {{minimum-user-limit-percent}}, so the intra-queue preemption policy preempts 20 resources from {{App2}}.
# The result of this scenario is that 20 resources are preempted when they should not be.
The scenario I have laid out above assumes that intra-queue preemption did not know about the 20 containers that were already preempted to fulfill the needs of {{App3}} in {{QUEUE2}}. I think that the design doc tries to address this, and assumes that the intra-queue preemption policy will be able to handle this use case and will not preempt more containers when it is not necessary.
However, I am not so sure about that. In a more complicated scenario with multiple over-served and multiple under-served queues, how will the intra-queue preemption policy know that the containers that are already in the {{selectedContainers}} list will be used to fulfill the needs of any specific queue? Please provide your thoughts. > [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > Attachments: Intra-Queue Preemption Use Cases.pdf, > IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.patch > > > This is umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5555) Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically nested.
[ https://issues.apache.org/jira/browse/YARN-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-: - Attachment: PctOfQueueIsInaccurate.jpg The queue structure for the attached screenshot (PctOfQueueIsInaccurate.jpg) has the following attributes:
||Cluster Capacity||root.swords.capacity||root.swords.brisingr.capacity||
|12288 MB|20%|25%|
There are 3 apps running in the {{root.swords.brisingr}} queue. The attributes for each of these apps are as follows:
||App Name||Allocated Memory MB||% of Queue||
|application_1471969002932_0001|4608 MB|150.0|
|application_1471969002932_0002|4608 MB|150.0|
|application_1471969002932_0003|3072 MB|100.0|
The value to the right of the {{Queue: swords.brisingr}} bar graph says that the queue is 2001.3% used. This value is (almost) accurate because the actual memory allocation allotted to {{root.swords.brisingr}} is {{12288 MB * 20% * 25% = 614.4 MB}}. Since {{root.swords.brisingr}} is consuming all 12288 MB, {{12288 MB / 614.4 MB = 20}}, i.e. 2000%. However, the sum of the {{% of Queue}} column for all apps running in {{root.swords.brisingr}} is {{150.0% + 150.0% + 100.0% = 400%}}. This is inaccurate. It appears as if the calculations are not taking into account the capacity of the parent queue, {{root.swords: 20%}}. For example, {{application_1471969002932_0001}}'s usage is 4608 MB, and {{12288 MB * 25% = 3072 MB}}, and {{4608 / 3072 = 1.5}}, i.e. 150%. This calculation should have been {{4608 / 614.4 = 7.5}}, i.e. 750%. {{RMAppsBlock#renderData}} is calling {{ApplicationResourceUsageReport}}, which eventually calls {{SchedulerApplicationAttempt#getResourceUsageReport}}.
The following code in {{getResourceUsageReport}}, I think, needs to walk back up the parent tree to get all of the capacity values, not just the one for the leaf queue:
{code}
queueUsagePerc = calc.divide(cluster, usedResourceClone,
    Resources.multiply(cluster,
        queue.getQueueInfo(false, false).getCapacity())) * 100;
{code}
> Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically > nested. > > > Key: YARN- > URL: https://issues.apache.org/jira/browse/YARN- > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Minor > Attachments: PctOfQueueIsInaccurate.jpg > > > If a leaf queue is hierarchically nested (e.g., {{root.a.a1}}, > {{root.a.a2}}), the values in the "*% of Queue*" column in the apps section > of the Scheduler UI is calculated as if the leaf queue ({{a1}}) were a direct > child of {{root}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
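The parent-walk suggested above can be sketched like this. The {{Queue}} type here is a hypothetical stand-in (the real scheduler classes differ); the sketch only shows that multiplying capacities up the chain reproduces the 614.4 MB figure from the screenshot analysis.

```java
// Sketch: compute a leaf queue's absolute capacity by multiplying the
// capacity fraction of every queue on the path up to the root, instead
// of using only the leaf queue's own capacity.
public class AbsCapacity {

  static class Queue {
    final Queue parent;
    final float capacity; // fraction of the parent, e.g. 0.25f for 25%
    Queue(Queue parent, float capacity) {
      this.parent = parent;
      this.capacity = capacity;
    }
  }

  static float absoluteCapacity(Queue leaf) {
    float cap = 1.0f;
    for (Queue q = leaf; q != null; q = q.parent) {
      cap *= q.capacity;
    }
    return cap;
  }

  public static void main(String[] args) {
    Queue root = new Queue(null, 1.0f);
    Queue swords = new Queue(root, 0.20f);
    Queue brisingr = new Queue(swords, 0.25f);
    // 12288 MB * 20% * 25% = 614.4 MB, matching the figures above.
    System.out.println(12288 * absoluteCapacity(brisingr));
  }
}
```

With the leaf-only calculation the same queue would appear to have 12288 MB * 25% = 3072 MB, which is exactly the error described in the comment.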
[jira] [Updated] (YARN-5555) Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically nested.
[ https://issues.apache.org/jira/browse/YARN-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-: - Assignee: (was: Eric Payne) > Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically > nested. > > > Key: YARN- > URL: https://issues.apache.org/jira/browse/YARN- > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Eric Payne >Priority: Minor > Attachments: PctOfQueueIsInaccurate.jpg > > > If a leaf queue is hierarchically nested (e.g., {{root.a.a1}}, > {{root.a.a2}}), the values in the "*% of Queue*" column in the apps section > of the Scheduler UI is calculated as if the leaf queue ({{a1}}) were a direct > child of {{root}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5555) Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically nested.
Eric Payne created YARN-: Summary: Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically nested. Key: YARN- URL: https://issues.apache.org/jira/browse/YARN- Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.8.0 Reporter: Eric Payne Assignee: Eric Payne Priority: Minor If a leaf queue is hierarchically nested (e.g., {{root.a.a1}}, {{root.a.a2}}), the values in the "*% of Queue*" column in the apps section of the Scheduler UI is calculated as if the leaf queue ({{a1}}) were a direct child of {{root}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-4945: - Attachment: Intra-Queue Preemption Use Cases.pdf [~sunilg] and [~leftnoteasy], I am attaching the set of use cases that I could think of for in-queue preemption. I will include the base use-case statements in this comment, but please do look at the document for details, examples, and open issues for each use case.
1. Ensure each user in a queue is guaranteed its appropriate minimum-user-limit-percent
1.1. When one (or more) user(s) are below their minimum-user-limit-percent and one (or more) user(s) are above their minimum-user-limit-percent, resources will be preempted, after a configurable time period, from the user(s) which are above their minimum-user-limit-percent.
1.2. When two (or more) users are below their minimum-user-limit-percent, neither will be preempted in favor of the other.
1.3. If all users in a queue are at or over their minimum-user-limit-percent, the user-limit-percent-preemption policy will not preempt resources.
2. Ensure priority inversion doesn't occur between applications.
2.1. When a lower priority app is consuming long-running resources, a higher priority app is requesting resources, and the queue cannot grow to accommodate the higher priority app's request, the priority-intra-queue-preemption policy will preempt resources from the lower priority app after a configurable period of time.
3. Interaction between the priority and minimum-user-limit-percent preemption policies.
3.1. If priority inversion occurs between apps owned by different users, the priority preemption policy will not preempt containers from the lower priority app if it would cause the lower priority app to go below the user's minimum-user-limit-percent guarantee.
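Use case 3.1 above can be sketched as a single bound. This is a hypothetical illustration (the method name and integer resource model are invented for the example): the priority policy may take from a lower-priority app at most what that app's user holds above the user's minimum-user-limit-percent guarantee.

```java
// Sketch: cap priority-driven preemption so it never pushes a user
// below the user's minimum-user-limit-percent guarantee.
public class PriorityVsUserLimit {

  static int preemptableFromApp(int appUsed, int userUsed,
                                int userGuarantee, int demand) {
    // Only the portion the user holds above its guarantee is fair game.
    int aboveGuarantee = Math.max(0, userUsed - userGuarantee);
    return Math.min(demand, Math.min(appUsed, aboveGuarantee));
  }

  public static void main(String[] args) {
    // The user holds 120 against a guarantee of 100; a 50-resource
    // higher-priority demand may preempt only the 20 above the guarantee.
    System.out.println(preemptableFromApp(120, 120, 100, 50)); // 20
    // A user at or below its guarantee yields nothing to the priority policy.
    System.out.println(preemptableFromApp(100, 100, 100, 50)); // 0
  }
}
```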
> [Umbrella] Capacity Scheduler Preemption Within a queue > --- > > Key: YARN-4945 > URL: https://issues.apache.org/jira/browse/YARN-4945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan > Attachments: Intra-Queue Preemption Use Cases.pdf, > IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.patch > > > This is umbrella ticket to track efforts of preemption within a queue to > support features like: > YARN-2009. YARN-2113. YARN-4781. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543523#comment-15543523 ] Eric Payne commented on YARN-2009: -- Hi [~sunilg], I noticed some odd behavior while trying the following use case:
# {{User1}} starts {{app1}} at {{priority1}} and consumes the entire queue
# {{User1}} starts {{app2}} at {{priority2}}
# preemption happens for all containers from {{app1}} except the AM container
# {{app2}} consumes all containers released by {{app1}}
# The preemption monitor preempts containers from {{_app2_}}! This continues as long as {{app2}} runs.
I believe it is caused by the following code:
{code:title=FifoIntraQueuePreemptionPolicy#getResourceDemandFromAppsPerQueue}
Collection<TempAppPerPartition> appsOrderedByPriority = tq.getApps();
Resource actualPreemptNeeded = null;
for (TempAppPerPartition a1 : appsOrderedByPriority) {
  for (String label : a1.getPartitions()) {
    // Updating pending resource per-partition level.
    if ((actualPreemptNeeded = resToObtainByPartition.get(label)) == null) {
      actualPreemptNeeded = Resources.createResource(0, 0);
      resToObtainByPartition.put(label, actualPreemptNeeded);
    }
    Resources.addTo(actualPreemptNeeded, a1.getActuallyToBePreempted());
  }
}
return resToObtainByPartition;
{code}
Since {{app1}}'s AM container is still running, the size of {{actuallyToBePreempted}} for {{app1}} is the size of the AM's container. This gets added to {{actualPreemptNeeded}} and put into {{resToObtainByPartition}}, which then gets passed to {{IntraQueueCandidatesSelector#preemptFromLeastStarvedApp}}. {{preemptFromLeastStarvedApp}} skips {{app1}}'s AM, and then preempts from the only remaining thing with resources, which is {{app2}}. I'm not sure exactly how I would fix this yet, except to consider the size of the AM when calculating {{actuallyToBePreempted}}.
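One possible shape of the fix hinted at in the last sentence can be sketched as follows. This is not the actual patch: the method name and the boolean flag are invented for illustration, and it only expresses the idea of not counting an app's unpreemptable AM container toward the resources still to obtain.

```java
// Sketch: when an app is down to only its AM container (which the
// selector will skip anyway), stop counting the AM's size toward the
// queue's resources-to-obtain, so the shortfall isn't "hunted down"
// in some other app.
public class AmAwareToObtain {

  static int toObtain(int markedForPreemption, int amSize, boolean onlyAmRemains) {
    return onlyAmRemains
        ? Math.max(0, markedForPreemption - amSize)
        : markedForPreemption;
  }

  public static void main(String[] args) {
    // app1 is down to a 2-resource AM; without the adjustment, those 2
    // resources would be taken from app2 instead, round after round.
    System.out.println(toObtain(2, 2, true));   // 0
    // With other containers still running, nothing changes.
    System.out.println(toObtain(10, 2, false)); // 10
  }
}
```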
> Priority support for preemption in ProportionalCapacityPreemptionPolicy > --- > > Key: YARN-2009 > URL: https://issues.apache.org/jira/browse/YARN-2009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Devaraj K >Assignee: Sunil G > Attachments: YARN-2009.0001.patch, YARN-2009.0002.patch, > YARN-2009.0003.patch, YARN-2009.0004.patch > > > While preempting containers based on the queue ideal assignment, we may need > to consider preempting the low priority application containers first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520009#comment-15520009 ] Eric Payne commented on YARN-2009: -- Hi [~sunilg]. After thinking more about it, I think I would like to express my ideas about {{tq.unassigned}} in the following algorithm:
{code}
tq.unassigned = tq.used
while tq.unassigned
  for app : underservedApps
    app.idealAssigned = app.used + app.pending
    // considering, of course, user-resource-limit, as Wangda defined it above
    tq.unassigned -= app.idealAssigned
{code}
My concern is that if 1) {{tq.guaranteed}} is used in the above algorithm instead of {{tq.used}}, and 2) {{tq.used}} is less than {{tq.guaranteed}}, then the above algorithm will want to ideally assign more total resources to all apps than are being used. If that happens, then when it comes time for the intra-queue preemption policy to preempt resources, it seems to me that the policy won't preempt enough resources. It seems to me that the intra-queue preemption policy should only consider resources actually in use when deciding how much to preempt, not guaranteed resources.
> Priority support for preemption in ProportionalCapacityPreemptionPolicy > --- > > Key: YARN-2009 > URL: https://issues.apache.org/jira/browse/YARN-2009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Devaraj K >Assignee: Sunil G > Attachments: YARN-2009.0001.patch, YARN-2009.0002.patch > > > While preempting containers based on the queue ideal assignment, we may need > to consider preempting the low priority application containers first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
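The pseudocode above can be made concrete as a small sketch. All names here are hypothetical simplifications of the discussion (integer resources, no user-resource-limit): the point is only that seeding the loop with {{tq.used}} caps the total ideal assignment at what the queue actually holds.

```java
import java.util.*;

// Sketch of the algorithm above: hand out ideal assignments in
// underserved-first order from a budget of tq.used, never tq.guaranteed.
public class IdealAssignedFromUsed {

  // demands maps app -> (used + pending), in underserved-first order.
  static Map<String, Integer> idealAssigned(int tqUsed,
                                            LinkedHashMap<String, Integer> demands) {
    Map<String, Integer> ideal = new LinkedHashMap<>();
    int unassigned = tqUsed;
    for (Map.Entry<String, Integer> e : demands.entrySet()) {
      int give = Math.min(e.getValue(), unassigned);
      ideal.put(e.getKey(), give);
      unassigned -= give;
    }
    return ideal;
  }

  public static void main(String[] args) {
    LinkedHashMap<String, Integer> demands = new LinkedHashMap<>();
    demands.put("app2", 50);  // underserved: 0 used + 50 pending
    demands.put("app1", 100); // 100 used + 0 pending
    // With tq.used = 100, the ideal assignments sum to exactly 100,
    // so the preemption amount (used - ideal) covers app2's full need.
    System.out.println(idealAssigned(100, demands)); // {app2=50, app1=50}
  }
}
```

Had the budget been a larger {{tq.guaranteed}}, both apps could be ideally assigned their full demand and the policy would compute too little to preempt, which is exactly the concern stated above.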
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15609030#comment-15609030 ] Eric Payne commented on YARN-2009: -- [~sunilg], I am still analyzing the code, but I do want to point out one concern that I have. {{TestProportionalCapacityPreemptionPolicyIntraQueue#testPreemptionWithTwoUsers}}: In this test, the user limit resource is 30, and {{app3}} has 40 resources. So, at most, I would expect only 10 resources to be preempted from {{app3}}, which would bring {{app3}} down to 30 resources to match the ULR. Instead, 25 resources were preempted. I feel that {{FifoIntraQueuePreemptionPlugin#calculateIdealAssignedResourcePerApp}} should not allow {{tmpApp.idealAssigned}} to be lower than the ULR. One more thing I noticed during my manual testing is that although the unnecessary preemption is a lot less, there are cases where it still happens. I haven't drilled down yet. > Priority support for preemption in ProportionalCapacityPreemptionPolicy > --- > > Key: YARN-2009 > URL: https://issues.apache.org/jira/browse/YARN-2009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Devaraj K >Assignee: Sunil G > Attachments: YARN-2009.0001.patch, YARN-2009.0002.patch, > YARN-2009.0003.patch, YARN-2009.0004.patch, YARN-2009.0005.patch, > YARN-2009.0006.patch, YARN-2009.0007.patch, YARN-2009.0008.patch, > YARN-2009.0009.patch, YARN-2009.0010.patch, YARN-2009.0011.patch, > YARN-2009.0012.patch, YARN-2009.0013.patch, YARN-2009.0014.patch, > YARN-2009.0015.patch > > > While preempting containers based on the queue ideal assignment, we may need > to consider preempting the low priority application containers first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
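The ULR floor argued for above can be sketched as a one-line cap. This is a hypothetical illustration (method name and integer model invented for the example), not the patch's code: an app's ideal assignment is floored at the user limit resource, so preemption stops once the app is down to the ULR.

```java
// Sketch: never preempt an app below the user-limit resource (ULR),
// even if the raw ideal assignment computed for it is lower.
public class UlrFloor {

  static int preemptAmount(int appUsed, int idealAssigned, int userLimitResource) {
    // The effective floor is the ULR (or current usage, if already lower).
    int floor = Math.max(idealAssigned, Math.min(appUsed, userLimitResource));
    return Math.max(0, appUsed - floor);
  }

  public static void main(String[] args) {
    // app3 uses 40 against a ULR of 30: at most 10 should be preempted,
    // not the 25 observed in the test, even if the raw ideal was 15.
    System.out.println(preemptAmount(40, 15, 30)); // 10
  }
}
```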
[jira] [Updated] (YARN-5725) Test uncaught exception in TestContainersMonitorResourceChange.testContainersResourceChange when setting IP and host
[ https://issues.apache.org/jira/browse/YARN-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-5725: - Component/s: nodemanager > Test uncaught exception in > TestContainersMonitorResourceChange.testContainersResourceChange when setting > IP and host > > > Key: YARN-5725 > URL: https://issues.apache.org/jira/browse/YARN-5725 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > Attachments: YARN-5725.000.patch, YARN-5725.001.patch, > YARN-5725.002.patch, YARN-5725.003.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > The issue is a warning but it prevents container monitor to continue > 2016-10-12 14:38:23,280 WARN [Container Monitor] > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(594)) - > Uncaught exception in ContainersMonitorImpl while monitoring resource of > container_123456_0001_01_01 > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:455) > 2016-10-12 14:38:23,281 WARN [Container Monitor] > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(613)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5311) Document graceful decommission CLI and usage
[ https://issues.apache.org/jira/browse/YARN-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-5311: - Labels: oct16-easy (was: ) > Document graceful decommission CLI and usage > > > Key: YARN-5311 > URL: https://issues.apache.org/jira/browse/YARN-5311 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Affects Versions: 2.9.0 >Reporter: Junping Du > Labels: oct16-easy > Attachments: YARN-5311.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5271) ATS client doesn't work with Jersey 2 on the classpath
[ https://issues.apache.org/jira/browse/YARN-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-5271: - Labels: oct16-medium (was: ) > ATS client doesn't work with Jersey 2 on the classpath > -- > > Key: YARN-5271 > URL: https://issues.apache.org/jira/browse/YARN-5271 > Project: Hadoop YARN > Issue Type: Bug > Components: client, timelineserver >Affects Versions: 2.7.2 >Reporter: Steve Loughran >Assignee: Weiwei Yang > Labels: oct16-medium > Attachments: YARN-5271.01.patch, YARN-5271.branch-2.01.patch > > > see SPARK-15343 : once Jersey 2 is on the CP, you can't instantiate a > timeline client, *even if the server is an ATS1.5 server and publishing is > via the FS* -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4218) Metric for resource*time that was preempted
[ https://issues.apache.org/jira/browse/YARN-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-4218: - Component/s: resourcemanager > Metric for resource*time that was preempted > --- > > Key: YARN-4218 > URL: https://issues.apache.org/jira/browse/YARN-4218 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4218.2.patch, YARN-4218.2.patch, YARN-4218.2.patch, > YARN-4218.2.patch, YARN-4218.3.patch, YARN-4218.patch, YARN-4218.wip.patch, > screenshot-1.png, screenshot-2.png, screenshot-3.png > > > After YARN-415 we have the ability to track the resource*time footprint of a > job and preemption metrics shows how many containers were preempted on a job. > However we don't have a metric showing the resource*time footprint cost of > preemption. In other words, we know how many containers were preempted but we > don't have a good measure of how much work was lost as a result of preemption. > We should add this metric so we can analyze how much work preemption is > costing on a grid and better track which jobs were heavily impacted by it. A > job that has 100 containers preempted that only lasted a minute each and were > very small is going to be less impacted than a job that only lost a single > container but that container was huge and had been running for 3 days. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-5773: - Component/s: rolling upgrade capacity scheduler > RM recovery too slow due to LeafQueue#activateApplication() > --- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, rolling upgrade >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Labels: oct16-medium > Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch, > YARN-5773.0004.patch, YARN-5773.0005.patch, YARN-5773.0006.patch, > YARN-5773.003.patch > > > # Submit application 10K application to default queue. > # All applications are in accepted state > # Now restart resourcemanager > For each application recovery {{LeafQueue#activateApplications()}} is > invoked.Resulting in AM limit check to be done even before Node managers are > getting registered. > Total iteration for N application is about {{N(N+1)/2}} for {{10K}} > application {{5000}} iterations causing time take for Rm to be active > more than 10 min. > Since NM resources are not yet added to during recovery we should skip > {{activateApplicaiton()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5369) Improve Yarn logs command to get container logs based on Node Id
[ https://issues.apache.org/jira/browse/YARN-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-5369: - Labels: oct16-medium (was: ) Component/s: yarn client > Improve Yarn logs command to get container logs based on Node Id > > > Key: YARN-5369 > URL: https://issues.apache.org/jira/browse/YARN-5369 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > Labels: oct16-medium > Attachments: YARN-5369.1.patch, YARN-5369.2.patch, YARN-5369.3.patch, > YARN-5369.4.patch, YARN-5369.5.patch > > > It is helpful if we could have yarn logs --applicationId appId --nodeAddress > ${nodeId} to get all the container logs which ran on the specific nm. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5725) Test uncaught exception in TestContainersMonitorResourceChange.testContainersResourceChange when setting IP and host
[ https://issues.apache.org/jira/browse/YARN-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-5725: - Labels: oct16-easy (was: ) > Test uncaught exception in > TestContainersMonitorResourceChange.testContainersResourceChange when setting > IP and host > > > Key: YARN-5725 > URL: https://issues.apache.org/jira/browse/YARN-5725 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > Labels: oct16-easy > Attachments: YARN-5725.000.patch, YARN-5725.001.patch, > YARN-5725.002.patch, YARN-5725.003.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > The issue is a warning but it prevents container monitor to continue > 2016-10-12 14:38:23,280 WARN [Container Monitor] > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(594)) - > Uncaught exception in ContainersMonitorImpl while monitoring resource of > container_123456_0001_01_01 > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:455) > 2016-10-12 14:38:23,281 WARN [Container Monitor] > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(613)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-5773: - Labels: oct16-medium (was: ) > RM recovery too slow due to LeafQueue#activateApplication() > --- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, rolling upgrade >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Labels: oct16-medium > Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch, > YARN-5773.0004.patch, YARN-5773.0005.patch, YARN-5773.0006.patch, > YARN-5773.003.patch > > > # Submit application 10K application to default queue. > # All applications are in accepted state > # Now restart resourcemanager > For each application recovery {{LeafQueue#activateApplications()}} is > invoked.Resulting in AM limit check to be done even before Node managers are > getting registered. > Total iteration for N application is about {{N(N+1)/2}} for {{10K}} > application {{5000}} iterations causing time take for Rm to be active > more than 10 min. > Since NM resources are not yet added to during recovery we should skip > {{activateApplicaiton()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5706) Fail to launch SLSRunner due to NPE
[ https://issues.apache.org/jira/browse/YARN-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-5706: - Labels: oct16-easy (was: ) Component/s: scheduler-load-simulator > Fail to launch SLSRunner due to NPE > --- > > Key: YARN-5706 > URL: https://issues.apache.org/jira/browse/YARN-5706 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Affects Versions: 3.0.0-alpha2 >Reporter: Kai Sasaki >Assignee: Kai Sasaki > Labels: oct16-easy > Attachments: YARN-5706.01.patch, YARN-5706.02.patch > > > {code} > java.lang.NullPointerException > at org.apache.hadoop.yarn.sls.web.SLSWebApp.(SLSWebApp.java:88) > at > org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics(SLSCapacityScheduler.java:459) > at > org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.setConf(SLSCapacityScheduler.java:153) > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) > {code} > CLASSPATH for html resource is not configured properly. > {code} > DEBUG: Injecting share/hadoop/tools/sls/html into CLASSPATH > DEBUG: Rejected CLASSPATH: share/hadoop/tools/sls/html (does not exist) > {code} > This issue can be reproduced when doing according to the documentation > instruction. > http://hadoop.apache.org/docs/current/hadoop-sls/SchedulerLoadSimulator.html > {code} > $ cd $HADOOP_ROOT/share/hadoop/tools/sls > $ bin/slsrun.sh > --input-rumen |--input-sls=> --output-dir= [--nodes=] > [--track-jobs= ] [--print-simulation] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue
[ https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-5554: - Labels: oct16-medium (was: ) > MoveApplicationAcrossQueues does not check user permission on the target queue > -- > > Key: YARN-5554 > URL: https://issues.apache.org/jira/browse/YARN-5554 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Haibo Chen >Assignee: Wilfred Spiegelenburg > Labels: oct16-medium > Attachments: YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, > YARN-5554.5.patch, YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, > YARN-5554.9.patch > > > moveApplicationAcrossQueues operation currently does not check user > permission on the target queue. This incorrectly allows one user to move > his/her own applications to a queue that the user has no access to -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
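The missing check described in YARN-5554 can be sketched as follows. This is an illustrative-only model, not the actual {{ClientRMService}} code; the class, the ACL map, and the method names below are hypothetical stand-ins for the real YARN types:

```java
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the missing ACL check: before moving an app,
// verify the calling user may submit to the *target* queue, not just the
// source queue. The types here stand in for the real YARN classes.
public class MoveQueueAclCheck {
    enum QueueACL { SUBMIT_APPLICATIONS, ADMINISTER_QUEUE }

    // queue name -> users allowed to submit to that queue
    private final Map<String, Set<String>> submitAcls;

    MoveQueueAclCheck(Map<String, Set<String>> submitAcls) {
        this.submitAcls = submitAcls;
    }

    boolean checkAccess(String user, QueueACL acl, String queue) {
        Set<String> allowed = submitAcls.get(queue);
        return allowed != null && allowed.contains(user);
    }

    void moveApplicationAcrossQueues(String user, String appId, String targetQueue) {
        // The fix: reject the move when the user lacks submit access on the target.
        if (!checkAccess(user, QueueACL.SUBMIT_APPLICATIONS, targetQueue)) {
            throw new SecurityException(
                user + " cannot move " + appId + " to queue " + targetQueue);
        }
        // ... perform the actual move ...
    }

    public static void main(String[] args) {
        MoveQueueAclCheck check =
            new MoveQueueAclCheck(Map.of("getstuffdone", Set.of("jane")));
        check.moveApplicationAcrossQueues("jane", "app_0001", "getstuffdone"); // allowed
        System.out.println(
            check.checkAccess("bob", QueueACL.SUBMIT_APPLICATIONS, "getstuffdone")); // false
    }
}
```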
[jira] [Updated] (YARN-5435) [Regression] QueueCapacities not being updated for dynamic ReservationQueue
[ https://issues.apache.org/jira/browse/YARN-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-5435: - Labels: oct16-easy regression (was: regression) > [Regression] QueueCapacities not being updated for dynamic ReservationQueue > --- > > Key: YARN-5435 > URL: https://issues.apache.org/jira/browse/YARN-5435 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, resourcemanager >Affects Versions: 2.8.0 >Reporter: Sean Po >Assignee: Sean Po > Labels: oct16-easy, regression > Attachments: YARN-5435.v1.patch, YARN-5435.v2.patch > > > YARN-1707 added dynamic queues (ReservationQueue) to CapacityScheduler. The > QueueCapacities data structure was added subsequently but is not being > updated correctly for ReservationQueue. This JIRA tracks the changes required > to update QueueCapacities of ReservationQueue correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5602) Utils for Federation State and Policy Store
[ https://issues.apache.org/jira/browse/YARN-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-5602: - Labels: oct16-medium (was: ) > Utils for Federation State and Policy Store > --- > > Key: YARN-5602 > URL: https://issues.apache.org/jira/browse/YARN-5602 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola > Labels: oct16-medium > Attachments: YARN-5602-YARN-2915.v1.patch, > YARN-5602-YARN-2915.v2.patch > > > This JIRA tracks the creation of utils for Federation State and Policy Store > such as Error Codes, Exceptions... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2009) Intra-queue preemption for app priority support ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613181#comment-15613181 ] Eric Payne commented on YARN-2009: -- The latest patch LGTM, as long as we commit it with the understanding that intra-queue preemption can only be enabled if no preemptable queues in the cluster are multi-tenant. > Intra-queue preemption for app priority support > ProportionalCapacityPreemptionPolicy > > > Key: YARN-2009 > URL: https://issues.apache.org/jira/browse/YARN-2009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Devaraj K >Assignee: Sunil G > Labels: oct16-medium > Attachments: YARN-2009.0001.patch, YARN-2009.0002.patch, > YARN-2009.0003.patch, YARN-2009.0004.patch, YARN-2009.0005.patch, > YARN-2009.0006.patch, YARN-2009.0007.patch, YARN-2009.0008.patch, > YARN-2009.0009.patch, YARN-2009.0010.patch, YARN-2009.0011.patch, > YARN-2009.0012.patch, YARN-2009.0013.patch, YARN-2009.0014.patch, > YARN-2009.0015.patch > > > While preempting containers based on the queue ideal assignment, we may need > to consider preempting the low priority application containers first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5602) Utils for Federation State and Policy Store
[ https://issues.apache.org/jira/browse/YARN-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613363#comment-15613363 ] Eric Payne commented on YARN-5602: -- Hi [~giovanni.fumarola]. Thanks for working on the fix for this issue. Just FYI, this patch no longer applies on trunk. > Utils for Federation State and Policy Store > --- > > Key: YARN-5602 > URL: https://issues.apache.org/jira/browse/YARN-5602 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola > Labels: oct16-medium > Attachments: YARN-5602-YARN-2915.v1.patch, > YARN-5602-YARN-2915.v2.patch > > > This JIRA tracks the creation of utils for Federation State and Policy Store > such as Error Codes, Exceptions... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4218) Metric for resource*time that was preempted
[ https://issues.apache.org/jira/browse/YARN-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616696#comment-15616696 ] Eric Payne commented on YARN-4218: -- [~lichangleo], Thank you very much for adding this useful metric. I'm sorry that it took so long to review this patch. The patch looks good to me. Please upmerge it and I will commit it to trunk. Also, this feature would be good to have in branch-2 and branch-2.8. Can you please also provide a patch for those branches? > Metric for resource*time that was preempted > --- > > Key: YARN-4218 > URL: https://issues.apache.org/jira/browse/YARN-4218 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4218.2.patch, YARN-4218.2.patch, YARN-4218.2.patch, > YARN-4218.2.patch, YARN-4218.3.patch, YARN-4218.patch, YARN-4218.wip.patch, > screenshot-1.png, screenshot-2.png, screenshot-3.png > > > After YARN-415 we have the ability to track the resource*time footprint of a > job and preemption metrics shows how many containers were preempted on a job. > However we don't have a metric showing the resource*time footprint cost of > preemption. In other words, we know how many containers were preempted but we > don't have a good measure of how much work was lost as a result of preemption. > We should add this metric so we can analyze how much work preemption is > costing on a grid and better track which jobs were heavily impacted by it. A > job that has 100 containers preempted that only lasted a minute each and were > very small is going to be less impacted than a job that only lost a single > container but that container was huge and had been running for 3 days. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4218) Metric for resource*time that was preempted
[ https://issues.apache.org/jira/browse/YARN-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655289#comment-15655289 ] Eric Payne commented on YARN-4218: -- +1 Thanks [~lichangleo] for the patches and the work done on this JIRA. I will commit this. > Metric for resource*time that was preempted > --- > > Key: YARN-4218 > URL: https://issues.apache.org/jira/browse/YARN-4218 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4218-branch-2.003.patch, YARN-4218.006.patch, > YARN-4218.2.patch, YARN-4218.2.patch, YARN-4218.2.patch, YARN-4218.2.patch, > YARN-4218.3.patch, YARN-4218.4.patch, YARN-4218.5.patch, > YARN-4218.branch-2.2.patch, YARN-4218.branch-2.patch, YARN-4218.patch, > YARN-4218.trunk.2.patch, YARN-4218.trunk.3.patch, YARN-4218.trunk.patch, > YARN-4218.wip.patch, screenshot-1.png, screenshot-2.png, screenshot-3.png > > > After YARN-415 we have the ability to track the resource*time footprint of a > job and preemption metrics shows how many containers were preempted on a job. > However we don't have a metric showing the resource*time footprint cost of > preemption. In other words, we know how many containers were preempted but we > don't have a good measure of how much work was lost as a result of preemption. > We should add this metric so we can analyze how much work preemption is > costing on a grid and better track which jobs were heavily impacted by it. A > job that has 100 containers preempted that only lasted a minute each and were > very small is going to be less impacted than a job that only lost a single > container but that container was huge and had been running for 3 days. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589980#comment-15589980 ] Eric Payne commented on YARN-2009: -- [~sunilg], thanks for the updated patch. There is a problem with preemption when multiple users are in the same queue. Consider the following use case: - {{user1}} starts {{app1}} at priority 1 and consumes the entire queue with long-running containers. {{app1}} has many pending resources. - {{user2}} starts {{app2}} at priority 2. {{app2}} has many pending resources. - The intra-queue preemption monitor preempts containers from {{app1}} until both {{app1}} and {{app2}} have equal resources. However, the intra-queue preemption monitor doesn't stop there. It continues to preempt containers from {{app1}}, which are given back to {{app1}} by the scheduler. > Priority support for preemption in ProportionalCapacityPreemptionPolicy > --- > > Key: YARN-2009 > URL: https://issues.apache.org/jira/browse/YARN-2009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Devaraj K >Assignee: Sunil G > Attachments: YARN-2009.0001.patch, YARN-2009.0002.patch, > YARN-2009.0003.patch, YARN-2009.0004.patch, YARN-2009.0005.patch, > YARN-2009.0006.patch, YARN-2009.0007.patch, YARN-2009.0008.patch, > YARN-2009.0009.patch, YARN-2009.0010.patch, YARN-2009.0011.patch > > > While preempting containers based on the queue ideal assignment, we may need > to consider preempting the low priority application containers first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
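The preempt-then-reassign oscillation described in this comment suggests the selector needs a dead zone: once an app is at (or within a tolerance of) its computed ideal share, nothing more should be selected from it. A rough standalone sketch, with illustrative names (the real policy exposes a similar knob as {{max_ignored_over_capacity}}, but this is not the actual policy code):

```java
// Illustrative sketch of a preemption dead zone: once an app's usage is
// within (1 + deadZone) of its ideal share, select nothing from it, so
// containers are not preempted only to be handed straight back by the
// scheduler. Resource amounts are in MB for simplicity.
public class PreemptionDeadZone {
    static long toPreempt(long usedMb, long idealShareMb, double deadZone) {
        long threshold = (long) (idealShareMb * (1.0 + deadZone));
        // Only preempt down to the ideal share, and only when clearly above it.
        return usedMb > threshold ? usedMb - idealShareMb : 0;
    }

    public static void main(String[] args) {
        // app1 and app2 balanced at 6 GB each (ideal share 6 GB): stop preempting.
        System.out.println(toPreempt(6_144, 6_144, 0.2)); // 0
        // app1 at 12 GB while its ideal share is 6 GB: preempt the excess.
        System.out.println(toPreempt(12_288, 6_144, 0.2)); // 6144
    }
}
```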
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576735#comment-15576735 ] Eric Payne commented on YARN-2009: -- Hi [~sunilg]. I think I would suggest to do the following: {code:title=current FifoIntraQueuePreemptionPlugin#calculateToBePreemptedResourcePerApp} Resources.subtractFrom(preemtableFromApp, tmpApp.getAMUsed()); {code} {code:title=suggested FifoIntraQueuePreemptionPlugin#calculateToBePreemptedResourcePerApp} if (Resources.lessThan(rc, clusterResource, Resources.subtract(tmpApp.getUsed(), preemtableFromApp), tmpApp.getAMUsed())) { Resources.subtractFrom(preemtableFromApp, tmpApp.getAMUsed()); } {code} > Priority support for preemption in ProportionalCapacityPreemptionPolicy > --- > > Key: YARN-2009 > URL: https://issues.apache.org/jira/browse/YARN-2009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Devaraj K >Assignee: Sunil G > Attachments: YARN-2009.0001.patch, YARN-2009.0002.patch, > YARN-2009.0003.patch, YARN-2009.0004.patch, YARN-2009.0005.patch, > YARN-2009.0006.patch, YARN-2009.0007.patch > > > While preempting containers based on the queue ideal assignment, we may need > to consider preempting the low priority application containers first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592740#comment-15592740 ] Eric Payne commented on YARN-2009: -- Hi [~sunilg]. Here is a description of my test environment, the steps I executed, and the results I am seeing. I don't know why the unit test you described above is not catching this, but I will continue to investigate. In the meantime, can you please try the following and let me know what you discover? - ||Property Name||Property Value|| |monitoring_interval (ms)|1000| |max_wait_before_kill (ms)|500| |total_preemption_per_round|1.0| |max_ignored_over_capacity|0.2| |select_based_on_reserved_containers|true| |natural_termination_factor|2.0| |intra-queue-preemption.enabled|true| |intra-queue-preemption.minimum-threshold|0.5| |intra-queue-preemption.max-allowable-limit|0.1| {noformat:title=Cluster} Nodes: 3 Mem per node: 4 GB Total Cluster Size: 12 GB Container size: 0.5 GB {noformat} ||Queue||Guarantee||Max||Minimum user limit percent||User Limit Factor|| |root|100% (12 GB)|100% (12 GB)|N/A|N/A| |default|50% (6 GB)|100% (12 GB)|50% (2 users can run in the queue simultaneously)|2.0 (one user can consume twice the queue's guarantee)| |eng|50% (6 GB)|100% (12 GB)|50% (2 users can run in the queue simultaneously)|2.0 (one user can consume twice the queue's guarantee)| - {{user1}} starts {{app1}} at priority 1 in the {{default}} queue, and requests 30 mappers which want to run for 10 minutes each: -- Sleep job: {{-m 30 -mt 60}} -- Total requested resources are 15.5 GB: ((30 map containers * 0.5 GB per container) + 0.5 GB AM container) ||App Name||User Name||Priority||Used||Pending|| |app1|user1|1|0|15.5 GB| - The RM assigns {{app1}} 24 containers, consuming 12 GB (all cluster resources): -- {{(23 mappers * 0.5 GB) + 0.5 GB AM = 12 GB}} ||App Name||User Name||Priority||Used||Pending|| |app1|user1|1|12 GB|3.5 GB| - {{user2}} starts {{app2}} at priority 2 in the {{default}} queue, and requests 30
mappers which want to run for 10 minutes each: ||App Name||User Name||Priority||Used||Pending|| |app1|user1|1|12 GB|3.5 GB| |app2|user2|2|0|15.5 GB| - The intra-queue preemption monitor iterates over the containers for several {{monitoring_interval}}s and preempts 12 containers (6 GB of resources) - The RM assigns the preempted containers to {{app2}} ||App Name||User Name||Priority||Used||Pending|| |app1|user1|1|6 GB|3.5 GB| |app2|user2|2|6 GB|3.5 GB| - The intra-queue preemption monitor continues to preempt containers from {{app1}}. -- However, since the user limit derived from the MULP for the {{default}} queue is 6 GB per user, the RM gives the preempted containers back to {{app1}} -- This repeats indefinitely. > Priority support for preemption in ProportionalCapacityPreemptionPolicy > --- > > Key: YARN-2009 > URL: https://issues.apache.org/jira/browse/YARN-2009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Devaraj K >Assignee: Sunil G > Attachments: YARN-2009.0001.patch, YARN-2009.0002.patch, > YARN-2009.0003.patch, YARN-2009.0004.patch, YARN-2009.0005.patch, > YARN-2009.0006.patch, YARN-2009.0007.patch, YARN-2009.0008.patch, > YARN-2009.0009.patch, YARN-2009.0010.patch, YARN-2009.0011.patch > > > While preempting containers based on the queue ideal assignment, we may need > to consider preempting the low priority application containers first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
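The steady state in the scenario above follows directly from the minimum-user-limit-percent arithmetic. A standalone sketch, assuming the simplified user-limit formula (this is not the real {{LeafQueue#computeUserLimit}}):

```java
// Simplified user-limit arithmetic for the scenario above: with
// minimum-user-limit-percent = 50 and 2 active users, each user's limit in
// a 12 GB queue is 6 GB, so preemption should stop once app1 and app2 each
// hold 6 GB. Amounts are in MB; this is a sketch, not the real computation.
public class UserLimitSketch {
    static long userLimit(long queueCapacityMb, int minUserLimitPercent, int activeUsers) {
        long byPercent = queueCapacityMb * minUserLimitPercent / 100;
        long byUsers = queueCapacityMb / activeUsers;
        // Each user is guaranteed at least the MULP share, and active users
        // otherwise split the queue evenly.
        return Math.max(byPercent, byUsers);
    }

    public static void main(String[] args) {
        long limit = userLimit(12_288, 50, 2); // 12 GB queue, MULP 50, 2 users
        System.out.println(limit); // 6144 MB (6 GB) per user
    }
}
```

Once both users sit at this limit, further preemption from {{app1}} only frees resources that the scheduler must hand back to {{app1}}, which is the oscillation reported here.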
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593001#comment-15593001 ] Eric Payne commented on YARN-2009: -- Hi [~sunilg]. I am confused by something you said in the [comment above|https://issues.apache.org/jira/browse/YARN-2009?focusedCommentId=15591597=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15591597]: {quote} I tested below case {code} ... "b\t" // app3 in b + "(4,1,n1,,40,false,20,_user1_);" + // app3 b "b\t" // app1 in a + "(6,1,n1,,5,false,30,_user2_)"; ... {code} {quote} I assumed that the above was from a unit test. As far as I can tell, nothing in the {{o.a.h.y.s.r.monitor.capacity}} framework supports testing with different users. Were you using the above code as pseudocode to document a manual test? > Priority support for preemption in ProportionalCapacityPreemptionPolicy > --- > > Key: YARN-2009 > URL: https://issues.apache.org/jira/browse/YARN-2009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Devaraj K >Assignee: Sunil G > Attachments: YARN-2009.0001.patch, YARN-2009.0002.patch, > YARN-2009.0003.patch, YARN-2009.0004.patch, YARN-2009.0005.patch, > YARN-2009.0006.patch, YARN-2009.0007.patch, YARN-2009.0008.patch, > YARN-2009.0009.patch, YARN-2009.0010.patch, YARN-2009.0011.patch > > > While preempting containers based on the queue ideal assignment, we may need > to consider preempting the low priority application containers first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594971#comment-15594971 ] Eric Payne commented on YARN-2009: -- bq. the test code which i posted was from my unit test Thanks [~sunilg]. I cannot find anywhere in the capacity preemption tests where it takes different user IDs as parameters. I don't see that in the YARN-2009.0011.patch either. Can you please help me understand what I'm missing? > Priority support for preemption in ProportionalCapacityPreemptionPolicy > --- > > Key: YARN-2009 > URL: https://issues.apache.org/jira/browse/YARN-2009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Devaraj K >Assignee: Sunil G > Attachments: YARN-2009.0001.patch, YARN-2009.0002.patch, > YARN-2009.0003.patch, YARN-2009.0004.patch, YARN-2009.0005.patch, > YARN-2009.0006.patch, YARN-2009.0007.patch, YARN-2009.0008.patch, > YARN-2009.0009.patch, YARN-2009.0010.patch, YARN-2009.0011.patch > > > While preempting containers based on the queue ideal assignment, we may need > to consider preempting the low priority application containers first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583579#comment-15583579 ] Eric Payne commented on YARN-2009: -- Thanks for the updated patch [~sunilg]. Should this feature work with labelled queues? I notice that it does not. I think it is because the following code is getting the used capacity only for the default partition: {code:title=IntraQueueCandidatesSelector#computeIntraQueuePreemptionDemand} if (leafQueue.getUsedCapacity() < context .getMinimumThresholdForIntraQueuePreemption()) { continue; } {code} The above code has access to the partition, so it should be easy to get the used capacity per partition. Perhaps something like the following: {code:title=IntraQueueCandidatesSelector#computeIntraQueuePreemptionDemand} if (leafQueue.getQueueCapacities().getUsedCapacity(partition) < context .getMinimumThresholdForIntraQueuePreemption()) { continue; } {code} > Priority support for preemption in ProportionalCapacityPreemptionPolicy > --- > > Key: YARN-2009 > URL: https://issues.apache.org/jira/browse/YARN-2009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Devaraj K >Assignee: Sunil G > Attachments: YARN-2009.0001.patch, YARN-2009.0002.patch, > YARN-2009.0003.patch, YARN-2009.0004.patch, YARN-2009.0005.patch, > YARN-2009.0006.patch, YARN-2009.0007.patch, YARN-2009.0008.patch > > > While preempting containers based on the queue ideal assignment, we may need > to consider preempting the low priority application containers first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687948#comment-15687948 ] Eric Payne commented on YARN-5889: -- Thanks [~sunilg] for working on this refactoring. Here are my comments for {{YARN-5889.v1.patch}} - CapacitySchedulerConfiguration.java: -- {{getUserLimitAynschronously}} should be {{getUserLimitAsynchronously}} - {{CapacityScheduler#serviceStart}}: Shouldn't this check for null before dereferencing {{computeUserLimitAsyncThread}}: {code} public void serviceStart() throws Exception { startSchedulerThreads(); computeUserLimitAsyncThread.start(); super.serviceStart(); } {code} - {{CapacityScheduler#ComputeUserLimitAsyncThread#run}}: {code} Thread.sleep(1); {code} It seems like this should be longer than 1 ms. Isn't the default scheduling interval 5 seconds? That may be too long, but I think it should be at least a second. - {{CapacityScheduler#ComputeUserLimitAsyncThread#run}}: This is just a very small nit, but it seems to me like {{getAllLeafQueues()}} should return a list of {{LeafQueue}}'s instead of a list of {{CSQueue}}'s. {code} List<CSQueue> leafQueues = cs.getAllLeafQueues(); {code} - LeafQueue.java: Another tiny nit, but since {{computeUserLimit}} and {{getComputedUserLimit}} have the same arguments, can the arguments to {{getComputedUserLimit}} be in the same order as those for {{computeUserLimit}}? - {{LeafQueue#getComputedUserLimit}} I don't think we want to always return {{Resources.unbounded()}} when {{userLimitPerSchedulingMode}} is null. If computing user limit the legacy way, the return value of the {{computeUserLimit}} method should be returned. - ActiveUsersManager.java: Is the import of FiCaSchedulerApp needed? 
{code} import org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp; {code} > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.v0.patch, YARN-5889.v1.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket focuses on moving the > user-limit calculation out of the heartbeat allocation flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
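The interval concern raised in the review above can be made concrete with a minimal background-recomputation loop. All names here are illustrative, not the actual {{CapacityScheduler}} thread; the point is the configurable sleep interval and lock-free publication of the computed limit:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative background recomputation loop: recompute the cached user
// limit on a configurable interval instead of Thread.sleep(1), and publish
// the result through an AtomicLong so allocation threads read it without
// taking the scheduler's write lock.
public class AsyncUserLimitThread extends Thread {
    private final AtomicLong cachedUserLimitMb = new AtomicLong();
    private final long intervalMs;
    private volatile boolean running = true;

    AsyncUserLimitThread(long intervalMs) {
        this.intervalMs = intervalMs;
        setDaemon(true);
    }

    // Placeholder for the real per-queue/per-partition computation.
    long computeUserLimit() {
        return 6_144;
    }

    @Override
    public void run() {
        while (running) {
            cachedUserLimitMb.set(computeUserLimit());
            try {
                Thread.sleep(intervalMs); // e.g. 1000 ms, not 1 ms
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    long getComputedUserLimit() {
        return cachedUserLimitMb.get();
    }

    void shutdown() {
        running = false;
        interrupt();
    }
}
```

A one-second interval trades a little staleness in the cached limit for removing the per-heartbeat write-lock cost, which is the stated goal of the ticket.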
[jira] [Updated] (YARN-4822) Refactor existing Preemption Policy of CS for easier adding new approach to select preemption candidates
[ https://issues.apache.org/jira/browse/YARN-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-4822: - Fix Version/s: 2.8.0 Thanks [~leftnoteasy]. I also backported this to branch-2.8 > Refactor existing Preemption Policy of CS for easier adding new approach to > select preemption candidates > > > Key: YARN-4822 > URL: https://issues.apache.org/jira/browse/YARN-4822 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: YARN-4822.1.patch, YARN-4822.2.patch, YARN-4822.3.patch, > YARN-4822.4.patch, YARN-4822.5.patch, YARN-4822.6.patch, YARN-4822.7.patch > > > Currently, ProportionalCapacityPreemptionPolicy has hard-coded logic to > select candidates to be preempted (based on FIFO order of > applications/containers). It is not simple to add new candidate-selection > logic, such as preemption for large containers, intra-queue fairness/policy, > etc. > In this JIRA, I propose the following changes: > 1) Clean up the code base, consolidating the current logic into 3 stages: > - Compute ideal sharing of queues > - Select to-be-preempted candidates > - Send preemption/kill events to scheduler > 2) Add a new interface: {{PreemptionCandidatesSelectionPolicy}} for the above > "select to-be-preempted candidates" part. Move the existing candidate-selection > logic to {{FifoPreemptionCandidatesSelectionPolicy}}. > 3) Allow multiple PreemptionCandidatesSelectionPolicies to work together in a > chain. A preceding PreemptionCandidatesSelectionPolicy has higher priority to > select candidates, and a later PreemptionCandidatesSelectionPolicy can make > decisions according to already selected candidates and pre-computed queue > ideal shares of resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
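The chained-selector design described in this JIRA can be sketched as below. The interface and class names are simplified stand-ins for the real YARN types, and the candidate type is reduced to a string for brevity:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of the selector chain described in the refactor:
// each policy sees the candidates already chosen by earlier policies and
// may only add to them, so selectors earlier in the chain effectively have
// higher priority when picking what to preempt.
public class SelectorChain {
    interface CandidatesSelectionPolicy {
        // Adds this policy's picks to 'selected', given what earlier
        // selectors in the chain have already chosen.
        void selectCandidates(Set<String> selected);
    }

    private final List<CandidatesSelectionPolicy> chain = new ArrayList<>();

    void add(CandidatesSelectionPolicy policy) {
        chain.add(policy);
    }

    Set<String> run() {
        Set<String> selected = new HashSet<>();
        for (CandidatesSelectionPolicy policy : chain) {
            policy.selectCandidates(selected);
        }
        return selected;
    }

    public static void main(String[] args) {
        SelectorChain selectors = new SelectorChain();
        // First selector: FIFO-style pick.
        selectors.add(sel -> sel.add("container_1"));
        // Second selector: only tops up if the first left room (budget of 2).
        selectors.add(sel -> { if (sel.size() < 2) sel.add("container_2"); });
        System.out.println(selectors.run().size()); // 2
    }
}
```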
[jira] [Moved] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent
[ https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne moved MAPREDUCE-6812 to YARN-5892: - Component/s: (was: yarn) (was: capacity-sched) yarn capacity scheduler Key: YARN-5892 (was: MAPREDUCE-6812) Project: Hadoop YARN (was: Hadoop Map/Reduce) > Capacity Scheduler: Support user-specific minimum user limit percent > > > Key: YARN-5892 > URL: https://issues.apache.org/jira/browse/YARN-5892 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler, yarn >Reporter: Eric Payne > > Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} > property is per queue. A cluster admin should be able to set the minimum user > limit percent on a per-user basis within the queue. > This functionality is needed so that when intra-queue preemption is enabled > (YARN-4945 / YARN-2113), some users can be deemed as more important than > other users, and resources from VIP users won't be as likely to be preempted. > For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user > {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed > 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like > this: > {code} > <property> > <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name> > <value>25</value> > </property> > > <property> > <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name> > <value>75</value> > </property> > {code} > NOTE: This should be implemented in a way that user-limit-percent intra-queue > preemption (YARN-2113) is not affected. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent
[ https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-5892: - Component/s: (was: capacity scheduler) (was: yarn) capacityscheduler > Capacity Scheduler: Support user-specific minimum user limit percent > > > Key: YARN-5892 > URL: https://issues.apache.org/jira/browse/YARN-5892 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Eric Payne > > Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} > property is per queue. A cluster admin should be able to set the minimum user > limit percent on a per-user basis within the queue. > This functionality is needed so that when intra-queue preemption is enabled > (YARN-4945 / YARN-2113), some users can be deemed as more important than > other users, and resources from VIP users won't be as likely to be preempted. > For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user > {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed > 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like > this: > {code} > <property> > <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name> > <value>25</value> > </property> > > <property> > <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name> > <value>75</value> > </property> > {code} > NOTE: This should be implemented in a way that user-limit-percent intra-queue > preemption (YARN-2113) is not affected. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
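The per-user override proposed in YARN-5892 could be resolved from configuration roughly as follows. The property names follow the JIRA's own example; the lookup helper itself is hypothetical, not an existing YARN API:

```java
import java.util.Map;

// Hypothetical resolution of the per-user minimum-user-limit-percent
// proposed above: prefer root.<queue>.<user>.minimum-user-limit-percent,
// falling back to the queue-wide root.<queue>.minimum-user-limit-percent.
public class PerUserMulp {
    static final String PREFIX = "yarn.scheduler.capacity.root.";

    static int minimumUserLimitPercent(Map<String, String> conf,
                                       String queue, String user) {
        String perUser = conf.get(
            PREFIX + queue + "." + user + ".minimum-user-limit-percent");
        if (perUser != null) {
            return Integer.parseInt(perUser);
        }
        // Queue-wide value; 100 is the capacity scheduler's default MULP.
        return Integer.parseInt(conf.getOrDefault(
            PREFIX + queue + ".minimum-user-limit-percent", "100"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(
            PREFIX + "getstuffdone.minimum-user-limit-percent", "25",
            PREFIX + "getstuffdone.jane.minimum-user-limit-percent", "75");
        System.out.println(minimumUserLimitPercent(conf, "getstuffdone", "jane")); // 75
        System.out.println(minimumUserLimitPercent(conf, "getstuffdone", "bob"));  // 25
    }
}
```

A fallback-style lookup like this keeps YARN-2113's existing per-queue behavior intact for users without an override, which matches the NOTE in the description.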