[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2018-09-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622518#comment-16622518
 ] 

Hadoop QA commented on YARN-5215:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} | {color:red} YARN-5215 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-5215 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12835650/YARN-5215.002.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21913/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Scheduling containers based on external load in the servers
> ---
>
> Key: YARN-5215
> URL: https://issues.apache.org/jira/browse/YARN-5215
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: api
> Reporter: Íñigo Goiri
> Assignee: Íñigo Goiri
> Priority: Major
> Labels: oct16-hard
> Attachments: YARN-5215.000.patch, YARN-5215.001.patch, YARN-5215.002.patch
>
>
> Currently YARN runs containers in the servers assuming that they own all the 
> resources. The proposal is to use the utilization information in the node and 
> the containers to estimate how much is consumed by external processes and 
> schedule based on this estimation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2017-10-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196598#comment-16196598
 ] 

Hadoop QA commented on YARN-5215:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} | {color:red} YARN-5215 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-5215 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12835650/YARN-5215.002.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/17819/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613360#comment-15613360
 ] 

Hadoop QA commented on YARN-5215:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 15s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 11s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 24s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 20s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 15s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 18s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 16s{color} | {color:red} hadoop-yarn-server-tests in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 21s{color} | {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  0m 21s{color} | {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 21s{color} | {color:red} hadoop-yarn in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 43s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 8 new + 355 unchanged - 2 fixed = 363 total (was 357) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 22s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 18s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 21s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 17s{color} | {color:red} hadoop-yarn-server-tests in the patch failed. {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m  5s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 14s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 47s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 12s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 20s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 14s{color} | {color:red} hadoop-yarn-server-tests in 

[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612988#comment-15612988
 ] 

Hadoop QA commented on YARN-5215:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} | {color:red} YARN-5215 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-5215 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12809098/YARN-5215.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/13573/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-07-06 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365512#comment-15365512
 ] 

Inigo Goiri commented on YARN-5215:
---

[~kasha], for the node utilization, in Windows it's pretty fast and I haven't 
found any issues.
My guess is that in Linux it should be pretty fast too as it's checking single 
values in /proc and not going through the whole tree.

Regarding disk and network, the node monitoring is already in trunk for both 
Windows and Linux.
The utilization is not sent to the RM yet, but I have that pending in YARN-2965. I 
can push for that this week.




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-07-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365506#comment-15365506
 ] 

Karthik Kambatla commented on YARN-5215:


YARN-1011 and I assume YARN-5202 primarily target using those resources that 
have been allocated to other containers but not used. I see the value in 
extending this to all unused resources on the node, especially if we can 
release resources immediately in case of resource contention.

My concern is with aggressively scheduling non-YARN resources *without* 
immediate preemption in case of resource contention. It might also be nice to 
have a way for other (white-listed) processes to actively reclaim resources 
from YARN. Maybe the preemption code could be shared between this and 
YARN-1011? 

[~elgoiri] - do you know how long it takes to compute node utilization and if 
there is a need to improve that too? 

If we look only at cpu and memory utilization, maybe we could oversubscribe on 
disk/network. Any chance we could get the node-level utilization for 
disk/network from Tetris work? [~asuresh], [~srikanthkandula]? 




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-07-06 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365259#comment-15365259
 ] 

Inigo Goiri commented on YARN-5215:
---

In our internal deployment, we always reserve a buffer for the external load to 
spike. This is set by tuning the available cores and memory.

[~jlowe], as you mention, we internally have preemption at both RM and NM 
level. We only enable the one at NM level as it's the one with the best latency 
and we don't have a need for the RM level one. As I mentioned in a previous 
comment, this patch only does scheduling in the RM. If we want to go with 
the full solution, we would need:
* Schedule containers considering external load in the RM
* Expose external load in the UI
* Use history to smooth external load
* Preempting containers from the RM based on external load
* Preempting containers from the NM based on external load
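
As an illustration of the "use history to smooth external load" item, here is a minimal Java sketch of one possible smoothing scheme, an exponential moving average. Nothing in this thread says which scheme the patch or the internal deployment uses; the class name `ExternalLoadSmootherSketch` and the `alpha` parameter are hypothetical.

```java
/** Hypothetical sketch: exponential moving average over external-load samples. */
final class ExternalLoadSmootherSketch {
  private final double alpha;   // weight of the newest sample, in (0, 1]
  private double smoothedMb;    // current smoothed estimate
  private boolean initialized;  // false until the first sample arrives

  ExternalLoadSmootherSketch(double alpha) {
    if (alpha <= 0.0 || alpha > 1.0) {
      throw new IllegalArgumentException("alpha must be in (0, 1]");
    }
    this.alpha = alpha;
  }

  /** Feeds one sample (MB of external memory) and returns the smoothed value. */
  double update(double sampleMb) {
    if (!initialized) {
      // The first sample seeds the average.
      smoothedMb = sampleMb;
      initialized = true;
    } else {
      smoothedMb = alpha * sampleMb + (1.0 - alpha) * smoothedMb;
    }
    return smoothedMb;
  }

  public static void main(String[] args) {
    ExternalLoadSmootherSketch smoother = new ExternalLoadSmootherSketch(0.5);
    for (double sample : new double[] {1000, 2000, 2000}) {
      System.out.println(smoother.update(sample));
    }
  }
}
```

A smaller `alpha` reacts more slowly to spikes, which is one reason a separate buffer for external load, as described above, may still be needed.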




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-07-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365070#comment-15365070
 ] 

Jason Lowe commented on YARN-5215:
--

Maybe I'm missing something, but any of the proposed approaches has YARN 
assuming it can leverage the unused resources on the node.  That's sort of the 
whole point, we want YARN to use those unused resources rather than just 
hard-partitioning the node between YARN and the other system.  Some of the 
approaches start with the assumption that the whole node belongs to YARN and 
YARN will scale back usage of the node based on utilization feedback, while 
other approaches start with YARN assuming it has a smaller portion of the node 
and can reach beyond it when utilization is low.  It's the same scenario from 
two perspectives.

IIUC any of these approaches can react relatively quickly to the other 
workload's demands by having the nodemanager take action directly (by 
preempting containers) when the periodically monitored node utilization goes 
above some configured limit.  The original proposal in this JIRA doesn't do 
that, which means it won't be super-responsive to the other subsystem.   The RM 
won't allocate any additional containers when the utilization gets high, but 
some of the containers would have to exit on their own before YARN's existing 
utilization would decrease.  It sounds like the version Inigo has deployed in 
production does do some sort of preemption, but it sounded like it was coming 
from the RM rather than the NM which would be slightly slower response time 
than if the NM did it directly.
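
The NM-side reaction described above can be sketched as follows. This is a minimal, self-contained Java illustration, not code from any YARN patch: the `RunningContainer` type, the `selectVictims` method, and the newest-first victim order are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of NM-side preemption when node utilization exceeds a limit. */
final class NodePreemptionSketch {

  /** Minimal stand-in for a running container (illustrative type). */
  static final class RunningContainer {
    final String id;
    final long usedMb;
    final long startTimeMs;

    RunningContainer(String id, long usedMb, long startTimeMs) {
      this.id = id;
      this.usedMb = usedMb;
      this.startTimeMs = startTimeMs;
    }
  }

  /**
   * Picks containers to preempt, newest first (an assumed policy), until the
   * projected node usage is at or below the configured limit.
   */
  static List<String> selectVictims(long nodeUsedMb, long limitMb,
      List<RunningContainer> running) {
    List<RunningContainer> newestFirst = new ArrayList<>(running);
    newestFirst.sort(
        Comparator.comparingLong((RunningContainer c) -> c.startTimeMs).reversed());

    List<String> victims = new ArrayList<>();
    long projectedMb = nodeUsedMb;
    for (RunningContainer c : newestFirst) {
      if (projectedMb <= limitMb) {
        break;
      }
      victims.add(c.id);
      projectedMb -= c.usedMb;
    }
    return victims;
  }

  public static void main(String[] args) {
    List<RunningContainer> running = new ArrayList<>();
    running.add(new RunningContainer("c1", 1000, 1L));
    running.add(new RunningContainer("c2", 2000, 2L));
    running.add(new RunningContainer("c3", 3000, 3L));
    System.out.println(selectVictims(10000, 6000, running));
  }
}
```

Running this check inside the nodemanager's periodic monitoring loop avoids a round trip to the RM, which is the latency advantage discussed above.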

If the latency demands of the other workload are so severe that it's impossible 
for YARN to react quickly enough then I don't see how YARN can leverage those 
resources when they are unused.  We'd have to resort to some kind of 
hard-partitioning (either giving the nodemanager less resources than the node 
actually has or using proxy containers in YARN on behalf of the other workload 
to reserve the resources) and live with the underutilization of those resources 
when the other workload is idle.




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-07-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364765#comment-15364765
 ] 

Karthik Kambatla commented on YARN-5215:


Spoke to Inigo offline at the summit. 

My primary concern is with assuming unused resources on the node can be used by 
YARN. It is not uncommon for users to be running something else besides YARN on 
the worker nodes. While these external processes might not be using any 
resources at the time, they might be high-priority workloads that need to be 
able to use those resources immediately.




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-15 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332656#comment-15332656
 ] 

Carlo Curino commented on YARN-5215:


[~nroberts] I completely agree about meeting in person at Summit. There are a 
couple of other hot topics like Federation YARN-2915 or OPPORTUNISTIC Container 
placement YARN-5220. 
I think spending some time chatting in person (and then reporting in JIRA) 
would be a great way to converge quickly. 




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-15 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332601#comment-15332601
 ] 

Nathan Roberts commented on YARN-5215:
--

Thanks [~elgoiri] for the work. Maybe Summit would be a good time to get 
interested parties together to settle on a direction?

I do see this being very similar to what YARN-5202 is doing. In fact I think if 
we just removed the lower bounds in YARN-5202 (i.e. allow it to go below a 
node's declared resource), it would effectively accomplish the same thing. E.g., 
if a memory-hungry process starts up on a node, node utilization will increase 
beyond the desired thresholds and the node's resource available for scheduling 
will be reduced. In my mind, we should basically set a utilization target and 
then have schedulerNode adjust the node's resource either up or down depending 
on where we are in relation to the target. The interesting part, I think, is 
which inputs are used to decide if, and by how much, a node's resource should 
be adjusted.
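
One way to read the utilization-target idea is as a simple step controller. The Java sketch below is only an illustration under that assumption; the class `UtilizationTargetSketch`, the method `adjustSchedulableMb`, and the fixed step size are hypothetical and do not come from YARN-5202 or this patch.

```java
/** Hypothetical sketch: nudge a node's schedulable memory toward a utilization target. */
final class UtilizationTargetSketch {

  /**
   * Returns the new schedulable memory (MB) for a node.
   * Above the target the value shrinks by one step (it may go below the
   * node's declared resource, but never below zero); below the target it
   * grows by one step, capped at maxMb.
   */
  static long adjustSchedulableMb(long currentMb, long nodeUsedMb, long physicalMb,
      double targetFraction, long stepMb, long maxMb) {
    long targetMb = (long) (physicalMb * targetFraction);
    if (nodeUsedMb > targetMb) {
      return Math.max(0L, currentMb - stepMb);
    }
    if (nodeUsedMb < targetMb) {
      return Math.min(maxMb, currentMb + stepMb);
    }
    return currentMb;
  }

  public static void main(String[] args) {
    long physicalMb = 65536;
    // Node is busier than the 50% target, so the schedulable amount shrinks.
    System.out.println(adjustSchedulableMb(16384, 40000, physicalMb, 0.5, 1024, physicalMb));
    // Node is below the target, so the schedulable amount grows.
    System.out.println(adjustSchedulableMb(16384, 20000, physicalMb, 0.5, 1024, physicalMb));
  }
}
```

The sketch uses total node usage as its only input; as noted above, choosing the inputs is the open design question.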

Regarding the patch: at least on Linux I think we have to be careful about 
aggregating all of the container utilizations together. A simple example where 
I think this might not do the right thing is a large MR job that is looking up 
data in a large mmap'ed lookup table. RSS as calculated via /proc/<pid>/stat 
does not understand shared pages (afaik). This means we'll be counting 
this mmap'ed file once for every container running on the node. We're frequently 
running 50+ containers on a node, so if this job has lots of tasks running on a 
node, we'd have tens of GB of error. I know we keep it from going negative, 
which is important, but in this case we'll underestimate the amount of 
external resource usage on the node. 
{noformat}
+  externalUtilization = ResourceUtilization.newInstance(nodeUtilization);
+  externalUtilization.subtractFrom(
+  containersUtilization.getPhysicalMemory(),
+  containersUtilization.getVirtualMemory(),
+  containersUtilization.getCPU());
{noformat}
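
To make the double-counting concern concrete, here is a minimal, self-contained Java sketch. It deliberately does not use the Hadoop `ResourceUtilization` class; `ExternalLoadSketch`, `estimateExternalMb`, and all of the numbers are hypothetical, chosen only to show how per-container RSS that includes a shared mmap'ed file shrinks the external estimate.

```java
import java.util.Arrays;

/** Hypothetical illustration; not the code from the patch. */
final class ExternalLoadSketch {

  /** External memory = node usage minus summed container usage, floored at zero. */
  static long estimateExternalMb(long nodeUsedMb, long[] containerRssMb) {
    long containersMb = 0;
    for (long rss : containerRssMb) {
      containersMb += rss;
    }
    return Math.max(0L, nodeUsedMb - containersMb);
  }

  public static void main(String[] args) {
    int containers = 50;
    long privateMb = 500;    // memory private to each container (assumed)
    long sharedMb = 1024;    // one mmap'ed table shared by all containers (assumed)
    long externalMb = 8192;  // what non-YARN processes really use (assumed)

    // The node-level counter sees the shared file only once.
    long nodeUsedMb = containers * privateMb + sharedMb + externalMb;

    // Each container's RSS includes the shared file, so it is counted 50 times.
    long[] rss = new long[containers];
    Arrays.fill(rss, privateMb + sharedMb);

    System.out.println("real external MB:      " + externalMb);
    System.out.println("estimated external MB: " + estimateExternalMb(nodeUsedMb, rss));
  }
}
```

With these assumed numbers the summed container RSS (76200 MB) exceeds the node total (34216 MB), so the estimate is clamped to 0 even though 8192 MB is really in use by external processes.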





[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-15 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332483#comment-15332483
 ] 

Inigo Goiri commented on YARN-5215:
---

[~kkaranasos] I kind of like the "fake container" approach. In addition, that 
would expose the real status of the cluster to the users and I think it would 
cover [~jlowe]'s concern. To distinguish fake containers, we could make 
them part of a fake application, set the Application Type to something like 
EXTERNAL instead of the framework name, and use the actual external load as the 
application name. We should leverage the unmanaged AM concept.

I also like this approach because we can create a default service that does the 
same thing I'm proposing in this patch and just update the size of the 
container based on node and containers utilization.




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-15 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332268#comment-15332268
 ] 

Konstantinos Karanasos commented on YARN-5215:
--

I also think this is a good feature to have -- thanks for initiating this, 
[~elgoiri]...

We had some similar use cases, so we had gone over possible designs some time 
back.
I see some advantages in the "fake container" approach that [~kasha] mentioned.
What I like about it is that you do not have to introduce the "external 
resources" to the RM. So essentially everything happens at the NM level, and 
the RM sees just some extra container.
The disadvantage I see is that we will not be able to differentiate out of the 
box those fake containers and let the user be aware of them...
What do you think?

Regarding overcommitment, I also believe it is orthogonal, but can be nicely 
coupled with it.
The way I see the full picture is to use guaranteed containers for the fake 
containers, as well as for a few containers that we are sure are not going to 
be preempted. Then use NM-queuing (YARN-2883) and opportunistic containers to 
place more containers at the NMs (using YARN-5220). At the same time, we can 
enable overcommitment through YARN-1011 to start even more opportunistic 
containers, based on the actual node's utilization (especially if we know that 
the fake container usually does not use all the resources it has allocated).
Eventually we can also introduce additional container types, as [~curino] 
mentioned, to have even tighter control over what gets preempted.




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-15 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332021#comment-15332021
 ] 

Inigo Goiri commented on YARN-5215:
---

[~ccurino], converting this to an umbrella makes sense to me. However, I'm not 
sure we reached an agreement even on the simplest approach (patch v001). 
Once we agree on the general approach, I would open the following subtasks:
* Schedule containers considering external load in the RM
* Expose external load in the UI
* Use history to smooth external load
* Preempting containers from the RM based on external load
* Preempting containers from the NM based on external load




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-14 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331102#comment-15331102
 ] 

Carlo Curino commented on YARN-5215:


Regarding preemptable vs OPPORTUNISTIC, we had this conversation with 
[~kkaranasos] and [~asuresh], where container types could be 
{{NON-PREEMPTABLE}}, {{PREEMPTABLE}}, and {{OPPORTUNISTIC}}, in increasing 
order of how likely a task is to be interrupted. 
For {{NON-PREEMPTABLE}} the system does everything it can not to interrupt (bar 
physical machine failures), {{PREEMPTABLE}} are containers with a low, but 
non-zero chance of being interrupted by the scheduler (e.g., allocations above 
a queue capacity, but on dedicated resources), and {{OPPORTUNISTIC}} are tasks 
that have a high(er) chance of being interrupted as they run on the left-over 
capacity from other containers, on overcommitted resources, or on other 
risky forms of resource harvesting (as in this JIRA).
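
The three levels can be written down as an ordered enum. This Java sketch is purely illustrative: `ContainerTypeSketch` and `interruptedBefore` are hypothetical names, not an existing YARN type, and underscores replace the hyphens because Java identifiers cannot contain them.

```java
/** Hypothetical enum; declaration order encodes increasing likelihood of interruption. */
enum ContainerTypeSketch {
  NON_PREEMPTABLE,  // interrupted only by events such as machine failures
  PREEMPTABLE,      // low but non-zero chance of scheduler preemption
  OPPORTUNISTIC;    // runs on left-over or overcommitted capacity

  /** True if this type is more likely to be interrupted than the other one. */
  boolean interruptedBefore(ContainerTypeSketch other) {
    return ordinal() > other.ordinal();
  }

  public static void main(String[] args) {
    for (ContainerTypeSketch type : values()) {
      System.out.println(type + " yields to NON_PREEMPTABLE: "
          + type.interruptedBefore(NON_PREEMPTABLE));
    }
  }
}
```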

BTW [~elgoiri] I would suggest turning this into an umbrella JIRA and separating 
the sub-steps you list [in the comment above | 
https://issues.apache.org/jira/browse/YARN-5215?focusedCommentId=15330649=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15330649].
 Some are likely less controversial and can be settled/committed earlier, while 
other parts are too "interesting" to go in easily :-)






[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-14 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330759#comment-15330759
 ] 

Inigo Goiri commented on YARN-5215:
---

My initial proposal was to add generic support for external resources. 
However, we could also follow the node-level agent approach, which could even 
show up as an unmanaged fake container. That solution is also OK with me.

Going a little bit deeper into the example, in our most extreme scenario, we 
would set the guaranteed to 0GB and the opportunistic to 16GB.
In any case, if we go into preemption, then we should leverage what we are 
doing in YARN-1011.

Regarding YARN-5202, they use the concept of preemptable which is pretty much 
the same as the OPPORTUNISTIC one. Actually, in our internal deployment right 
now, we just assume that everything running on YARN is preemptable and we 
preempt the youngest container.
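
The "preempt the youngest container" policy mentioned above could be sketched as follows. This is a minimal illustration, assuming each running container exposes a start timestamp; {{YoungestFirstPreemption}}, {{RunningContainer}} and {{pickVictim}} are hypothetical names and not classes from YARN or from the internal deployment.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class YoungestFirstPreemption {
  /** Hypothetical stand-in for a running container; not a YARN class. */
  static final class RunningContainer {
    final String id;
    final long startTimeMillis;

    RunningContainer(String id, long startTimeMillis) {
      this.id = id;
      this.startTimeMillis = startTimeMillis;
    }
  }

  /** Picks the most recently started container, or empty if nothing is running. */
  static Optional<RunningContainer> pickVictim(List<RunningContainer> running) {
    return running.stream()
        .max(Comparator.comparingLong((RunningContainer c) -> c.startTimeMillis));
  }

  public static void main(String[] args) {
    List<RunningContainer> running = Arrays.asList(
        new RunningContainer("c1", 1000L),
        new RunningContainer("c2", 3000L),
        new RunningContainer("c3", 2000L));
    System.out.println(pickVictim(running).map(c -> c.id).orElse("none"));
  }
}
```

Choosing the youngest container tends to minimize the amount of work lost, since it has had the least time to make progress.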




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-14 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330719#comment-15330719
 ] 

Karthik Kambatla commented on YARN-5215:


We happen to have a similar low-latency framework running alongside (and 
occasionally on) YARN. So, I am quite sympathetic to the problem. 

In the past, I have wondered if it makes sense to have a separate node-level 
agent that these other (white-listed) services could register with to get 
updates on each others' usage. That way, each framework is aware of others 
running on the cluster and the resources can be handed off more gracefully. 

If we are indeed looking to steal resources from these other services, I would 
think those resources should be allocated only to OPPORTUNISTIC containers and 
likely better handled through YARN-1011. For instance, in your earlier example, 
we would actually set yarn.nodemanager.resource.memory-mb to 14 GB which is 
allocated to GUARANTEED containers and YARN would also allocate OPPORTUNISTIC 
containers up to 2 GB based on how much of it is used by other frameworks. 
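
One possible reading of the 14 GB / 2 GB example above, written out as arithmetic. This is a sketch under assumptions: {{OpportunisticHeadroom}} and {{opportunisticMb}} are hypothetical names, and the rule "reserve minus external usage, never negative" is an interpretation of the comment rather than the behaviour of YARN-1011.

```java
class OpportunisticHeadroom {
  /**
   * Memory that could be offered to OPPORTUNISTIC containers: whatever part of the
   * reserve (node total minus guaranteed) is not currently used by other frameworks.
   */
  static long opportunisticMb(long nodeTotalMb, long guaranteedMb, long externalUsedMb) {
    long reserveMb = nodeTotalMb - guaranteedMb;
    return Math.max(0L, reserveMb - externalUsedMb);
  }

  public static void main(String[] args) {
    // 16 GB node, 14 GB guaranteed, other frameworks using 512 MB of the reserve.
    System.out.println(opportunisticMb(16384L, 14336L, 512L));
    // Other frameworks using more than the 2 GB reserve leaves nothing to hand out.
    System.out.println(opportunisticMb(16384L, 14336L, 3072L));
  }
}
```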

And, as Jason was mentioning earlier (IIUC), YARN-5202 provides this without 
the support for special OPPORTUNISTIC containers. Am I missing something? 




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-14 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330649#comment-15330649
 ] 

Inigo Goiri commented on YARN-5215:
---

[~kasha], in our use case we are targeting co-locating with latency sensitive 
workloads and they have diurnal patterns. For this type of workload, we need to 
be fairly reactive. Actually, preempting containers at the NM following the 
{{ContainersMonitor}} loop would be ideal.

The improvements in utilization are significant as right now we are just 
reserving for the peak of the latency sensitive workloads (around 50% of the 
machine). We tried at some point to have a separate service to periodically 
change the resources of the NMs but it's harder to operate.

In any case, in this first patch, we are just preventing scheduling containers 
and not adding preemption. I can add the following changes to the current patch:
# UI improvements
# History in the utilization to take decisions
# Preempting containers from the RM
# Preempting containers from the NM

The problem with preemption is that we would have to decide what to preempt, and 
that might depend on the opportunistic work in YARN-1011.




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-14 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330588#comment-15330588
 ] 

Karthik Kambatla commented on YARN-5215:


I am generally supportive of this.

A few questions to clarify the use case and approach:
# How dynamic does this need to be?
# And, what range of utilization improvements are we targeting here? 60 - 80, 
75 - 80? 
# What are the characteristics of other workload running on these nodes? 

The reason I ask is to see if other approaches would suffice. For instance, 
would it be enough to gracefully increase/decrease the resources for Yarn on 
each node? i.e., {{yarn.nodemanager.resource.*}}. By graceful, I mean the 
decrease succeeds only after the tasks using those resources finish. 
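
The "graceful" decrease described above could look roughly like this. It is a sketch only: {{GracefulResize}} and {{nextCapacityMb}} are hypothetical names, and the rule that increases apply immediately is an assumption added for the example.

```java
class GracefulResize {
  /**
   * Computes the capacity to advertise for a node while moving towards a target.
   * Increases apply at once; a decrease never drops below what running tasks
   * already hold, so it completes only as those tasks finish.
   */
  static long nextCapacityMb(long currentMb, long targetMb, long allocatedMb) {
    if (targetMb >= currentMb) {
      return targetMb;
    }
    return Math.min(currentMb, Math.max(targetMb, allocatedMb));
  }

  public static void main(String[] args) {
    // Shrinking from 14 GB to 8 GB while 12 GB is still allocated stops at 12 GB.
    System.out.println(nextCapacityMb(14336L, 8192L, 12288L));
    // Once allocations drop to 4 GB the target of 8 GB is reached.
    System.out.println(nextCapacityMb(12288L, 8192L, 4096L));
  }
}
```

Calling this on every heartbeat would let the node size ratchet down without ever preempting a running task.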




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-13 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328631#comment-15328631
 ] 

Inigo Goiri commented on YARN-5215:
---

[~sunilg], I think your first two points are related. Let me try to give a full 
example of how it works now and how it would work with this. Right now, if we 
have a node with 16GB, we usually set the memory usable by the NM 
({{yarn.nodemanager.resource.memory-mb}}) to a smaller number like 14GB; the 
idea behind this is that the other services (NM itself and DN) can potentially 
use 2GB.

With this new approach, we can potentially set 
{{yarn.nodemanager.resource.memory-mb}} to 16GB and if the external processes 
consume 1GB, the NM can only allocate containers up to 15GB. Note that in our 
actual deployment, we set {{yarn.nodemanager.resource.memory-mb}} to something 
like 15GB to have some reserve to handle spikes.
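
The arithmetic in this example can be written out as a small sketch. The names {{ExternalLoadCapacity}}, {{externalMb}} and {{allocatableMb}} are hypothetical and not taken from the patch; the clamping at zero is an assumption added so that noisy measurements cannot produce negative values.

```java
class ExternalLoadCapacity {
  /** External usage estimated as node utilization minus the containers' utilization. */
  static long externalMb(long nodeUtilizationMb, long containersUtilizationMb) {
    return Math.max(0L, nodeUtilizationMb - containersUtilizationMb);
  }

  /** Memory the NM can still hand out once the external estimate is subtracted. */
  static long allocatableMb(long configuredMb, long externalMb) {
    return Math.max(0L, configuredMb - externalMb);
  }

  public static void main(String[] args) {
    // Node reports 9 GB in use, of which containers account for 8 GB: 1 GB is external.
    long external = externalMb(9216L, 8192L);
    // With memory-mb set to 16 GB, the NM can allocate containers up to 15 GB.
    System.out.println(allocatableMb(16384L, external));
  }
}
```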

Regarding your third point, we can add some kind of EWMA but I'm open to other 
proposals for smoothing spikes in the utilization numbers.
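
For reference, an EWMA over the utilization samples could be sketched as below. This is a generic exponentially weighted moving average, not code from the patch; {{EwmaSmoother}} is a hypothetical name and the smoothing factor {{alpha}} is a tuning parameter that would have to be chosen for the deployment.

```java
class EwmaSmoother {
  private final double alpha; // weight of the newest sample, in (0, 1]
  private double value;
  private boolean initialized;

  EwmaSmoother(double alpha) {
    if (!(alpha > 0.0 && alpha <= 1.0)) {
      throw new IllegalArgumentException("alpha must be in (0, 1]");
    }
    this.alpha = alpha;
  }

  /** Folds a new utilization sample into the average and returns the smoothed value. */
  double update(double sample) {
    if (!initialized) {
      value = sample; // the first sample seeds the average
      initialized = true;
    } else {
      value = alpha * sample + (1.0 - alpha) * value;
    }
    return value;
  }

  public static void main(String[] args) {
    EwmaSmoother externalMb = new EwmaSmoother(0.5);
    System.out.println(externalMb.update(1000.0));
    System.out.println(externalMb.update(3000.0)); // a spike is only half reflected
    System.out.println(externalMb.update(1000.0));
  }
}
```

A smaller {{alpha}} damps spikes more strongly but also makes the scheduler slower to react to a genuine increase in external load.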




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-10 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325713#comment-15325713
 ] 

Sunil G commented on YARN-5215:
---

Hi [~elgoiri]
Thanks for initiating this. This looks useful. I have some comments on this.

- Even though we have a minimumAllocation from the Scheduler, it's better to 
define a dead zone around the delta (nodeUtilization - containersUtilization) 
as it may help to avoid thrashing.
- externalUtilization is computed as follows
{code}
+  externalUtilization = ResourceUtilization.newInstance(nodeUtilization);
+  externalUtilization.subtractFrom(
+  containersUtilization.getPhysicalMemory(),
+  containersUtilization.getVirtualMemory(),
+  containersUtilization.getCPU());
{code}
Please correct me if I have misunderstood, as I think there is a corner case.
Assume a node where 16GB of memory is available and only 8GB is assigned to the 
NodeManager, and this node also has some other process running. If 4GB is 
used by that external process, I think the node's {{getUnallocatedResource}} 
will come out as {{8GB (NM configured capacity) - 4GB (external process)}}. 
{code}
Resources.subtractFrom(unallocatedResource, externalResource);
{code}
I think NodeResourceMonitorImpl returns the resourceUtilization of the whole 
node. It is not capped at the node's configured capacity.

- This is a suggestion. As per the current design, we jump to the possible 
unallocated resource in a node quickly. Would it be better to reach this 
aggregated unallocated limit only after checking a few cycles of node utilization?
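
The first two points could be sketched as follows. This is only one possible way to handle them, with hypothetical names ({{ExternalLoadGuard}}, {{effectiveCapacityMb}}, {{applyDeadZone}}) that do not appear in the patch: the external usage is charged against the physical node before being compared with the NM's configured capacity, and small changes in the estimate are ignored.

```java
class ExternalLoadGuard {
  /**
   * Capacity usable by the NM when external usage is charged against the physical
   * node first, so the NM's share shrinks only once the spare memory is exhausted.
   */
  static long effectiveCapacityMb(long physicalMb, long configuredMb, long externalMb) {
    long freeOfExternalMb = Math.max(0L, physicalMb - externalMb);
    return Math.min(configuredMb, freeOfExternalMb);
  }

  /** Keeps the previous estimate unless the new one moved by more than the dead zone. */
  static long applyDeadZone(long previousMb, long currentMb, long deadZoneMb) {
    return Math.abs(currentMb - previousMb) > deadZoneMb ? currentMb : previousMb;
  }

  public static void main(String[] args) {
    // 16 GB node, 8 GB given to the NM, 4 GB used externally: the NM keeps its 8 GB.
    System.out.println(effectiveCapacityMb(16384L, 8192L, 4096L));
    // A 104 MB change inside a 256 MB dead zone leaves the estimate untouched.
    System.out.println(applyDeadZone(4096L, 4200L, 256L));
  }
}
```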





[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-09 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322829#comment-15322829
 ] 

Inigo Goiri commented on YARN-5215:
---

[~jlowe], my view on this feature is more like making the scheduler aware of a 
"container" that is not managed by YARN. This is a dynamically sized container. 
Before moving forward with this, we should agree that this is a proper 
abstraction and complementary to the overcommit effort.

Regarding the UI stuff, let's try to make a list of what to expose and I'll 
give it a try. My first idea is to expose the following per node/queue/cluster:
* Node utilization
* Container utilization
* External utilization
* Allocated/unallocated resources

Can you think of any other metric/aggregation we should be reporting?




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322685#comment-15322685
 ] 

Jason Lowe commented on YARN-5215:
--

bq. All in all, I see a strong connection with over-commit, but this should be 
represented not just as a heavily overcommitted cluster.

I guess I just see it differently.  As you mentioned, YARN is "scavenging" 
resources from a cluster that is doing something else.  In that sense YARN is 
stealing underutilized resources, and that's exactly what overcommit does.  A 
user that thinks they have full use of the node but it can be taken away 
arbitrarily by external load is just fooling themselves into a false sense of 
guaranteed capacity.  They really are using a heavily overcommitted cluster in 
practice, so why shouldn't YARN reflect that reality?

bq. I was thinking on exposing the getExternalUtilization() or the updated 
getUnallocated() through the Web UI, etc.

As I mentioned above, that alone is not going to update total cluster capacity 
nor the capacity available in various scheduler queues.  Users will have to do 
mental math with that metric and the reported cluster available capacity to 
understand why a scheduler queue showing free resources refuses to schedule 
more containers.  In addition the reduced capacity will not be properly 
accounted for among the scheduler queues, so the scheduler will end up 
scheduling differently than one that was aware of the true cluster capacity.





[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321847#comment-15321847
 ] 

Hadoop QA commented on YARN-5215:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 0s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 58s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 16s {color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 0s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 0s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 0s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 39s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 8 new + 361 unchanged - 2 fixed = 369 total (was 363) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 0s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 48s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 19s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 1 new + 989 unchanged - 0 fixed = 990 total (was 989) {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 23s {color} | {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s {color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 15s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 36m 28s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 37s {color} | {color:red} hadoop-yarn-server-tests in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 87m 49s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
 |
|  |  Inconsistent synchronization of 
org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.RegisterNodeManagerRequestPBImpl.builder;
 locked 92% of time  Unsynchronized access at 

[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-08 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321809#comment-15321809
 ] 

Inigo Goiri commented on YARN-5215:
---

[~curino], in our cluster we actually surface the utilization in the Web UI. We 
also report negative values for the available resources. However, I think we 
should do a better job exposing this information similarly to what [~jlowe] has 
done in YARN-5202.

I'll start a thread in YARN-1011 about how to expose all this.




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-08 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321799#comment-15321799
 ] 

Inigo Goiri commented on YARN-5215:
---

Still fixing the unit test.




[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-08 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321696#comment-15321696
 ] 

Carlo Curino commented on YARN-5215:


[~jlowe] thanks for the comment, very useful for context, and you bring up good 
points on how users "perceive" the cluster. 

[~elgoiri], correct me if I am wrong, but this feature seems ideal to 
"scavenge" a YARN cluster out of otherwise utilized machines. In these 
settings, users should be aware that the cluster is not constant, i.e., the 
effects of the fluctuations are non-trivial and expected. However, I agree with 
you that surfacing them in the UI somehow is important.

All in all, I see a strong connection with over-commit, but this should be 
represented not just as a heavily overcommitted cluster.  I agree with 
[~elgoiri] that it is useful to build this feature in a way that more 
explicitly acknowledges that YARN is not the only thing running on the cluster. 

At the same time, we should try to have a set of configurables that makes 
over/under-commit appear unified and coherent to the admins, and UIs that 
surface them properly to users. [~elgoiri] since you were involved in 
YARN-1011, can you propose a way to do that?
 

> Scheduling containers based on external load in the servers
> ---
>
> Key: YARN-5215
> URL: https://issues.apache.org/jira/browse/YARN-5215
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Inigo Goiri
> Attachments: YARN-5215.000.patch
>
>
> Currently YARN runs containers in the servers assuming that they own all the 
> resources. The proposal is to use the utilization information in the node and 
> the containers to estimate how much is consumed by external processes and 
> schedule based on this estimation.






[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-08 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321672#comment-15321672
 ] 

Inigo Goiri commented on YARN-5215:
---

Yes, I realized that the original title didn't mention external load. Fixed 
now, sorry about that; I think it's clearer. Feel free to tweak the 
description more.

As you mention, we could achieve this by tweaking the "guaranteed" size. 
However, I think that having the explicit concept regarding external 
utilization makes it simpler and it's compatible with the overcommit approach 
(both can be enabled/disabled independently). In addition, the concept of node 
utilization is not planned to be used in YARN-1011 for now.

I'm going to post a patch within the next hour with:
* Unit tests
* Conf switches
* Boundary checks

Then, I agree that we need to report this properly to the user. I was thinking 
of exposing the {{getExternalUtilization()}} or the updated 
{{getUnallocated()}} through the Web UI, etc. If we decide this feature should 
go ahead, I would add it here or in a new JIRA.

To summarize, the issues to discuss/finalize are:
* Decide if this should be a separate feature or within overcommit
* Add unit tests
* Add conf switches
* Add boundary checks
* Interface to expose this information

Regarding YARN-5202 vs YARN-1011, it looks to me like there's a lot of overlap 
between them. I think it'd be better to port most of YARN-5202 into YARN-1011. 
We probably should move this discussion into one of them.

> Scheduling containers based on external load in the servers
> ---
>
> Key: YARN-5215
> URL: https://issues.apache.org/jira/browse/YARN-5215
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Inigo Goiri
> Attachments: YARN-5215.000.patch
>
>
> Currently YARN runs containers in the servers assuming that they own all the 
> resources. The proposal is to use the utilization information in the node and 
> the containers to estimate how much is consumed by external processes and 
> schedule based on this estimation.






[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-08 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321648#comment-15321648
 ] 

Jason Lowe commented on YARN-5215:
--

Ah, so the headline was a bit misleading.  Most people saw that and thought 
this feature is going to schedule more containers on the server, but this is 
essentially the opposite of YARN-1011 and YARN-5202.  It's scaling down the 
nodes from the original size as the node utilization increases rather than 
scaling it up from the original size as the node utilization decreases.  
Instead of overcommit, this feature is "undercommit."  ;)

I'm OK with the idea of the feature in general, and I don't think it will be 
horribly incompatible.  In fact I think features like YARN-1011 / YARN-5202 
could emulate this behavior by tuning the "guaranteed" node size for YARN very 
low but allowing it to drastically overcommit up to the original node 
capability.  In other words rather than starting nodes big and scaling down, we 
start nodes small and scale up when we can.  The YARN-5202 patch is already 
dynamically scaling the node size based on the reported total node utilization, 
so it will respond to increasing external load similarly.  The only thing 
missing there is it won't go below the original node size no matter how bad the 
utilization gets, so either that would need to be changed or as I mentioned the 
users tune the feature differently to get this behavior.

Any thoughts on whether this is better implemented as an overcommit setup 
rather than an undercommit setup?  It may be confusing if YARN has two separate 
features doing essentially the same thing from opposite viewpoints.  Also the 
guaranteed containers from YARN-1011 are going to be difficult to guarantee if 
this feature can preempt them based on external node load.  Arguably the user 
should configure a guaranteed YARN capacity on these nodes and then YARN can 
opportunistically use the remaining node's capacity when it appears available.

If we do go with this approach, it seems like this patch is quite a ways off.  
Besides unit tests, conf switches, and boundary condition hardening, I think it 
will be confusing to users and admins to monitor it.  Simply adjusting the 
SchedulerNode will semantically accomplish the desired task as far as 
scheduling containers goes, but the UI, queue and cluster metrics will not 
reflect the reality of the scheduler.  For example, if most of the nodes have 
been scaled back significantly due to external load, the scheduler UI will show 
a well-underutilized cluster when in reality it may be completely full and 
unable to schedule another container.  That's going to be very confusing to users.  
And there are no metrics showing how much has been scaled back -- I think the 
user would have to go to the scheduler nodes page, sum the node capabilities 
themselves, notice it's significantly lower than the reported cluster total, 
and assume it must be this feature causing that anomaly.  I would think 
minimally the cluster size should be changing (along with the queue portions of 
that size) so the amount of utilization of the YARN cluster and scheduler UI is 
accurate.  That still leaves the user to divine why their cluster size is 
floating around over time when they aren't adding or removing nodes, which is 
why we may need another metric showing how much has been "stolen" by external 
node load outside of YARN.  Maybe we still have an overcommit metric but it 
goes negative when we've had original capacity removed by external factors?  
Not sure how best to represent it without over-cluttering the UI with a bunch 
of feature-specific fields.
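
As a rough sketch of the signed metric idea, assuming each node tracks both its original size and the size the scheduler currently uses for it (all names below are hypothetical, not existing YARN metrics):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not an existing YARN metrics class.
final class CapacityMetricsSketch {
    /** A node's configured size and the size the scheduler currently uses. */
    static final class NodeSize {
        final long originalMb;
        final long adjustedMb;

        NodeSize(long originalMb, long adjustedMb) {
            this.originalMb = originalMb;
            this.adjustedMb = adjustedMb;
        }
    }

    /** Cluster total as reported when node adjustments are ignored. */
    static long reportedTotalMb(List<NodeSize> nodes) {
        long sum = 0L;
        for (NodeSize n : nodes) {
            sum += n.originalMb;
        }
        return sum;
    }

    /** Cluster total the scheduler can actually place containers against. */
    static long effectiveTotalMb(List<NodeSize> nodes) {
        long sum = 0L;
        for (NodeSize n : nodes) {
            sum += n.adjustedMb;
        }
        return sum;
    }

    /** Positive means overcommitted; negative means capacity taken by external load. */
    static long signedOvercommitMb(List<NodeSize> nodes) {
        return effectiveTotalMb(nodes) - reportedTotalMb(nodes);
    }

    public static void main(String[] args) {
        List<NodeSize> nodes = Arrays.asList(
            new NodeSize(8192L, 4096L),
            new NodeSize(8192L, 8192L),
            new NodeSize(8192L, 2048L));
        System.out.println("signed overcommit (MB): " + signedOvercommitMb(nodes));
    }
}
```

A single signed value like this would avoid adding a separate field per feature, at the cost of hiding the case where overcommit on some nodes cancels out external load on others.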

This was addressed in the YARN-5202 patch by adjusting the queue and cluster 
metrics as we adjust the scheduler node, and there were also metrics and UI 
fields added to show the amount of overcommit.  Note that in the YARN-5202 
patch we added a fast-path to adjusting the node's size in the scheduler.  The 
typical remove-old-node-add-new-node form of updating is quite expensive since 
it computes unnecessary things like node label diffs, etc. and updates the 
metrics twice, once for the removal and once for the add.  Since this kind of 
feature is going to be adjusting node sizes all the time, a node adjustment 
needs to be as cheap as possible while still keeping the UI and metrics up to 
date.
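
The cost difference can be sketched as follows. This is a simplified illustration with hypothetical names, not the code from the YARN-5202 patch; it only counts metric updates and leaves out the node label work that the slow path also performs.

```java
// Hypothetical sketch; not the actual scheduler or metrics code.
final class ClusterMetricsSketch {
    private long totalMb;
    private int updates; // number of times the metrics were touched

    void addNode(long sizeMb) {
        totalMb += sizeMb;
        updates++;
    }

    void removeNode(long sizeMb) {
        totalMb -= sizeMb;
        updates++;
    }

    /** Slow path: remove the old node and add the new one, two metric updates. */
    void resizeByReplace(long oldMb, long newMb) {
        removeNode(oldMb);
        addNode(newMb);
    }

    /** Fast path: apply only the size delta, one metric update. */
    void resizeByDelta(long oldMb, long newMb) {
        totalMb += newMb - oldMb;
        updates++;
    }

    long totalMb() {
        return totalMb;
    }

    int updates() {
        return updates;
    }

    public static void main(String[] args) {
        ClusterMetricsSketch metrics = new ClusterMetricsSketch();
        metrics.addNode(8192L);
        metrics.resizeByReplace(8192L, 4096L);
        metrics.resizeByDelta(4096L, 6144L);
        System.out.println(metrics.totalMb() + " MB after " + metrics.updates() + " updates");
    }
}
```

Both paths leave the cluster total in the same state; the fast path just gets there with half the metric updates per adjustment.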



[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-08 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321562#comment-15321562
 ] 

Carlo Curino commented on YARN-5215:


Barring proper syncing with the YARN-1011 and YARN-5202 efforts and polishing 
the patch to include tests and conf switches, I am very supportive of this 
effort. It seems a rather simple change that can deal with a broad set of 
issues when YARN is not the only thing running on a set of machines.  The fact 
that you have been running this for a while is also reassuring. 
Were those prod clusters? At what scale? 

Please provide a cleaned-up version of the patch, and respond to [~jlowe]'s 
comment.

[~jlowe] would you be ok with this going in? Can you build upon it for the work 
you are doing or is it horribly incompatible?  

> Scheduling containers based on external load in the servers
> ---
>
> Key: YARN-5215
> URL: https://issues.apache.org/jira/browse/YARN-5215
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Inigo Goiri
> Attachments: YARN-5215.000.patch
>
>
> Currently YARN runs containers in the servers assuming that they own all the 
> resources. The proposal is to use the utilization information in the node and 
> the containers to estimate how much is consumed by external processes and 
> schedule based on this estimation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
