[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-11-23 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263987#comment-16263987
 ] 

Yufei Gu commented on YARN-7290:


+1 for patch v5. Will commit if there is no objection.

> canContainerBePreempted can return true when it shouldn't
> -
>
> Key: YARN-7290
> URL: https://issues.apache.org/jira/browse/YARN-7290
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 3.0.0-beta1
> Reporter: Steven Rand
> Assignee: Steven Rand
> Attachments: YARN-7290-failing-test.patch, YARN-7290.001.patch, 
> YARN-7290.002.patch, YARN-7290.003.patch, YARN-7290.004.patch, 
> YARN-7290.005.patch
>
>
> In FSAppAttempt#canContainerBePreempted, we make sure that preempting the 
> given container would not put the app below its fair share:
> {code}
> // Check if the app's allocation will be over its fairshare even
> // after preempting this container
> Resource usageAfterPreemption = Resources.clone(getResourceUsage());
> // Subtract resources of containers already queued for preemption
> synchronized (preemptionVariablesLock) {
>   Resources.subtractFrom(usageAfterPreemption, resourcesToBePreempted);
> }
> // Subtract this container's allocation to compute usage after preemption
> Resources.subtractFrom(
>     usageAfterPreemption, container.getAllocatedResource());
> return !isUsageBelowShare(usageAfterPreemption, getFairShare());
> {code}
> However, this only considers one container in isolation, and fails to 
> consider containers for the same app that we have already added to 
> {{preemptableContainers}} in 
> FSPreemptionThread#identifyContainersToPreemptOnNode. Therefore we can have a 
> case where we preempt multiple containers from the same app, none of which by 
> itself puts the app below its fair share, but which cumulatively do so.
>
> I've attached a patch with a test to show this behavior. The flow is:
> 1. Initially, greedyApp runs in {{root.preemptable.child-1}} and is allocated 
> all the resources (8g and 8 vcores).
> 2. Then starvingApp runs in {{root.preemptable.child-2}} and requests 2 
> containers, each of which is 3g and 3 vcores in size. At this point both 
> greedyApp and starvingApp have a fair share of 4g (with DRF not in use).
> 3. For the first container requested by starvingApp, we (correctly) preempt 3 
> containers from greedyApp, each of which is 1g and 1 vcore.
> 4. For the second container requested by starvingApp, we again (this time 
> incorrectly) preempt 3 containers from greedyApp. This puts greedyApp below 
> its fair share, but happens anyway because all six times that we reach 
> {{return !isUsageBelowShare(usageAfterPreemption, getFairShare());}}, the 
> value of {{usageAfterPreemption}} is 7g and 7 vcores (confirmed using a 
> debugger).
>
> So in addition to accounting for {{resourcesToBePreempted}}, we also need to 
> account for containers that we're already planning on preempting in 
> FSPreemptionThread#identifyContainersToPreemptOnNode. 
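
The arithmetic in the flow above can be reproduced with a toy model. The sketch below is not Hadoop code: the class and method names are hypothetical, it tracks memory only (in GB), and it assumes that usage equal to the fair share does not count as "below share". It only illustrates why checking each container in isolation selects all six 1g containers, while subtracting what is already planned stops after four.

```java
/** Toy model of the fair-share check; memory only, in GB. Not Hadoop code. */
final class PreemptionCheckSketch {

  /** Isolated check: considers one candidate container at a time. */
  static boolean canPreemptIsolated(int usageGb, int fairShareGb,
      int containerGb) {
    return usageGb - containerGb >= fairShareGb;
  }

  /** Cumulative check: also subtracts what is already planned. */
  static boolean canPreemptCumulative(int usageGb, int fairShareGb,
      int alreadyPlannedGb, int containerGb) {
    return usageGb - alreadyPlannedGb - containerGb >= fairShareGb;
  }

  /** Counts how many candidate containers the chosen check selects. */
  static int countSelected(int usageGb, int fairShareGb,
      int[] candidatesGb, boolean cumulative) {
    int plannedGb = 0;
    int selected = 0;
    for (int containerGb : candidatesGb) {
      boolean ok = cumulative
          ? canPreemptCumulative(usageGb, fairShareGb, plannedGb, containerGb)
          : canPreemptIsolated(usageGb, fairShareGb, containerGb);
      if (ok) {
        plannedGb += containerGb;
        selected++;
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    int[] sixOneGbContainers = {1, 1, 1, 1, 1, 1};
    // greedyApp uses 8g and has a fair share of 4g, as in the description.
    System.out.println("isolated:   "
        + countSelected(8, 4, sixOneGbContainers, false));
    System.out.println("cumulative: "
        + countSelected(8, 4, sixOneGbContainers, true));
  }
}
```

In the isolated variant every call sees 8g - 1g = 7g, which matches the 7g value reported from the debugger.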



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-11-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263800#comment-16263800
 ] 

Hadoop QA commented on YARN-7290:
-

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 27s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 16m 28s | trunk passed |
| +1 | compile | 0m 40s | trunk passed |
| +1 | checkstyle | 0m 30s | trunk passed |
| +1 | mvnsite | 0m 41s | trunk passed |
| +1 | shadedclient | 10m 22s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 8s | trunk passed |
| +1 | javadoc | 0m 22s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 40s | the patch passed |
| +1 | compile | 0m 37s | the patch passed |
| +1 | javac | 0m 37s | the patch passed |
| -0 | checkstyle | 0m 23s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 3 unchanged - 1 fixed = 4 total (was 4) |
| +1 | mvnsite | 0m 39s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 20s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 18s | the patch passed |
| +1 | javadoc | 0m 24s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 63m 35s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 108m 42s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesReservation |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7290 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898982/YARN-7290.005.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux fb121c6549ab 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / aab4395 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/18641/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| unit | 

[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-11-22 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263655#comment-16263655
 ] 

Yufei Gu commented on YARN-7290:


Thanks for the new patch, [~Steven Rand]. I like the idea of putting the map 
into class {{PreemptableContainers}}. However, there is no need to have both 
{{List containers}} and {{Map resourcesToPreemptByApp}}; keeping just one 
should be enough, probably something like {{Map}}. Also, please check the 
style issues found by Hadoop QA.
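
One way to read this suggestion is a single map keyed by app, from which both the flat container list and the per-app totals can be derived on demand. The sketch below is only an illustration under that assumption: the type parameters in the comment above were lost in the archive, so the key and value types here (an app id string and a hypothetical {{ContainerSketch}} record with memory in GB) are invented stand-ins, not the types used in the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-map variant of a preemptable-containers holder. */
final class PreemptableContainersSketch {

  /** Hypothetical container record: an id and its memory in GB. */
  static final class ContainerSketch {
    final String id;
    final int memGb;

    ContainerSketch(String id, int memGb) {
      this.id = id;
      this.memGb = memGb;
    }
  }

  // The only state: containers selected for preemption, grouped by app.
  private final Map<String, List<ContainerSketch>> containersByApp =
      new HashMap<>();

  void add(String appId, ContainerSketch container) {
    containersByApp
        .computeIfAbsent(appId, k -> new ArrayList<>())
        .add(container);
  }

  /** Derived view: total memory already selected from one app. */
  int memToPreemptFor(String appId) {
    List<ContainerSketch> selected = containersByApp.get(appId);
    if (selected == null) {
      return 0;
    }
    int total = 0;
    for (ContainerSketch c : selected) {
      total += c.memGb;
    }
    return total;
  }

  /** Derived view: all selected containers across apps. */
  List<ContainerSketch> allContainers() {
    List<ContainerSketch> all = new ArrayList<>();
    for (List<ContainerSketch> perApp : containersByApp.values()) {
      all.addAll(perApp);
    }
    return all;
  }
}
```

Deriving both views from one map avoids keeping a list and a map in sync, at the cost of recomputing the totals on each lookup.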



[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-11-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261947#comment-16261947
 ] 

Hadoop QA commented on YARN-7290:
-

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 15m 3s | trunk passed |
| +1 | compile | 0m 34s | trunk passed |
| +1 | checkstyle | 0m 23s | trunk passed |
| +1 | mvnsite | 0m 39s | trunk passed |
| +1 | shadedclient | 10m 15s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 56s | trunk passed |
| +1 | javadoc | 0m 22s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 35s | the patch passed |
| +1 | compile | 0m 31s | the patch passed |
| +1 | javac | 0m 31s | the patch passed |
| -0 | checkstyle | 0m 19s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 4 unchanged - 0 fixed = 6 total (was 4) |
| +1 | mvnsite | 0m 33s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 9m 31s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 7s | the patch passed |
| +1 | javadoc | 0m 20s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 60m 59s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 102m 38s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation |
| | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7290 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898776/YARN-7290.004.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 882f8c5deae2 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 03c311e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/18621/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| unit | 

[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-11-21 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260535#comment-16260535
 ] 

Yufei Gu commented on YARN-7290:


Thanks for filing this issue and working on it, [~Steven Rand]. The issue is 
valid and the solution generally looks good to me. Here are my comments:
- It would be nice to move the logic that computes the app's resource usage
  after preemption into a separate method, say {{getUsageAfterPreemption()}},
  which would cover the following code.
{code}
// Check if the app's allocation will be over its fairshare even
// after preempting this container
Resource usageAfterPreemption = Resources.clone(getResourceUsage());

// Subtract resources of containers already queued for preemption
synchronized (preemptionVariablesLock) {
  Resources.subtractFrom(usageAfterPreemption, resourcesToBePreempted);
}

// Account for this container and other containers that the
// FSPreemptionThread is already considering for preemption.
Resource totalThatWouldBePreempted =
    Resources.add(alreadyConsideringForPreemption,
        container.getAllocatedResource());

Resources.subtractFrom(
    usageAfterPreemption, totalThatWouldBePreempted);
{code}
- Does your new unit test cover the second issue you are trying to solve? If
  not, could you please create one for that?
- Method {{identifyContainersToPreemptOnNode}} has become very involved. We
  probably need to refactor it a bit: either splitting the function or putting
  {{consideringForPreemption}} into each {{FSAppAttempt}} would help.
- Remove the unused import in {{FSPreemptionThread}}.
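
The first comment, extracting the usage computation into its own method, might look roughly like the self-contained sketch below. It is not the patch: {{Res}} and {{AppSketch}} are hypothetical stand-ins for Hadoop's {{Resource}} and {{FSAppAttempt}}, the method signature is an assumption, and the final comparison looks at memory only.

```java
/** Minimal immutable resource vector; a stand-in, not Hadoop's Resource. */
final class Res {
  final long memGb;
  final long vcores;

  Res(long memGb, long vcores) {
    this.memGb = memGb;
    this.vcores = vcores;
  }

  Res plus(Res o) {
    return new Res(memGb + o.memGb, vcores + o.vcores);
  }

  Res minus(Res o) {
    return new Res(memGb - o.memGb, vcores - o.vcores);
  }
}

/** Hypothetical app attempt holding only the fields the check needs. */
final class AppSketch {
  private final Object preemptionVariablesLock = new Object();
  private final Res usage;
  private final Res fairShare;
  private Res resourcesToBePreempted = new Res(0, 0);

  AppSketch(Res usage, Res fairShare) {
    this.usage = usage;
    this.fairShare = fairShare;
  }

  /** Records a container that has been queued for preemption. */
  void trackForPreemption(Res container) {
    synchronized (preemptionVariablesLock) {
      resourcesToBePreempted = resourcesToBePreempted.plus(container);
    }
  }

  /** Extracted helper: usage once everything pending is subtracted. */
  Res getUsageAfterPreemption(Res alreadyConsidering, Res container) {
    Res queued;
    synchronized (preemptionVariablesLock) {
      queued = resourcesToBePreempted;
    }
    Res totalThatWouldBePreempted = alreadyConsidering.plus(container);
    return usage.minus(queued).minus(totalThatWouldBePreempted);
  }

  /** Memory-only comparison; equal to the fair share is not "below". */
  boolean canContainerBePreempted(Res alreadyConsidering, Res container) {
    Res after = getUsageAfterPreemption(alreadyConsidering, container);
    return after.memGb >= fairShare.memGb;
  }
}
```

With the helper in place, {{canContainerBePreempted}} reduces to a single comparison, and the lock is held only while reading the queued amount.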





[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-10-18 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210418#comment-16210418
 ] 

Steven Rand commented on YARN-7290:
---

Thanks [~templedf]. For what it's worth, I was able to reproduce this on a live 
cluster as well as in the test. I let one spark-shell use the entire cluster, 
and then started a second spark-shell. The second spark-shell was able to 
preempt all of the first one's containers, including the Application Master. 
After I applied the patch, the second spark-shell was only able to preempt half 
of the cluster's resources away from the first one.




[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-10-05 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193249#comment-16193249
 ] 

Daniel Templeton commented on YARN-7290:


Looks generally good.  I'll need to take a more careful look, though.




[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-10-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192455#comment-16192455
 ] 

Hadoop QA commented on YARN-7290:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  9m 40s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m  9s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 46m 25s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 86m 36s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:71bbb86 |
| JIRA Issue | YARN-7290 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12890474/YARN-7290.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 960219485c4d 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e6e614e |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC1 |
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/17790/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/17790/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> canContainerBePreempted can return true when it shouldn't
> -
>
> Key: YARN-7290
> URL: 

[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-10-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192403#comment-16192403
 ] 

Hadoop QA commented on YARN-7290:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  9m 59s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 35s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 25s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 6 new + 4 unchanged - 0 fixed = 10 total (was 4) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 15s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 47m 16s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 19s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 90m 17s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:71bbb86 |
| JIRA Issue | YARN-7290 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12890465/YARN-7290.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6146f6e5b4c9 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cae1c73 |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/17787/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/17787/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| 

[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-10-04 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192324#comment-16192324
 ] 

Steven Rand commented on YARN-7290:
---

An additional problem is that we call {{app.trackContainerForPreemption}} in 
{{preemptContainers}}, i.e. only after {{identifyContainersToPreempt}} has 
returned. Therefore, after we've finished one iteration of the loop over 
{{rr.getNumContainers()}}, we will have added some containers to 
{{containersToPreempt}}, but {{resourcesToBePreempted}} will not have been 
updated for any app. This allows subsequent calls to 
{{canContainerBePreempted}} in the same for loop to return {{true}} 
incorrectly, since we've already decided to preempt some containers, but the 
apps aren't aware of it yet.
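
To make the ordering problem concrete, here is a minimal sketch, not the actual scheduler code. It models memory only (in GB) and uses the numbers from the issue description (usage 8g, fair share 4g, six 1g candidates). The class {{DeferredTrackingSketch}}, the toy {{App}} type and the {{trackImmediately}} flag are hypothetical; the check only mirrors the shape of the quoted {{canContainerBePreempted}} snippet, and it assumes that "below share" means strictly less than the fair share.

```java
public class DeferredTrackingSketch {

    /** Toy stand-in for an app attempt; memory only, in GB (hypothetical type). */
    static final class App {
        final int usage;
        final int fairShare;
        int resourcesToBePreempted = 0;

        App(int usage, int fairShare) {
            this.usage = usage;
            this.fairShare = fairShare;
        }

        // Same shape as the quoted check: subtract what is already tracked for
        // preemption, subtract this container, then compare with the fair share.
        // Assumption: "below share" is a strict less-than comparison.
        boolean canContainerBePreempted(int containerSize) {
            int usageAfterPreemption = usage - resourcesToBePreempted - containerSize;
            return !(usageAfterPreemption < fairShare);
        }

        void trackContainerForPreemption(int containerSize) {
            resourcesToBePreempted += containerSize;
        }
    }

    /**
     * Runs six checks against 1 GB containers of one app. When trackImmediately
     * is false, tracking is deferred until after the loop, which is the ordering
     * described in the comment above.
     */
    static int countSelected(boolean trackImmediately) {
        App greedyApp = new App(8, 4);
        int selected = 0;
        for (int i = 0; i < 6; i++) {
            if (greedyApp.canContainerBePreempted(1)) {
                selected++;
                if (trackImmediately) {
                    greedyApp.trackContainerForPreemption(1);
                }
            }
        }
        if (!trackImmediately) {
            // Deferred tracking: the app only learns about the decisions now.
            greedyApp.trackContainerForPreemption(selected);
        }
        return selected;
    }

    public static void main(String[] args) {
        System.out.println("deferred tracking selects:  " + countSelected(false));
        System.out.println("immediate tracking selects: " + countSelected(true));
    }
}
```

With deferred tracking every check sees 7g after preemption, so all six candidates are accepted. With immediate tracking the check starts failing once the app would drop below 4g.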

> canContainerBePreempted can return true when it shouldn't
> -
>
> Key: YARN-7290
> URL: https://issues.apache.org/jira/browse/YARN-7290
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
> Affects Versions: 3.0.0-beta1
> Reporter: Steven Rand
> Assignee: Steven Rand
> Attachments: YARN-7290-failing-test.patch
>
>
> In FSAppAttempt#canContainerBePreempted, we make sure that preempting the 
> given container would not put the app below its fair share:
> {code}
> // Check if the app's allocation will be over its fairshare even
> // after preempting this container
> Resource usageAfterPreemption = Resources.clone(getResourceUsage());
> // Subtract resources of containers already queued for preemption
> synchronized (preemptionVariablesLock) {
>   Resources.subtractFrom(usageAfterPreemption, resourcesToBePreempted);
> }
> // Subtract this container's allocation to compute usage after preemption
> Resources.subtractFrom(
> usageAfterPreemption, container.getAllocatedResource());
> return !isUsageBelowShare(usageAfterPreemption, getFairShare());
> {code}
> However, this only considers one container in isolation, and fails to 
> consider containers for the same app that we already added to 
> {{preemptableContainers}} in 
> FSPreemptionThread#identifyContainersToPreemptOnNode. Therefore we can have a 
> case where we preempt multiple containers from the same app, none of which by 
> itself puts the app below fair share, but which cumulatively do so.
> I've attached a patch with a test to show this behavior. The flow is:
> 1. Initially greedyApp runs in {{root.preemptable.child-1}} and is allocated 
> all the resources (8g and 8vcores)
> 2. Then starvingApp runs in {{root.preemptable.child-2}} and requests 2 
> containers, each of which is 3g and 3vcores in size. At this point both 
> greedyApp and starvingApp have a fair share of 4g (with DRF not in use).
> 3. For the first container requested by starvingApp, we (correctly) preempt 3 
> containers from greedyApp, each of which is 1g and 1vcore.
> 4. For the second container requested by starvingApp, we again (this time 
> incorrectly) preempt 3 containers from greedyApp. This puts greedyApp below 
> its fair share, but happens anyway because all six times that we call 
> {{return !isUsageBelowShare(usageAfterPreemption, getFairShare());}}, the 
> value of {{usageAfterPreemption}} is 7g and 7vcores (confirmed using 
> debugger).
> So in addition to accounting for {{resourcesToBePreempted}}, we also need to 
> account for containers that we're already planning on preempting in 
> FSPreemptionThread#identifyContainersToPreemptOnNode. 
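
One way to picture the accounting that the last paragraph of the description asks for is a per-app running total kept while containers are being identified. The sketch below is only an illustration under stated assumptions and is not taken from any of the attached patches: {{PlannedPreemptionSketch}}, {{Container}}, {{plannedByApp}} and the method signature are hypothetical, resources are reduced to memory in GB, and "below share" is assumed to be a strict comparison.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PlannedPreemptionSketch {

    /** Hypothetical, simplified container: owning app id and memory in GB. */
    static final class Container {
        final String appId;
        final int memoryGb;

        Container(String appId, int memoryGb) {
            this.appId = appId;
            this.memoryGb = memoryGb;
        }
    }

    /**
     * Picks containers until neededGb is covered, never taking an app below its
     * fair share once the containers already planned in this pass are counted.
     */
    static List<Container> identifyContainersToPreempt(
            List<Container> candidates,
            Map<String, Integer> usageByApp,
            Map<String, Integer> fairShareByApp,
            int neededGb) {
        Map<String, Integer> plannedByApp = new HashMap<>();
        List<Container> chosen = new ArrayList<>();
        int freedGb = 0;
        for (Container c : candidates) {
            if (freedGb >= neededGb) {
                break;
            }
            int planned = plannedByApp.getOrDefault(c.appId, 0);
            int usageAfter = usageByApp.get(c.appId) - planned - c.memoryGb;
            // Assumption: the app may be reduced to exactly its fair share.
            if (usageAfter >= fairShareByApp.get(c.appId)) {
                chosen.add(c);
                plannedByApp.merge(c.appId, c.memoryGb, Integer::sum);
                freedGb += c.memoryGb;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Container> candidates = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            candidates.add(new Container("greedyApp", 1));
        }
        Map<String, Integer> usage = new HashMap<>();
        usage.put("greedyApp", 8);
        Map<String, Integer> fairShare = new HashMap<>();
        fairShare.put("greedyApp", 4);

        List<Container> chosen =
            identifyContainersToPreempt(candidates, usage, fairShare, 6);
        System.out.println("containers chosen: " + chosen.size());
    }
}
```

In the scenario from the description (greedyApp at 8g with a 4g fair share, 6g requested in total), this accounting stops after four 1g containers instead of six, because a fifth would leave greedyApp at 3g.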
