[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263987#comment-16263987 ] Yufei Gu commented on YARN-7290:

+1 for patch v5. Will commit if there is no objection.

> canContainerBePreempted can return true when it shouldn't
> ----------------------------------------------------------
>
>                 Key: YARN-7290
>                 URL: https://issues.apache.org/jira/browse/YARN-7290
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 3.0.0-beta1
>            Reporter: Steven Rand
>            Assignee: Steven Rand
>         Attachments: YARN-7290-failing-test.patch, YARN-7290.001.patch, YARN-7290.002.patch, YARN-7290.003.patch, YARN-7290.004.patch, YARN-7290.005.patch
>
> In FSAppAttempt#canContainerBePreempted, we make sure that preempting the given container would not put the app below its fair share:
> {code}
> // Check if the app's allocation will be over its fairshare even
> // after preempting this container
> Resource usageAfterPreemption = Resources.clone(getResourceUsage());
>
> // Subtract resources of containers already queued for preemption
> synchronized (preemptionVariablesLock) {
>   Resources.subtractFrom(usageAfterPreemption, resourcesToBePreempted);
> }
>
> // Subtract this container's allocation to compute usage after preemption
> Resources.subtractFrom(
>     usageAfterPreemption, container.getAllocatedResource());
>
> return !isUsageBelowShare(usageAfterPreemption, getFairShare());
> {code}
> However, this only considers one container in isolation, and fails to consider containers for the same app that we already added to {{preemptableContainers}} in FSPreemptionThread#identifyContainersToPreemptOnNode. Therefore we can have a case where we preempt multiple containers from the same app, none of which by itself puts the app below fair share, but which cumulatively do so.
> I've attached a patch with a test to show this behavior. The flow is:
> 1. Initially greedyApp runs in {{root.preemptable.child-1}} and is allocated all the resources (8g and 8vcores).
> 2. Then starvingApp runs in {{root.preemptable.child-2}} and requests 2 containers, each of which is 3g and 3vcores in size. At this point both greedyApp and starvingApp have a fair share of 4g (with DRF not in use).
> 3. For the first container requested by starvingApp, we (correctly) preempt 3 containers from greedyApp, each of which is 1g and 1vcore.
> 4. For the second container requested by starvingApp, we again (this time incorrectly) preempt 3 containers from greedyApp. This puts greedyApp below its fair share, but happens anyway because all six times that we call {{return !isUsageBelowShare(usageAfterPreemption, getFairShare());}}, the value of {{usageAfterPreemption}} is 7g and 7vcores (confirmed using a debugger).
> So in addition to accounting for {{resourcesToBePreempted}}, we also need to account for containers that we're already planning on preempting in FSPreemptionThread#identifyContainersToPreemptOnNode.
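A minimal, self-contained sketch of the accounting idea in the last paragraph above (everything here is hypothetical: plain {{long}} values stand in for YARN's {{Resource}}, and the method is only a stand-in for {{canContainerBePreempted}}). The point it illustrates is that, while identifying containers on a node, a per-app running total of what has already been selected in the current pass must be included in the fair-share check.

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical, self-contained sketch of the cumulative accounting idea.
// Plain longs (memory in MB) stand in for YARN's Resource type.
public class PerAppSelectionSketch {

  static final long FAIR_SHARE_MB = 4096;

  /** True only if the app stays at or above fair share after all subtractions. */
  static boolean canPreempt(long appUsageMb, long queuedForPreemptionMb,
      long selectedThisPassMb, long containerMb) {
    long usageAfter = appUsageMb - queuedForPreemptionMb
        - selectedThisPassMb - containerMb;
    return usageAfter >= FAIR_SHARE_MB;
  }

  public static void main(String[] args) {
    // greedyApp holds 8g; 3g is already queued for preemption from the first
    // starved request; we now look at five more 1g containers on the node.
    Map<String, Long> selectedThisPass = new HashMap<>();
    String app = "greedyApp";
    int chosen = 0;
    for (int i = 0; i < 5; i++) {
      long already = selectedThisPass.getOrDefault(app, 0L);
      if (canPreempt(8192, 3072, already, 1024)) {
        selectedThisPass.merge(app, 1024L, Long::sum);
        chosen++;
      }
    }
    // Only one more 1g container can go before greedyApp drops below 4g.
    System.out.println("containers selected: " + chosen);
  }
}
{code}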
[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263800#comment-16263800 ] Hadoop QA commented on YARN-7290:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 27s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 16m 28s | trunk passed |
| +1 | compile | 0m 40s | trunk passed |
| +1 | checkstyle | 0m 30s | trunk passed |
| +1 | mvnsite | 0m 41s | trunk passed |
| +1 | shadedclient | 10m 22s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 8s | trunk passed |
| +1 | javadoc | 0m 22s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 40s | the patch passed |
| +1 | compile | 0m 37s | the patch passed |
| +1 | javac | 0m 37s | the patch passed |
| -0 | checkstyle | 0m 23s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 3 unchanged - 1 fixed = 4 total (was 4) |
| +1 | mvnsite | 0m 39s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 20s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 18s | the patch passed |
| +1 | javadoc | 0m 24s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 63m 35s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 108m 42s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesReservation |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7290 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898982/YARN-7290.005.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux fb121c6549ab 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / aab4395 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/18641/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| unit |
[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263655#comment-16263655 ] Yufei Gu commented on YARN-7290:

Thanks for the new patch, [~Steven Rand]. I like the idea of putting the map into class {{PreemptableContainers}}. However, there is no need to keep both the {{List}} of containers and the {{Map}} of resources to preempt by app; keeping just one of them should be enough, probably something like a {{Map}} keyed by app. Also, please check the style issues found by Hadoop QA.
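To make the suggestion above concrete, here is a hypothetical sketch of the single-structure idea. The class and field names are illustrative stand-ins, not YARN's actual {{PreemptableContainers}} or container classes.

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: instead of maintaining both a flat container list
// and a separate per-app resource map, keep one map from app to its selected
// containers and derive whatever totals are needed from it.
public class PreemptableContainersSketch {

  /** Minimal stand-in for a container selected for preemption. */
  static class Container {
    final String appId;
    final long memoryMb;
    final int vcores;
    Container(String appId, long memoryMb, int vcores) {
      this.appId = appId;
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }
  }

  // appId -> containers already selected for preemption in this pass
  private final Map<String, List<Container>> selectedByApp = new HashMap<>();

  void add(Container c) {
    selectedByApp.computeIfAbsent(c.appId, k -> new ArrayList<>()).add(c);
  }

  /** Memory already earmarked for preemption from one app (derived, not stored). */
  long memorySelectedFor(String appId) {
    long total = 0;
    for (Container c : selectedByApp.getOrDefault(appId, new ArrayList<>())) {
      total += c.memoryMb;
    }
    return total;
  }

  /** Flattened view of all selected containers, e.g. for the preemption step. */
  List<Container> allContainers() {
    List<Container> all = new ArrayList<>();
    selectedByApp.values().forEach(all::addAll);
    return all;
  }

  public static void main(String[] args) {
    PreemptableContainersSketch s = new PreemptableContainersSketch();
    s.add(new Container("app_1", 1024, 1));
    s.add(new Container("app_1", 1024, 1));
    System.out.println(s.memorySelectedFor("app_1")); // 2048
    System.out.println(s.allContainers().size());     // 2
  }
}
{code}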
[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261947#comment-16261947 ] Hadoop QA commented on YARN-7290:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 15m 3s | trunk passed |
| +1 | compile | 0m 34s | trunk passed |
| +1 | checkstyle | 0m 23s | trunk passed |
| +1 | mvnsite | 0m 39s | trunk passed |
| +1 | shadedclient | 10m 15s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 56s | trunk passed |
| +1 | javadoc | 0m 22s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 35s | the patch passed |
| +1 | compile | 0m 31s | the patch passed |
| +1 | javac | 0m 31s | the patch passed |
| -0 | checkstyle | 0m 19s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 4 unchanged - 0 fixed = 6 total (was 4) |
| +1 | mvnsite | 0m 33s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 9m 31s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 7s | the patch passed |
| +1 | javadoc | 0m 20s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 60m 59s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 102m 38s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation |
| | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7290 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898776/YARN-7290.004.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 882f8c5deae2 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 03c311e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/18621/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| unit |
[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260535#comment-16260535 ] Yufei Gu commented on YARN-7290:

Thanks for filing this issue and working on it, [~Steven Rand]. The issue is valid and the solution generally looks good to me. Here are my comments:
- It would be nice to put the logic for computing the app's resource usage after preemption into a separate method, say {{getUsageAfterPreemption()}}, which would cover the following code (see the sketch after this list).
{code}
// Check if the app's allocation will be over its fairshare even
// after preempting this container
Resource usageAfterPreemption = Resources.clone(getResourceUsage());

// Subtract resources of containers already queued for preemption
synchronized (preemptionVariablesLock) {
  Resources.subtractFrom(usageAfterPreemption, resourcesToBePreempted);
}

// Account for this container and other containers that the
// FSPreemptionThread is already considering for preemption.
Resource totalThatWouldBePreempted = Resources.add(
    alreadyConsideringForPreemption, container.getAllocatedResource());
Resources.subtractFrom(usageAfterPreemption, totalThatWouldBePreempted);
{code}
- Does your new unit test cover the second issue you are trying to solve? If not, could you please create one for that?
- Method {{identifyContainersToPreemptOnNode}} is getting very involved. We probably need to refactor it a bit; either splitting the function or moving {{consideringForPreemption}} into each {{FSAppAttempt}} would help.
- Remove the unused import in {{FSPreemptionThread}}.
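One possible shape for the extracted helper, shown below as an illustration only: it reuses just the calls that already appear in the snippets above ({{Resources.clone}}, {{Resources.subtractFrom}}, {{Resources.add}}, {{preemptionVariablesLock}}, {{resourcesToBePreempted}}); the method and parameter names are made up.

{code}
// Illustrative only: a possible extraction of the usage-after-preemption
// bookkeeping into a private helper on FSAppAttempt.
private Resource getUsageAfterPreemption(Resource resourcesBeingConsidered) {
  Resource usageAfterPreemption = Resources.clone(getResourceUsage());

  // Subtract resources of containers already queued for preemption.
  synchronized (preemptionVariablesLock) {
    Resources.subtractFrom(usageAfterPreemption, resourcesToBePreempted);
  }

  // Subtract this container plus whatever the FSPreemptionThread is already
  // considering for preemption in the current pass.
  Resources.subtractFrom(usageAfterPreemption, resourcesBeingConsidered);
  return usageAfterPreemption;
}
{code}

With that helper, {{canContainerBePreempted}} could reduce to something like {{return !isUsageBelowShare(getUsageAfterPreemption(Resources.add(alreadyConsideringForPreemption, container.getAllocatedResource())), getFairShare());}}.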
[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210418#comment-16210418 ] Steven Rand commented on YARN-7290:

Thanks [~templedf]. For what it's worth, I was able to repro this on a live cluster as well as in the test. I let one spark-shell use the entire cluster, and then started a second spark-shell. The second spark-shell was able to preempt all of the first one's containers, including the Application Master. After I applied the patch, the second spark-shell was only able to preempt half of the cluster's resources away from the first one.
[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193249#comment-16193249 ] Daniel Templeton commented on YARN-7290:

Looks generally good. I'll need to take a more careful look, though.
[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192455#comment-16192455 ] Hadoop QA commented on YARN-7290:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 13m 6s | trunk passed |
| +1 | compile | 0m 34s | trunk passed |
| +1 | checkstyle | 0m 25s | trunk passed |
| +1 | mvnsite | 0m 37s | trunk passed |
| +1 | shadedclient | 9m 40s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 0s | trunk passed |
| +1 | javadoc | 0m 21s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 34s | the patch passed |
| +1 | compile | 0m 31s | the patch passed |
| +1 | javac | 0m 31s | the patch passed |
| +1 | checkstyle | 0m 22s | the patch passed |
| +1 | mvnsite | 0m 34s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 9s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 9s | the patch passed |
| +1 | javadoc | 0m 21s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 46m 25s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 16s | The patch does not generate ASF License warnings. |
| | | 86m 36s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:71bbb86 |
| JIRA Issue | YARN-7290 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12890474/YARN-7290.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 960219485c4d 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e6e614e |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/17790/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/17790/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192403#comment-16192403 ] Hadoop QA commented on YARN-7290:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 14m 42s | trunk passed |
| +1 | compile | 0m 41s | trunk passed |
| +1 | checkstyle | 0m 28s | trunk passed |
| +1 | mvnsite | 0m 41s | trunk passed |
| +1 | shadedclient | 9m 59s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 11s | trunk passed |
| +1 | javadoc | 0m 26s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 38s | the patch passed |
| +1 | compile | 0m 35s | the patch passed |
| +1 | javac | 0m 35s | the patch passed |
| -0 | checkstyle | 0m 25s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 6 new + 4 unchanged - 0 fixed = 10 total (was 4) |
| +1 | mvnsite | 0m 43s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 15s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 9s | the patch passed |
| +1 | javadoc | 0m 22s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 47m 16s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 90m 17s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:71bbb86 |
| JIRA Issue | YARN-7290 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12890465/YARN-7290.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 6146f6e5b4c9 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cae1c73 |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/17787/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/17787/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192324#comment-16192324 ] Steven Rand commented on YARN-7290:

An additional problem is that we call {{app.trackContainerForPreemption}} in {{preemptContainers}}, i.e. only after {{identifyContainersToPreempt}} has returned. Therefore, after we've finished one iteration of the loop over {{rr.getNumContainers()}}, we will have added some containers to {{containersToPreempt}}, but {{resourcesToBePreempted}} will not have been updated for any app. This allows subsequent calls to {{canContainerBePreempted}} in the same for loop to return {{true}} incorrectly, since we've already decided to preempt some containers, but the apps aren't aware of it yet.
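The timing issue described above can be illustrated with a small, self-contained toy model (not YARN code; all names and numbers are made up): when the per-app bookkeeping is only updated after the identification pass finishes, every per-container check in that pass sees stale usage and keeps answering "yes".

{code}
// Toy model (not YARN code) of the ordering problem: if the per-app
// "to be preempted" total is only updated after the whole identification pass
// (as happens when trackContainerForPreemption runs later, in preemptContainers),
// every check in that pass sees stale usage.
public class DeferredTrackingSketch {

  static int selectContainers(long usageMb, long fairShareMb, long containerMb,
      int candidates, boolean updateDuringPass) {
    long trackedForPreemptionMb = 0; // stand-in for resourcesToBePreempted
    int selected = 0;
    for (int i = 0; i < candidates; i++) {
      long usageAfterMb = usageMb - trackedForPreemptionMb - containerMb;
      if (usageAfterMb >= fairShareMb) {            // "can this container be preempted?"
        selected++;
        if (updateDuringPass) {
          trackedForPreemptionMb += containerMb;    // eager bookkeeping
        }
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    // 8g in use, 4g fair share, 1g containers, 8 candidate containers on the node.
    System.out.println("deferred tracking selects: "
        + selectContainers(8192, 4096, 1024, 8, false)); // 8 -> over-preempts
    System.out.println("eager tracking selects:    "
        + selectContainers(8192, 4096, 1024, 8, true));  // 4 -> respects fair share
  }
}
{code}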