[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621414#comment-16621414 ]

Tao Yang commented on YARN-8771:
--------------------------------

Thanks [~cheersyang] and [~leftnoteasy]!

> CapacityScheduler fails to unreserve when cluster resource contains empty resource type
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-8771
>                 URL: https://issues.apache.org/jira/browse/YARN-8771
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.2.0
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.2
>
>         Attachments: YARN-8771.001.patch, YARN-8771.002.patch, YARN-8771.003.patch, YARN-8771.004.patch
>
>
> We found this problem when the cluster was almost, but not fully, exhausted (93% used): the scheduler kept allocating for an app but always failed to commit. This can block requests from other apps and leave part of the cluster resource unusable.
> To reproduce this problem:
> (1) use DominantResourceCalculator
> (2) the cluster resource has an empty resource type, for example: gpu=0
> (3) the scheduler allocates a container for app1, which has reserved containers and whose queue limit or user limit is reached (used + required > limit).
> Reference code in RegularContainerAllocator#assignContainer:
> {code:java}
> // How much need to unreserve equals to:
> // max(required - headroom, amountNeedUnreserve)
> Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom());
> Resource resourceNeedToUnReserve =
>     Resources.max(rc, clusterResource,
>         Resources.subtract(capability, headRoom),
>         currentResoureLimits.getAmountNeededUnreserve());
> boolean needToUnreserve =
>     Resources.greaterThan(rc, clusterResource,
>         resourceNeedToUnReserve, Resources.none());
> {code}
> For example, resourceNeedToUnReserve can be <8GB, -6 vcores, 0 gpu> when {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capability=<8GB, 2 vcores, 0 gpu>}}; needToUnreserve, which is the result of {{Resources#greaterThan}}, will then be {{false}}. This is not reasonable, because the required resource did exceed the headroom and an unreserve is needed.
> After that, when reaching the unreserve process in RegularContainerAllocator#assignContainer, the unreserve process is skipped when shouldAllocOrReserveNewContainer is true (required containers > reserved containers) and needToUnreserve is wrongly calculated to be false:
> {code:java}
> if (availableContainers > 0) {
>   if (rmContainer == null && reservationsContinueLooking
>       && node.getLabels().isEmpty()) {
>     // unreserve process can be wrongly skipped when
>     // shouldAllocOrReserveNewContainer=true and needToUnreserve=false
>     // but required resource did exceed the headroom
>     if (!shouldAllocOrReserveNewContainer || needToUnreserve) {
>       ...
>     }
>   }
> }
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
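The issue description quoted in this thread says that {{Resources#greaterThan}} returns {{false}} for <8GB, -6 vcores, 0 gpu>, but it does not spell out why. The following is a minimal, self-contained sketch of one plausible explanation. It assumes that, when the cluster total contains a zero-valued resource type, the calculator falls back to a component-wise comparison that reports "not greater" whenever the two operands disagree in direction across components. The class {{UnreserveSketch}}, its plain {{long[]}} vectors and the helper {{anyComponentAboveZero}} are hypothetical stand-ins for illustration, not Hadoop APIs, and the {{Resources.max}} step is left out by taking amountNeededUnreserve to be zero.

```java
import java.util.Arrays;

public class UnreserveSketch {
    // Resource vectors are modeled as {memoryGB, vcores, gpu}.

    /** Component-wise subtraction, standing in for Resources.subtract. */
    public static long[] subtract(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] - b[i];
        }
        return out;
    }

    /**
     * ASSUMED fallback comparison: 1 if lhs is larger in at least one
     * component and smaller in none, -1 for the reverse, 0 if the vectors
     * are equal or disagree in direction.
     */
    public static int compareComponentWise(long[] lhs, long[] rhs) {
        boolean lhsGreater = false;
        boolean rhsGreater = false;
        for (int i = 0; i < lhs.length; i++) {
            if (lhs[i] > rhs[i]) {
                lhsGreater = true;
            } else if (lhs[i] < rhs[i]) {
                rhsGreater = true;
            }
        }
        if (lhsGreater && rhsGreater) {
            return 0;
        }
        if (lhsGreater) {
            return 1;
        }
        return rhsGreater ? -1 : 0;
    }

    /** Stand-in for Resources.greaterThan under the assumed comparison. */
    public static boolean greaterThan(long[] lhs, long[] rhs) {
        return compareComponentWise(lhs, rhs) > 0;
    }

    /** Hypothetical alternative check: is any single component above zero? */
    public static boolean anyComponentAboveZero(long[] r) {
        for (long v : r) {
            if (v > 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] none = {0, 0, 0};
        long[] headRoom = {0, 8, 0};    // <0GB, 8 vcores, 0 gpu>
        long[] capability = {8, 2, 0};  // <8GB, 2 vcores, 0 gpu>

        long[] need = subtract(capability, headRoom);
        boolean needToUnreserve = greaterThan(need, none);

        // The gate from the second snippet of the description.
        boolean shouldAllocOrReserveNewContainer = true;
        boolean entersUnreserve =
            !shouldAllocOrReserveNewContainer || needToUnreserve;

        System.out.println(Arrays.toString(need));        // [8, -6, 0]
        System.out.println(needToUnreserve);              // false
        System.out.println(entersUnreserve);              // false
        System.out.println(anyComponentAboveZero(need));  // true
    }
}
```

Under these assumptions the memory component says "greater" and the vcores component says "smaller", so the comparison yields 0 and the unreserve branch is never entered even though 8GB of memory is still required. A per-component test such as the hypothetical {{anyComponentAboveZero}} would flag the shortfall. Whether the committed patch takes that approach is not stated in this thread.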
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620498#comment-16620498 ]

Hudson commented on YARN-8771:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15011 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15011/])
YARN-8771. CapacityScheduler fails to unreserve when cluster resource (wwei: rev 0712537e799bc03855d548d1f4bd690dd478b871)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620492#comment-16620492 ]

Weiwei Yang commented on YARN-8771:
-----------------------------------

Oops, it should not be cherry-picked to branch-3.0 as YARN-8292 was fixed in 3.1.1. Just reverted it from branch-3.0.
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620474#comment-16620474 ]

Weiwei Yang commented on YARN-8771:
-----------------------------------

LGTM, +1. Committing soon.
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620469#comment-16620469 ]

Hadoop QA commented on YARN-8771:
---------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 26s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 22m 9s | trunk passed |
| +1 | compile | 0m 50s | trunk passed |
| +1 | checkstyle | 0m 40s | trunk passed |
| +1 | mvnsite | 0m 51s | trunk passed |
| +1 | shadedclient | 13m 16s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 18s | trunk passed |
| +1 | javadoc | 0m 29s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 45s | the patch passed |
| +1 | compile | 0m 41s | the patch passed |
| +1 | javac | 0m 41s | the patch passed |
| +1 | checkstyle | 0m 33s | the patch passed |
| +1 | mvnsite | 0m 44s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 59s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 19s | the patch passed |
| +1 | javadoc | 0m 27s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 73m 46s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 131m 20s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8771 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940394/YARN-8771.004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 1ac8077cf8a5 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e435e12 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21880/testReport/ |
| Max. process+thread count | 903 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21880/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620333#comment-16620333 ]

Tao Yang commented on YARN-8771:
--------------------------------

Attached v4 patch to fix check-style error. UT failures seem unrelated to the patch.
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620180#comment-16620180 ]

Hadoop QA commented on YARN-8771:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 25s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 31m 18s | trunk passed |
| +1 | compile | 1m 32s | trunk passed |
| +1 | checkstyle | 1m 7s | trunk passed |
| +1 | mvnsite | 1m 44s | trunk passed |
| +1 | shadedclient | 18m 45s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 37s | trunk passed |
| +1 | javadoc | 1m 0s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 35s | the patch passed |
| +1 | compile | 1m 26s | the patch passed |
| +1 | javac | 1m 26s | the patch passed |
| -0 | checkstyle | 1m 0s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 3 new + 24 unchanged - 0 fixed = 27 total (was 24) |
| +1 | mvnsite | 1m 31s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 17m 28s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 52s | the patch passed |
| +1 | javadoc | 0m 58s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 99m 7s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 49s | The patch does not generate ASF License warnings. |
| | | 184m 36s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestApplicationMasterService |
| | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
| | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueManagementDynamicEditPolicy |
| | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler |
| | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
| | hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifetimeMonitor |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSchedulingRequestUpdate |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8771 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940341/YARN-8771.003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs c
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620042#comment-16620042 ]

Tao Yang commented on YARN-8771:
--------------------------------

Attached v3 patch to improve unit test without adding new "resource-types-1.xml" file.
Found the {{yarn.test.reset-resource-types}} configuration item from TestCapacitySchedulerWithMultiResourceTypes, it can avoid reloading resource types in MockRM.
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620001#comment-16620001 ]

Tao Yang commented on YARN-8771:
--------------------------------

Thanks [~leftnoteasy] for your review and suggestion. I will update the patch later to improve unit test.
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619835#comment-16619835 ] Wangda Tan commented on YARN-8771: Nice catch! Thanks [~Tao Yang]. Patch LGTM as well. For the test, you can check TestCapacitySchedulerWithMultiResourceTypes as examples about how to do unit tests for multiple resource types without adding resource-types.xml. And I think we should put this to branch-3.1 as well.
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619208#comment-16619208 ] Weiwei Yang commented on YARN-8771: OK, if that's the case, I am fine with the patch. LGTM, +1, if we don't get any further comments, I will commit the patch by tomorrow. Thanks [~Tao Yang].
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619181#comment-16619181 ] Tao Yang commented on YARN-8771: Thanks [~cheersyang] for the review. {quote} Instead of adding a new "resource-types-1.xml", can we use TestResourceUtils#addNewTypesToResources for the tests? I think it doesn't matter to test with existing gpu or fpga resource correct? {quote} IIUC, MockRM will reset resource types and reload resource-types.xml internally, without resource-types.xml, MockRM will only have two resource types (memory and vcores), so that we can't simulate that cluster contains empty resource type. Thoughts?
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617162#comment-16617162 ] Weiwei Yang commented on YARN-8771: [~Tao Yang], the patch looks good to me. Using isAnyMajorResourceAboveZero check against the unreserve resource looks reasonable. Comments: * Instead of adding a new "resource-types-1.xml", can we use TestResourceUtils#addNewTypesToResources for the tests? I think it doesn't matter to test with existing gpu or fpga resource correct? Since isAnyMajorResourceAboveZero was added via YARN-8292, cc-ing [~jlowe], [~eepayne] and [~leftnoteasy] for cross-check.
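The check discussed in this comment can be illustrated with a small standalone sketch. It does not use the Hadoop classes: the `long[]` vector and the `isAnyComponentAboveZero` helper are hypothetical stand-ins, written under the assumption (taken from the method name only) that `isAnyMajorResourceAboveZero` answers "is at least one component positive?".

```java
public class UnreserveCheckSketch {
    // Hypothetical stand-in for a resource vector: {memoryMB, vcores, gpus}.
    // Returns true as soon as one component is strictly positive.
    static boolean isAnyComponentAboveZero(long[] resource) {
        for (long value : resource) {
            if (value > 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // <8GB, -6 cores, 0 gpu> from the issue description.
        long[] resourceNeedToUnReserve = {8192, -6, 0};
        // Memory alone is above zero, so unreserving is needed.
        System.out.println(isAnyComponentAboveZero(resourceNeedToUnReserve)); // prints "true"
    }
}
```

With a check of this shape, a negative vcores component no longer masks the positive memory component, which appears to be the property the patch relies on.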
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614673#comment-16614673 ] Weiwei Yang commented on YARN-8771: [~Tao Yang], good catch and nice UT. I will help to review. +[~sunilg] too

> CapacityScheduler fails to unreserve when cluster resource contains empty resource type
> ---
>
> Key: YARN-8771
> URL: https://issues.apache.org/jira/browse/YARN-8771
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.2.0
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Critical
> Attachments: YARN-8771.001.patch, YARN-8771.002.patch
>
> We found this problem when the cluster was almost, but not fully, exhausted (93% used): the scheduler kept allocating for an app but always failed to commit, which can block requests from other apps and leave part of the cluster's resources unusable.
> To reproduce this problem:
> (1) use DominantResourceCalculator
> (2) the cluster resource has an empty resource type, for example: gpu=0
> (3) the scheduler allocates a container for app1, which has reserved containers and whose queue limit or user limit is reached (used + required > limit).
> Reference code in RegularContainerAllocator#assignContainer:
> {code:java}
> boolean needToUnreserve =
>     Resources.greaterThan(rc, clusterResource,
>         resourceNeedToUnReserve, Resources.none());
> {code}
> The value of resourceNeedToUnReserve can be <8GB, -6 cores, 0 gpu>, in which case the result of {{Resources#greaterThan}} is false when using DominantResourceCalculator.
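To make the quoted failure concrete, here is a simplified standalone model of the comparison. It is a sketch, not the Hadoop implementation: it assumes that when the cluster total contains a zero-valued resource type, the calculator falls back to a component-wise comparison that reports "equal" whenever the two vectors disagree in direction. The class and method names are hypothetical.

```java
public class DominantCompareSketch {
    // Assumed fallback behaviour: 1 if lhs is larger in some component and
    // smaller in none, -1 for the reverse, 0 if equal or if the components
    // point in different directions.
    static int compareComponentWise(long[] lhs, long[] rhs) {
        boolean lhsGreater = false;
        boolean rhsGreater = false;
        for (int i = 0; i < lhs.length; i++) {
            if (lhs[i] > rhs[i]) {
                lhsGreater = true;
            } else if (lhs[i] < rhs[i]) {
                rhsGreater = true;
            }
        }
        if (lhsGreater && rhsGreater) {
            return 0;
        }
        if (lhsGreater) {
            return 1;
        }
        return rhsGreater ? -1 : 0;
    }

    // Mirrors the shape of a greaterThan check built on the comparison above.
    static boolean greaterThan(long[] lhs, long[] rhs) {
        return compareComponentWise(lhs, rhs) > 0;
    }

    public static void main(String[] args) {
        long[] none = {0, 0, 0};                        // {memoryMB, vcores, gpus}
        long[] resourceNeedToUnReserve = {8192, -6, 0}; // <8GB, -6 cores, 0 gpu>
        // Memory is above zero but vcores is below zero, so the comparison
        // yields 0 and the greaterThan check reports false.
        System.out.println(greaterThan(resourceNeedToUnReserve, none)); // prints "false"
    }
}
```

Under this model the positive memory component and the negative vcores component cancel each other out in the comparison, which matches the `false` result reported in the description.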
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614391#comment-16614391 ] Hadoop QA commented on YARN-8771:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 30s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 22m 8s | trunk passed |
| +1 | compile | 0m 58s | trunk passed |
| +1 | checkstyle | 0m 47s | trunk passed |
| +1 | mvnsite | 1m 2s | trunk passed |
| +1 | shadedclient | 14m 3s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 26s | trunk passed |
| +1 | javadoc | 0m 31s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 49s | the patch passed |
| +1 | compile | 0m 44s | the patch passed |
| +1 | javac | 0m 44s | the patch passed |
| +1 | checkstyle | 0m 37s | the patch passed |
| +1 | mvnsite | 0m 48s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 13m 5s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 19s | the patch passed |
| +1 | javadoc | 0m 26s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 79m 13s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 36s | The patch does not generate ASF License warnings. |
| | | 138m 28s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8771 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12939643/YARN-8771.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux c61ad89506f8 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ef5c776 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21835/testReport/ |
| Max. process+thread count | 861 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21835
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614300#comment-16614300 ] Tao Yang commented on YARN-8771: Attached v2 patch to fix checkstyle error. UT failure can't be reproduced in my local environment, it seems unrelated to this patch.
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614021#comment-16614021 ] Hadoop QA commented on YARN-8771:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 26s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 18m 2s | trunk passed |
| +1 | compile | 0m 43s | trunk passed |
| +1 | checkstyle | 0m 37s | trunk passed |
| +1 | mvnsite | 0m 41s | trunk passed |
| +1 | shadedclient | 11m 6s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 11s | trunk passed |
| +1 | javadoc | 0m 26s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 43s | the patch passed |
| +1 | compile | 0m 39s | the patch passed |
| +1 | javac | 0m 39s | the patch passed |
| -0 | checkstyle | 0m 28s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 24 unchanged - 0 fixed = 26 total (was 24) |
| +1 | mvnsite | 0m 42s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 10m 40s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 15s | the patch passed |
| +1 | javadoc | 0m 22s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 72m 6s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 29s | The patch does not generate ASF License warnings. |
| | | 120m 18s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8771 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12939543/YARN-8771.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 34b8283948e4 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e1b242a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21832/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613299#comment-16613299 ] Tao Yang commented on YARN-8771: Attached v1 patch for review. [~cheersyang], can you help to review this patch in your free time? Thanks