[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-26 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629699#comment-16629699
 ] 

Tao Yang commented on YARN-8804:


Thanks [~jlowe] for the review and commit!

> resourceLimits may be wrongly calculated when leaf-queue is blocked in 
> cluster with 3+ level queues
> ---
>
> Key: YARN-8804
> URL: https://issues.apache.org/jira/browse/YARN-8804
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2, 2.8.6
>
> Attachments: YARN-8804.001.patch, YARN-8804.002.patch, 
> YARN-8804.003.patch
>
>
> This problem was introduced by YARN-4280: a parent queue deducts a child 
> queue's headroom when that child queue has reached its resource limit and the 
> skipped type is QUEUE_LIMIT. The resource limits of the deepest parent queue 
> are calculated correctly, but the headroom of a non-deepest parent queue may 
> be much larger than the sum of its reached-limit child queues' headroom, so 
> the resource limit computed for a non-deepest parent may be much smaller than 
> its true value and block allocation for later queues.
> To reproduce this problem with a UT:
> (1) The cluster has two nodes, each with resource <10GB, 10core>, and the 
> 3-level queue hierarchy below; max-capacity of "c1" is 10 while all other 
> queues are 100, so the absolute max-capacity of queue "c1" is <2GB, 2core>
> {noformat}
>            Root
>          /  |  \
>         a   b   c
>        10  20   70
>                 |  \
>                c1   c2
>       10(max=10)    90
> {noformat}
> (2) Submit app1 to queue "c1" and launch am1 (resource=<1GB, 1core>) on nm1
> (3) Submit app2 to queue "b" and launch am2 (resource=<1GB, 1core>) on nm1
> (4) app1 and app2 each request one <2GB, 1core> container.
> (5) nm1 sends one heartbeat.
>  Since queue "c" now has a lower used-capacity percentage than queue "b", the 
> allocation sequence is "a" -> "c" -> "b":
>  queue "c1" has reached its queue limit, so the requests of app1 stay pending;
>  headroom of queue "c1" is <1GB, 1core> (= max-capacity - used);
>  headroom of queue "c" is <18GB, 18core> (= max-capacity - used);
>  after the allocation attempt for queue "c", the resource limit of queue "b" is 
> wrongly calculated as <2GB, 2core>, because the whole headroom of queue "c" is 
> deducted instead of only the blocked headroom of queue "c1";
>  headroom of queue "b" becomes <1GB, 1core> (= resource-limit - used),
>  so the scheduler won't allocate the requested container for app2 on nm1.
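
For reference, a minimal sketch of how the queue layout in step (1) could be expressed in a capacity-scheduler UT, assuming the usual CapacitySchedulerConfiguration helpers (setQueues/setCapacity/setMaximumCapacity); the class name and the wiring around it are illustrative and not taken from the attached patches.

{code:java}
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration;

public class Yarn8804QueueSetup {
  // Builds the 3-level hierarchy from the reproduce steps: root -> {a, b, c}, c -> {c1, c2}.
  public static CapacitySchedulerConfiguration newQueueConfiguration() {
    CapacitySchedulerConfiguration conf = new CapacitySchedulerConfiguration();
    conf.setQueues(CapacitySchedulerConfiguration.ROOT, new String[] {"a", "b", "c"});
    conf.setCapacity(CapacitySchedulerConfiguration.ROOT + ".a", 10);
    conf.setCapacity(CapacitySchedulerConfiguration.ROOT + ".b", 20);
    conf.setCapacity(CapacitySchedulerConfiguration.ROOT + ".c", 70);
    conf.setQueues(CapacitySchedulerConfiguration.ROOT + ".c", new String[] {"c1", "c2"});
    conf.setCapacity(CapacitySchedulerConfiguration.ROOT + ".c.c1", 10);
    // Capping c1 at 10% gives it an absolute max-capacity of <2GB, 2core> on a <20GB, 20core> cluster.
    conf.setMaximumCapacity(CapacitySchedulerConfiguration.ROOT + ".c.c1", 10);
    conf.setCapacity(CapacitySchedulerConfiguration.ROOT + ".c.c2", 90);
    return conf;
  }
}
{code}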






[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629535#comment-16629535
 ] 

Hudson commented on YARN-8804:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15065 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15065/])
YARN-8804. resourceLimits may be wrongly calculated when leaf-queue is (jlowe: 
rev 6b988d821e62d29c118e10a7213583b92c302baf)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java





[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-24 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626564#comment-16626564
 ] 

Jason Lowe commented on YARN-8804:
--

Thanks for updating the patch!  +1 lgtm.  I'll commit this by Wednesday if 
there are no objections.




[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-24 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625441#comment-16625441
 ] 

Sunil Govindan commented on YARN-8804:
--

As this is one of the last Critical issues remaining for 3.2, [~jlowe], [~leftnoteasy], 
[~Tao Yang], could you please help review the latest patch?




[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624398#comment-16624398
 ] 

Hadoop QA commented on YARN-8804:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 29s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 43 unchanged - 0 fixed = 44 total (was 43) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 48s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 71m 
42s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}123m 20s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8804 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940860/YARN-8804.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux df910130241f 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4758b4b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/21938/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21938/testReport/ |
| Max. process+thread count | 890 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624395#comment-16624395
 ] 

Hadoop QA commented on YARN-8804:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 44s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 36s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 42 unchanged - 0 fixed = 43 total (was 42) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 70m 
45s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}124m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8804 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940860/YARN-8804.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e1783729632e 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 
17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4758b4b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/21937/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21937/testReport/ |
| Max. process+thread count | 846 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-21 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624281#comment-16624281
 ] 

Tao Yang commented on YARN-8804:


Thanks [~jlowe] for the review.
Attached the v3 patch, rebased on the latest trunk, and submitted it.




[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-20 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623037#comment-16623037
 ] 

Tao Yang commented on YARN-8804:


Attached v2 patch to remove the volatile keyword and improve the calculation.




[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-20 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623035#comment-16623035
 ] 

Tao Yang commented on YARN-8804:


Thanks [~jlowe], [~leftnoteasy] for your review and reply.
For the volatile keyword, it was a mistake I made when copying from the headroom 
field in ResourceLimits; I should have removed it afterwards. The ResourceLimits 
used in the scheduling process is thread-safe because it isn't shared by multiple 
scheduling threads: every scheduling thread creates its own ResourceLimits 
instance at the beginning of the scheduling process, in 
CapacityScheduler#allocateOrReserveNewContainers or 
CapacityScheduler#allocateContainerOnSingleNode, and then passes it down.
{quote}
I think it would be cleaner if a queue could return an assignment result that 
not only indicated the allocation was skipped due to queue limits but also how 
much needs to be reserved as a result of that skipped assignment.
{quote}
Now we can get the blocked resource from {{childLimits.getHeadroom()}} for the 
leaf queue and add it into the blockedHeadroom of the leaf/parent queue, so that 
later queues can get the correct net limit through {{limit - blockedHeadroom}}.
{quote}
The result would be less overhead for the normal scheduler loop, as we would 
only be adjusting when necessary rather than every time.
{quote}
Thanks for pointing this out. I will improve the calculation by adding 
ResourceLimits#getNetLimit, so that the subtraction is performed only when 
necessary rather than on every pass.
{quote}
From my analysis of YARN-8513, scheduler tries to allocate containers to queue 
when it will go beyond max capacity (used + allocating > max). But resource 
committer will reject such proposal.
{quote}
As [~jlowe] commented earlier, YARN-8513 is not the same problem as this issue. 
It seems similar to YARN-8771, which may be caused by a wrong calculation of 
needUnreservedResource with an empty resource type in 
RegularContainerAllocator#assignContainer, but I am not sure they are the same 
problem.
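
Roughly, the idea reads like the sketch below; the method bodies are my paraphrase of the approach described here, assuming the Resources utility class, and are not copied from YARN-8804.003.patch.

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class ResourceLimits {
  private volatile Resource limit;    // existing field: the raw limit handed down by the parent
  private Resource blockedHeadroom;   // headroom of children skipped with QUEUE_LIMIT
  private Resource netLimit;          // cached value of limit - blockedHeadroom

  // Called when a child queue is skipped because of its queue limit.
  public void addBlockedHeadroom(Resource resource) {
    if (blockedHeadroom == null) {
      blockedHeadroom = Resource.newInstance(0, 0);
    }
    Resources.addTo(blockedHeadroom, resource);
    netLimit = null;    // invalidate the cache so getNetLimit() recomputes
  }

  // Later queues read the limit through this method; the subtraction only
  // happens when some headroom has actually been blocked.
  public Resource getNetLimit() {
    if (blockedHeadroom == null) {
      return limit;
    }
    if (netLimit == null) {
      netLimit = Resources.subtract(limit, blockedHeadroom);
    }
    return netLimit;
  }
}
{code}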




[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-20 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622771#comment-16622771
 ] 

Jason Lowe commented on YARN-8804:
--

From the looks of YARN-8513 this appears to be a separate issue.  Looks like 
YARN-8513 describes an infinite loop where an allocation keeps getting 
rejected, whereas this describes a situation where an allocation never occurs 
so therefore cannot be rejected by the committer.




[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-20 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622460#comment-16622460
 ] 

Wangda Tan commented on YARN-8804:
--

Thanks [~Tao Yang] for the nice analysis. I'm not sure whether YARN-8513 is 
caused by this or a similar issue. From my analysis of YARN-8513, the scheduler 
tries to allocate containers to a queue even when the allocation would go beyond 
its max capacity (used + allocating > max), but the resource committer will 
reject such a proposal.




[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-20 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622281#comment-16622281
 ] 

Jason Lowe commented on YARN-8804:
--

Thanks for the report and patch!  Nice analysis.

A naked volatile here is not going to work when there are multiple threads.  It 
only works if we are swapping in precomputed objects, as is done with the 
headroom, and not when we're trying to modify them in-place as is done in this 
patch.  Otherwise we can have this classic multithreaded race where we lose 
information:
# blockedHeadroom is null
# Thread 1 and Thread 2 both call addBlockedHeadroom at the same time
# Thread 1 and Thread 2 both see that blockedHeadroom is null and both decide 
to create a new, empty instance
# Thread 2 races ahead of Thread 1 at this point, storing the new, updated 
object with its resource information
# Thread 1 finally gets around to storing the zero instance to blockedHeadroom, 
and now we just lost the computation from Thread 2

Since this is trying to compute on an existing object, it should use an 
AtomicReference where a new Resource object is stored with compareAndSet each 
time it is updated.  This allows threads to detect when they collide with 
another thread, looping around and trying again when a collision is detected.
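
A minimal sketch of the compare-and-set accumulation described above, assuming the Resources utility class; the tracker class is illustrative and not what was ultimately committed.

{code:java}
import java.util.concurrent.atomic.AtomicReference;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class BlockedHeadroomTracker {
  // The referenced Resource is never mutated in place; updates swap in a new object.
  private final AtomicReference<Resource> blockedHeadroom =
      new AtomicReference<>(Resource.newInstance(0, 0));

  public void addBlockedHeadroom(Resource delta) {
    while (true) {
      Resource current = blockedHeadroom.get();
      Resource updated = Resources.add(current, delta);   // returns a new Resource
      // If another thread swapped in a different value in the meantime,
      // compareAndSet fails and we loop around to retry with the fresh value.
      if (blockedHeadroom.compareAndSet(current, updated)) {
        return;
      }
    }
  }

  public Resource getBlockedHeadroom() {
    return blockedHeadroom.get();
  }
}
{code}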

We could avoid all of the blocked tracking business if the allocation result 
included the amount that needs to be blocked.  The crux of the problem lies in 
the computation of that amount when the queue returning the queue skipped 
assignment result wasn't a leaf queue.  I think it would be cleaner if a queue 
could return an assignment result that not only indicated the allocation was 
skipped due to queue limits but also how much needs to be reserved as a result 
of that skipped assignment.  Then that amount can be referenced directly as the 
limit is adjusted up the queue hierarchy rather than constantly subtracting the 
blocked limit from the parent limit.  The result would be less overhead for the 
normal scheduler loop, as we would only be adjusting when necessary rather than 
every time.
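
Purely as an illustration of that alternative (the type and field names below are hypothetical, not the existing CSAssignment API): a skipped result could carry the amount to reserve, so each parent deducts only that amount when adjusting the limit for later queues.

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical shape of an assignment result for a skipped queue; the real
// capacity-scheduler classes do not expose exactly this.
public class QueueAssignmentResult {
  public enum SkipReason { NONE, QUEUE_LIMIT, OTHER }

  private final SkipReason skipReason;
  private final Resource resourceToReserve;   // what the skipped child still needs

  public QueueAssignmentResult(SkipReason skipReason, Resource resourceToReserve) {
    this.skipReason = skipReason;
    this.resourceToReserve = resourceToReserve;
  }

  public SkipReason getSkipReason() {
    return skipReason;
  }

  // A parent queue would subtract only this amount from the limit passed to
  // later children, instead of the skipped child's entire headroom.
  public Resource getResourceToReserve() {
    return resourceToReserve;
  }
}
{code}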





[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-20 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621631#comment-16621631
 ] 

Tao Yang commented on YARN-8804:


Attached v1 patch, which adds a blockedHeadroom field to ResourceLimits so that 
resource limits are calculated correctly for parent queues at every level. 
[~kshukla], [~leftnoteasy], [~jlowe], please help review it when you have time, 
thanks!
