[jira] [Commented] (YARN-8138) No containers pre-empted from another queue when using node labels

2018-04-11 Thread Zian Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434543#comment-16434543
 ] 

Zian Chen commented on YARN-8138:
-

Fix failed cases.

> No containers pre-empted from another queue when using node labels
> --
>
> Key: YARN-8138
> URL: https://issues.apache.org/jira/browse/YARN-8138
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Charan Hebri
>Assignee: Zian Chen
>Priority: Blocker
> Attachments: YARN-8138.001.patch
>
>
> There seems to be an issue with pre-emption when using node labels with queue 
> priority.
> Test configuration:
> queue A (capacity=50, priority=1)
> queue B (capacity=50, priority=2)
> both have accessible-node-labels set to x
> A.accessible-node-labels.x.capacity = 50
> B.accessible-node-labels.x.capacity = 50
> Along with this pre-emption related properties have been set.
> Test steps:
>  - Set NM memory = 6000MB and containerMemory = 750MB
>  - Submit an application A1 to B, with am-container = container = 
> (6000-750-1500), no. of containers = 2
>  - Submit an application A2 to A, with am-container = 750, container = 1500, 
> no of containers = (NUM_NM-1)
>  - Kill application A1
>  - Submit an application A3 to B with am-container=container=5000, no. of 
> containers=3
>  - Expectation is that containers are pre-empted from application A2 to A3 
> but there is no container pre-emption happening
> Container pre-emption is stuck with the message in the RM log,
> {noformat}
> 2018-02-02 11:41:36,974 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XX:25454
> 2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XX:25454
> 2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,994 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XX:25454
> 2018-02-02 11:41:36,995 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster={noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8138) No containers pre-empted from another queue when using node labels

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433362#comment-16433362
 ] 

genericqa commented on YARN-8138:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
50s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 35s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 9 new + 17 unchanged - 0 fixed = 26 total (was 17) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 48s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 47s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}140m  3s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoCreatedQueuePreemption
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8138 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918490/YARN-8138.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux fbddb0aa800e 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d919eb6 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20294/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 

[jira] [Commented] (YARN-8138) No containers pre-empted from another queue when using node labels

2018-04-10 Thread Zian Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433270#comment-16433270
 ] 

Zian Chen commented on YARN-8138:
-

Upload a patch to add new UT to TestCapacitySchedulerSurgicalPreemption

> No containers pre-empted from another queue when using node labels
> --
>
> Key: YARN-8138
> URL: https://issues.apache.org/jira/browse/YARN-8138
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Charan Hebri
>Assignee: Zian Chen
>Priority: Blocker
> Attachments: YARN-8138.001.patch
>
>
> There seems to be an issue with pre-emption when using node labels with queue 
> priority.
> Test configuration:
> queue A (capacity=50, priority=1)
> queue B (capacity=50, priority=2)
> both have accessible-node-labels set to x
> A.accessible-node-labels.x.capacity = 50
> B.accessible-node-labels.x.capacity = 50
> Along with this pre-emption related properties have been set.
> Test steps:
>  - Set NM memory = 6000MB and containerMemory = 750MB
>  - Submit an application A1 to B, with am-container = container = 
> (6000-750-1500), no. of containers = 2
>  - Submit an application A2 to A, with am-container = 750, container = 1500, 
> no of containers = (NUM_NM-1)
>  - Kill application A1
>  - Submit an application A3 to B with am-container=container=5000, no. of 
> containers=3
>  - Expectation is that containers are pre-empted from application A2 to A3 
> but there is no container pre-emption happening
> Container pre-emption is stuck with the message in the RM log,
> {noformat}
> 2018-02-02 11:41:36,974 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XX:25454
> 2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XX:25454
> 2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,994 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XX:25454
> 2018-02-02 11:41:36,995 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster={noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8138) No containers pre-empted from another queue when using node labels

2018-04-10 Thread Zian Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433269#comment-16433269
 ] 

Zian Chen commented on YARN-8138:
-

Investigated this issue and wrote a UT to reproduce it. According to the UT. 
the conclusion is the preemption happened after application 3 got submitted.  
But not happening as expected as the test scenario presented. There are several 
issues we need to clarify here.
 # When we set memory size for containers, we need to set them as multiple of 
1024 MB, otherwise, the scheduler will convert them into the nearest size which 
is bigger than the requested size which is multiple of 1024 MB. For example 
app3 had am container request of 750MB, instead, it will get 1024 MB as the 
container size.
 # According to the log, preemption seems not happened. but actually, it 
happened with a long time delay(1 minute probably), the reason is when we set 
"yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.reserved-container-delay-ms"
 property, the reserved container will not be allocated before we hit this 
timeout, which leads preemption will delay more before we hit this timeout.
 # Although we got preemption happened, we will not expect A3 to be able to 
launch all its requested containers. Because the amount of resource A3 can get 
should limit by minimum guaranteed resource for the queue the application 
submitted to. In this case, we will only expect two containers to preempt since 
Queue B will reach its minimum guaranteed resource (50% of the cluster 
resource) after two containers preempt from Queue A.

So my suggestion is recheck the test scenario with those issues mentioned above 
and change settings properly, and the test should pass.

 

[~leftnoteasy] , could you share your opinions as well? Thanks

> No containers pre-empted from another queue when using node labels
> --
>
> Key: YARN-8138
> URL: https://issues.apache.org/jira/browse/YARN-8138
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Charan Hebri
>Assignee: Zian Chen
>Priority: Blocker
>
> There seems to be an issue with pre-emption when using node labels with queue 
> priority.
> Test configuration:
> queue A (capacity=50, priority=1)
> queue B (capacity=50, priority=2)
> both have accessible-node-labels set to x
> A.accessible-node-labels.x.capacity = 50
> B.accessible-node-labels.x.capacity = 50
> Along with this pre-emption related properties have been set.
> Test steps:
>  - Set NM memory = 6000MB and containerMemory = 750MB
>  - Submit an application A1 to B, with am-container = container = 
> (6000-750-1500), no. of containers = 2
>  - Submit an application A2 to A, with am-container = 750, container = 1500, 
> no of containers = (NUM_NM-1)
>  - Kill application A1
>  - Submit an application A3 to B with am-container=container=5000, no. of 
> containers=3
>  - Expectation is that containers are pre-empted from application A2 to A3 
> but there is no container pre-emption happening
> Container pre-emption is stuck with the message in the RM log,
> {noformat}
> 2018-02-02 11:41:36,974 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XX:25454
> 2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XX:25454
> 2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,994 INFO