[ 
https://issues.apache.org/jira/browse/YARN-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen updated YARN-8138:
----------------------------
    Comment: was deleted

(was: Investigated this issue and wrote a UT to reproduce it. According to the 
UT, preemption does happen after application 3 is submitted, but not in the 
way the test scenario expects. There are several issues we need to clarify 
here.
 # When we set memory sizes for containers, we should use multiples of 1024 
MB; otherwise the scheduler normalizes the request up to the nearest multiple 
of 1024 MB that is not smaller than the requested size. For example, app3 
requested a 750 MB AM container, but it will actually get a 1024 MB container 
(see the rounding sketch after this list).
 # According to the log, preemption appears not to happen, but it actually 
happens after a long delay (probably about one minute). The reason is that 
when we set the 
"yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.reserved-container-delay-ms"
 property, the reserved container will not be allocated until that timeout 
expires, which delays preemption even further (see the configuration sketch 
after this list).
 # Even once preemption happens, we should not expect A3 to be able to launch 
all of its requested containers, because the amount of resource A3 can get is 
limited by the minimum guaranteed resource of the queue the application was 
submitted to. In this case we should only expect two containers to be 
preempted, since Queue B reaches its minimum guaranteed resource (50% of the 
cluster resource) after two containers are preempted from Queue A (a worked 
computation also follows this list).
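
For reference, a minimal sketch of the rounding described in item 1. The 
helper name is hypothetical, not the actual scheduler code; it just mirrors 
how a memory request is normalized up to a multiple of the minimum allocation 
(1024 MB here):
{code:java}
// Hypothetical helper illustrating request normalization; not the actual
// scheduler code. The scheduler rounds a memory request up to the nearest
// multiple of yarn.scheduler.minimum-allocation-mb (1024 MB by default).
public class NormalizationSketch {
  static long normalizeMemoryMb(long requestedMb, long minAllocationMb) {
    return ((requestedMb + minAllocationMb - 1) / minAllocationMb) * minAllocationMb;
  }

  public static void main(String[] args) {
    System.out.println(normalizeMemoryMb(750, 1024));  // 1024 (app3's AM request)
    System.out.println(normalizeMemoryMb(1500, 1024)); // 2048
    System.out.println(normalizeMemoryMb(5000, 1024)); // 5120
  }
}
{code}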
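
For item 2, a sketch of how the delay property might be lowered in a test 
configuration so reserved containers are not held back for a full minute; the 
1000 ms value is an arbitrary choice for illustration:
{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only: lower the reserved-container delay in the test configuration
// so preemption is not held back by the timeout described in item 2.
public class PreemptionDelayConfigSketch {
  public static Configuration newConf() {
    Configuration conf = new Configuration();
    conf.setLong(
        "yarn.scheduler.capacity.ordering-policy.priority-utilization."
            + "underutilized-preemption.reserved-container-delay-ms",
        1000L); // illustrative value, instead of the long delay seen in the log
    return conf;
  }
}
{code}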
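
And a worked computation behind item 3, using the cluster size from the RM 
log below; the helper is hypothetical, not scheduler code:
{code:java}
// Illustrative arithmetic for item 3. Numbers come from the RM log below
// (cluster=<memory:18000>); the helper is hypothetical.
public class GuaranteeSketch {
  // Preemption on a queue's behalf stops once it reaches its minimum guarantee.
  static long remainingEntitlementMb(long guaranteedMb, long usedMb) {
    return Math.max(0L, guaranteedMb - usedMb);
  }

  public static void main(String[] args) {
    long clusterMb = 18000;                    // from the log
    long guaranteedMb = clusterMb * 50 / 100;  // B.capacity = 50 -> 9000 MB
    System.out.println(remainingEntitlementMb(guaranteedMb, 0)); // 9000
  }
}
{code}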

So my suggestion is to recheck the test scenario against the issues mentioned 
above and adjust the settings accordingly; the test should then pass.

 

[~leftnoteasy], could you share your opinions as well? Thanks)

> Add unit test to validate queue priority preemption works under node 
> partition.
> -------------------------------------------------------------------------------
>
>                 Key: YARN-8138
>                 URL: https://issues.apache.org/jira/browse/YARN-8138
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Charan Hebri
>            Assignee: Zian Chen
>            Priority: Minor
>         Attachments: YARN-8138.001.patch, YARN-8138.002.patch
>
>
> There seems to be an issue with pre-emption when using node labels with queue 
> priority.
> Test configuration:
> queue A (capacity=50, priority=1)
> queue B (capacity=50, priority=2)
> both have accessible-node-labels set to x
> A.accessible-node-labels.x.capacity = 50
> B.accessible-node-labels.x.capacity = 50
> Along with this, pre-emption-related properties have been set.
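>
> A minimal sketch of this queue setup, assuming a plain test Configuration 
> and the standard capacity-scheduler property keys (values as listed above):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
>
> // Sketch of the described queue setup; keys are the standard
> // CapacityScheduler property names, values are the ones listed above.
> public class QueueSetupSketch {
>   public static Configuration newCsConf() {
>     Configuration conf = new Configuration();
>     conf.set("yarn.scheduler.capacity.root.queues", "A,B");
>     conf.set("yarn.scheduler.capacity.root.A.capacity", "50");
>     conf.set("yarn.scheduler.capacity.root.A.priority", "1");
>     conf.set("yarn.scheduler.capacity.root.A.accessible-node-labels", "x");
>     conf.set("yarn.scheduler.capacity.root.A.accessible-node-labels.x.capacity", "50");
>     conf.set("yarn.scheduler.capacity.root.B.capacity", "50");
>     conf.set("yarn.scheduler.capacity.root.B.priority", "2");
>     conf.set("yarn.scheduler.capacity.root.B.accessible-node-labels", "x");
>     conf.set("yarn.scheduler.capacity.root.B.accessible-node-labels.x.capacity", "50");
>     return conf;
>   }
> }
> {code}
>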
> Test steps:
>  - Set NM memory = 6000MB and containerMemory = 750MB
>  - Submit an application A1 to B, with am-container = container = 
> (6000-750-1500), no. of containers = 2
>  - Submit an application A2 to A, with am-container = 750, container = 1500, 
> no of containers = (NUM_NM-1)
>  - Kill application A1
>  - Submit an application A3 to B with am-container=container=5000, no. of 
> containers=3
>  - Expectation is that containers are pre-empted from application A2 in 
> favor of A3, but no container pre-emption happens
> Container pre-emption is stuck with the message in the RM log,
> {noformat}
> 2018-02-02 11:41:36,974 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XXXXXXXXXX:25454
> 2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource=<memory:3072, vCores:1> 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=<memory:18000, vCores:3>
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XXXXXXXXXX:25454
> 2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource=<memory:3072, vCores:1> 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=<memory:18000, vCores:3>
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,994 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XXXXXXXXXX:25454
> 2018-02-02 11:41:36,995 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource=<memory:3072, vCores:1> 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=<memory:18000, vCores:3>{noformat}


