[jira] [Comment Edited] (YARN-4931) Preempted resources go back to the same application

Miles Crawford (JIRA) Thu, 01 Dec 2016 09:25:28 -0800

    [ https://issues.apache.org/jira/browse/YARN-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712538#comment-15712538 ]


Miles Crawford edited comment on YARN-4931 at 12/1/16 5:24 PM:
---------------------------------------------------------------

{code}
<?xml version="1.0"?>
<allocations>
    <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
    <defaultMinSharePreemptionTimeout>5</defaultMinSharePreemptionTimeout>
    <defaultFairSharePreemptionTimeout>20</defaultFairSharePreemptionTimeout>
    <defaultFairSharePreemptionThreshold>0.8</defaultFairSharePreemptionThreshold>
    <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
</allocations>
{code}
We use Amazon EMR, so yarn-site.xml and mapred-site.xml are part of their 
managed setup.
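
Note that the allocation file above sets defaultQueueSchedulingPolicy twice 
(drf first, then fair). A minimal sketch for spotting repeated top-level 
settings like this, using only Python's standard library. The helper name 
duplicate_settings is hypothetical, and the sketch makes no claim about which 
of the two values the Fair Scheduler actually applies:

```python
import xml.etree.ElementTree as ET

# The allocation file exactly as posted in this comment.
ALLOCATIONS_XML = """<?xml version="1.0"?>
<allocations>
    <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
    <defaultMinSharePreemptionTimeout>5</defaultMinSharePreemptionTimeout>
    <defaultFairSharePreemptionTimeout>20</defaultFairSharePreemptionTimeout>
    <defaultFairSharePreemptionThreshold>0.8</defaultFairSharePreemptionThreshold>
    <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
</allocations>
"""


def duplicate_settings(xml_text):
    """Return {tag: [values, ...]} for top-level elements that occur more than once.

    Hypothetical helper for illustration; it only inspects direct children
    of the root element and does not validate the file in any other way.
    """
    root = ET.fromstring(xml_text)
    values = {}
    for child in root:
        values.setdefault(child.tag, []).append((child.text or "").strip())
    return {tag: vals for tag, vals in values.items() if len(vals) > 1}


duplicates = duplicate_settings(ALLOCATIONS_XML)
for tag, vals in duplicates.items():
    print(f"{tag} is set {len(vals)} times: {vals}")
```

Running the same check against the allocation file on the cluster would show 
whether the duplicate is really there or was introduced while pasting.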


was (Author: milesc):
```<?xml version="1.0"?>
<allocations>
    <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
    <defaultMinSharePreemptionTimeout>5</defaultMinSharePreemptionTimeout>
    <defaultFairSharePreemptionTimeout>20</defaultFairSharePreemptionTimeout>
    <defaultFairSharePreemptionThreshold>0.8</defaultFairSharePreemptionThreshold>
    <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
</allocations>
```

We use Amazon's EMR, so the yarn-site and mapred-site.xml are a part of their 
managed setup.

> Preempted resources go back to the same application
> ---------------------------------------------------
>
>                 Key: YARN-4931
>                 URL: https://issues.apache.org/jira/browse/YARN-4931
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.2
>            Reporter: Miles Crawford
>         Attachments: resourcemanager.log
>
>
> Sometimes a queue that needs resources causes preemption - but the preempted 
> containers are just allocated right back to the application that just 
> released them!
> Here is a tiny application (0007) that wants resources, and a container is 
> preempted from application 0002 to satisfy it:
> {code}
> 2016-04-07 21:08:13,463 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
> (FairSchedulerUpdateThread): Should preempt <memory:448, vCores:0> res for 
> queue root.default: resDueToMinShare = <memory:0, vCores:0>, 
> resDueToFairShare = <memory:448, vCores:0>
> 2016-04-07 21:08:13,463 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
> (FairSchedulerUpdateThread): Preempting container (prio=1res=<memory:15264, 
> vCores:1>) from queue root.milesc
> 2016-04-07 21:08:13,463 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics
>  (FairSchedulerUpdateThread): Non-AM container preempted, current 
> appAttemptId=appattempt_1460047303577_0002_000001, 
> containerId=container_1460047303577_0002_01_001038, resource=<memory:15264, 
> vCores:1>
> 2016-04-07 21:08:13,463 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl 
> (FairSchedulerUpdateThread): container_1460047303577_0002_01_001038 Container 
> Transitioned from RUNNING to KILLED
> {code}
> But then a moment later, application 0002 gets the container right back:
> {code}
> 2016-04-07 21:08:13,844 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode 
> (ResourceManager Event Processor): Assigned container 
> container_1460047303577_0002_01_001039 of capacity <memory:15264, vCores:1> 
> on host ip-10-12-40-63.us-west-2.compute.internal:8041, which has 13 
> containers, <memory:241248, vCores:18> used and <memory:416, vCores:46> 
> available after allocation
> 2016-04-07 21:08:14,555 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl 
> (IPC Server handler 59 on 8030): container_1460047303577_0002_01_001039 
> Container Transitioned from ALLOCATED to ACQUIRED
> 2016-04-07 21:08:14,845 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl 
> (ResourceManager Event Processor): container_1460047303577_0002_01_001039 
> Container Transitioned from ACQUIRED to RUNNING
> {code}
> This results in new applications being unable to even get an AM, and never 
> starting at all.
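
The pattern described above (a container is preempted from an application, 
and the next container assigned goes to that same application) can be searched 
for in a ResourceManager log with a rough heuristic. The sketch below is only 
that: it assumes one log entry per line (unwrapped, unlike the quoted excerpts) 
and the container ID layout seen in this report. The names app_key and 
find_bounce_backs are hypothetical, and no time window is enforced:

```python
import re

# Patterns match the two log messages quoted in this issue.
PREEMPTED_RE = re.compile(
    r"container preempted.*containerId=(container_\d+_\d+_\d+_\d+)")
ASSIGNED_RE = re.compile(r"Assigned container (container_\d+_\d+_\d+_\d+)")


def app_key(container_id):
    """Extract '<clusterTimestamp>_<appNumber>' from a container ID."""
    _, cluster_ts, app_number, _attempt, _container = container_id.split("_")
    return f"{cluster_ts}_{app_number}"


def find_bounce_backs(lines):
    """Pair each preempted container with the next container assigned to the same app."""
    last_preempted = {}  # app key -> most recently preempted container ID
    bounce_backs = []
    for line in lines:
        match = PREEMPTED_RE.search(line)
        if match:
            last_preempted[app_key(match.group(1))] = match.group(1)
            continue
        match = ASSIGNED_RE.search(line)
        if match:
            key = app_key(match.group(1))
            if key in last_preempted:
                bounce_backs.append((last_preempted.pop(key), match.group(1)))
    return bounce_backs


# Two entries from the excerpts above, joined back into single lines.
SAMPLE_LOG = [
    "2016-04-07 21:08:13,463 INFO "
    "org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics "
    "(FairSchedulerUpdateThread): Non-AM container preempted, current "
    "appAttemptId=appattempt_1460047303577_0002_000001, "
    "containerId=container_1460047303577_0002_01_001038, "
    "resource=<memory:15264, vCores:1>",
    "2016-04-07 21:08:13,844 INFO "
    "org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode "
    "(ResourceManager Event Processor): Assigned container "
    "container_1460047303577_0002_01_001039 of capacity <memory:15264, vCores:1>",
]

for preempted, reassigned in find_bounce_backs(SAMPLE_LOG):
    print(f"{preempted} was preempted, then {reassigned} went to the same application")
```

A match here does not prove a scheduler bug on its own, since an application 
can legitimately receive new containers later; it only flags candidates worth 
checking against the timestamps in the attached resourcemanager.log.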


