[
https://issues.apache.org/jira/browse/YARN-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Miles Crawford updated YARN-4931:
---------------------------------
Description:
Sometimes a queue that needs resources triggers preemption, but the preempted
containers are allocated right back to the application that just released
them!
Here is a tiny application (0007) that wants resources, and a container is
preempted from application 0002 to satisfy it:
{code}
2016-04-07 21:08:13,463 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
(FairSchedulerUpdateThread): Should preempt <memory:448, vCores:0> res for
queue root.default: resDueToMinShare = <memory:0, vCores:0>, resDueToFairShare
= <memory:448, vCores:0>
2016-04-07 21:08:13,463 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
(FairSchedulerUpdateThread): Preempting container (prio=1res=<memory:15264,
vCores:1>) from queue root.milesc
2016-04-07 21:08:13,463 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics
(FairSchedulerUpdateThread): Non-AM container preempted, current
appAttemptId=appattempt_1460047303577_0002_000001,
containerId=container_1460047303577_0002_01_001038, resource=<memory:15264,
vCores:1>
2016-04-07 21:08:13,463 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl
(FairSchedulerUpdateThread): container_1460047303577_0002_01_001038 Container
Transitioned from RUNNING to KILLED
{code}
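The "Should preempt" line comes from the FairScheduler update thread's starvation check: it computes how far a queue is below its min share and below its fair share, and preempts enough to cover the larger deficit. A simplified sketch of that arithmetic (illustrative only, not the actual FairScheduler code; resource vectors are reduced to plain memory MB and preemption timeouts are omitted):

```python
# Simplified sketch of the starvation check that produces the
# "Should preempt ... resDueToMinShare ... resDueToFairShare" log line above.
# Resources are reduced to integer memory MB; the real code uses Resource
# objects and preemption timeouts. Names are illustrative.

def res_due_to_min_share(min_share, demand, usage):
    # A queue is owed up to its min share, but never more than it demands.
    target = min(min_share, demand)
    return max(0, target - usage)

def res_due_to_fair_share(fair_share, demand, usage):
    # Likewise for its fair share.
    target = min(fair_share, demand)
    return max(0, target - usage)

def res_to_preempt(min_share, fair_share, demand, usage):
    # Preempt enough to cover the larger of the two deficits.
    return max(res_due_to_min_share(min_share, demand, usage),
               res_due_to_fair_share(fair_share, demand, usage))

# Numbers matching the log line: root.default has no min-share deficit but a
# 448 MB fair-share deficit, so 448 MB should be preempted on its behalf.
print(res_to_preempt(min_share=0, fair_share=448, demand=15264, usage=0))
```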
But then a moment later, application 0002 gets the container right back:
{code}
2016-04-07 21:08:13,844 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode
(ResourceManager Event Processor): Assigned container
container_1460047303577_0002_01_001039 of capacity <memory:15264, vCores:1> on
host ip-10-12-40-63.us-west-2.compute.internal:8041, which has 13 containers,
<memory:241248, vCores:18> used and <memory:416, vCores:46> available after
allocation
2016-04-07 21:08:14,555 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (IPC
Server handler 59 on 8030): container_1460047303577_0002_01_001039 Container
Transitioned from ALLOCATED to ACQUIRED
2016-04-07 21:08:14,845 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl
(ResourceManager Event Processor): container_1460047303577_0002_01_001039
Container Transitioned from ACQUIRED to RUNNING
{code}
This results in new applications being unable to even get an AM, and never
starting at all.
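The sequence above suggests a race between the preemption path and the allocation path: killing container _001038 frees capacity on the node, but on the next node heartbeat the scheduler simply offers that capacity to applications in scheduling order, and nothing reserves it for the starved queue, so application 0002's still-pending requests can absorb it before 0007's AM request is even considered. A toy model of that interleaving (illustrative only, not YARN code):

```python
# Toy model of the preempt-then-reallocate race described above. Killing a
# container frees capacity on the node, but the freed capacity goes to
# whichever app's pending request is tried first on the next heartbeat;
# nothing ties it to the starved queue. Names and numbers are illustrative.

class App:
    def __init__(self, app_id, pending_mb):
        self.app_id = app_id
        self.pending_mb = pending_mb  # outstanding resource requests

def preempt(node_free_mb, container_mb, victim):
    # Kill a container: the node gets the capacity back, and the victim
    # re-adds an equivalent pending request (its work still needs to run).
    victim.pending_mb += container_mb
    return node_free_mb + container_mb

def node_heartbeat(node_free_mb, apps_in_scheduling_order):
    # Offer the node's free capacity to apps in scheduling order. If the
    # victim sorts ahead of the starved app, it wins its container back.
    for app in apps_in_scheduling_order:
        if 0 < app.pending_mb <= node_free_mb:
            granted, app.pending_mb = app.pending_mb, 0
            return app.app_id, granted
    return None, 0

app_0002 = App("0002", pending_mb=0)    # the victim (large running app)
app_0007 = App("0007", pending_mb=448)  # the starved app, waiting for an AM

free = preempt(node_free_mb=416, container_mb=15264, victim=app_0002)
winner, mb = node_heartbeat(free, [app_0002, app_0007])
print(winner, mb)  # the preempted app gets its 15264 MB right back
```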
> Preempted resources go back to the same application
> ---------------------------------------------------
>
> Key: YARN-4931
> URL: https://issues.apache.org/jira/browse/YARN-4931
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.7.2
> Reporter: Miles Crawford
> Attachments: resourcemanager.log
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)