[
https://issues.apache.org/jira/browse/YARN-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698830#comment-14698830
]
Dan Shechter commented on YARN-3997:
------------------------------------
Hi,
I was trying to find the existing unit tests for the Fair Scheduler
preemption... All I could find was this:
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java
Are there more tests hiding somewhere else?
> An Application requesting multiple core containers can't preempt running
> application made of single core containers
> -------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-3997
> URL: https://issues.apache.org/jira/browse/YARN-3997
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.7.1
> Environment: Ubuntu 14.04, Hadoop 2.7.1, Physical Machines
> Reporter: Dan Shechter
> Assignee: Arun Suresh
> Priority: Critical
>
> When our cluster is configured with preemption and is fully loaded with an
> application consuming 1-core containers, it will not kill off these
> containers when a new application kicks in requesting containers with a size
> greater than 1, for example 4-core containers.
> When the "second" application attempts to use 1-core containers as well,
> preemption proceeds as planned and everything works properly.
> My assumption is that the fair scheduler, while recognizing that it needs to
> kill off some containers to make room for the new application, fails to find
> a SINGLE container satisfying the request for a 4-core container (since all
> existing containers are 1-core containers), and isn't "smart" enough to
> realize it needs to kill off 4 single-core containers (in this case) on a
> single node for the new application to be able to proceed (see the sketch
> after the quoted description below).
> The exhibited effect is that the new application hangs indefinitely and
> never gets the resources it requires.
> This can easily be replicated with any YARN application.
> Our "go-to" scenario in this case is running pyspark with 1-core executors
> (containers) while trying to launch the H2O.ai framework, which INSISTS on
> having at least 4 cores per container.
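To make the suspected gap concrete, here is a minimal, self-contained sketch in
plain Java. It is NOT FairScheduler code, and every name in it
(RunningContainer, singleContainerOnly, aggregatePerNode) is made up for
illustration; it only contrasts a "find one big-enough container" victim
selection with a per-node aggregation of small containers.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only -- none of these types exist in Hadoop. */
public class PreemptionSketch {

    /** A running container: how many cores it holds and the node it runs on. */
    static class RunningContainer {
        final String nodeId;
        final int cores;
        RunningContainer(String nodeId, int cores) {
            this.nodeId = nodeId;
            this.cores = cores;
        }
        public String toString() { return nodeId + ":" + cores + "c"; }
    }

    /**
     * Roughly the behaviour the reporter suspects: look for one container
     * at least as large as the request. With a cluster full of 1-core
     * containers this never succeeds for a 4-core request, so nothing is
     * preempted and the new application waits forever.
     */
    static List<RunningContainer> singleContainerOnly(List<RunningContainer> running,
                                                      int requestedCores) {
        for (RunningContainer c : running) {
            if (c.cores >= requestedCores) {
                return List.of(c);
            }
        }
        return List.of();
    }

    /**
     * The behaviour the reporter argues for: collect several small containers
     * on the SAME node until the cores freed on that node cover the request.
     */
    static List<RunningContainer> aggregatePerNode(List<RunningContainer> running,
                                                   int requestedCores) {
        // Group running containers by node, then try to cover the request
        // with victims taken from a single node.
        Map<String, List<RunningContainer>> byNode = new HashMap<>();
        for (RunningContainer c : running) {
            byNode.computeIfAbsent(c.nodeId, k -> new ArrayList<>()).add(c);
        }
        for (List<RunningContainer> onNode : byNode.values()) {
            List<RunningContainer> victims = new ArrayList<>();
            int freed = 0;
            for (RunningContainer c : onNode) {
                victims.add(c);
                freed += c.cores;
                if (freed >= requestedCores) {
                    return victims;   // e.g. four 1-core containers on one node
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        // Cluster fully loaded with 1-core containers on two nodes.
        List<RunningContainer> running = List.of(
                new RunningContainer("node1", 1), new RunningContainer("node1", 1),
                new RunningContainer("node1", 1), new RunningContainer("node1", 1),
                new RunningContainer("node2", 1), new RunningContainer("node2", 1));

        System.out.println(singleContainerOnly(running, 4)); // [] -> new app hangs
        System.out.println(aggregatePerNode(running, 4));    // four containers on node1
    }
}
{code}

With the per-node aggregation step, the 4-core request in the example is
satisfied by preempting four 1-core containers on a single node; without it,
the request starves exactly as described above.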
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)