[ https://issues.apache.org/jira/browse/YARN-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180328#comment-15180328 ]
Karthik Kambatla commented on YARN-3997:
----------------------------------------
Sorry for the delay in following up on this. I am planning to take a
comprehensive look at preemption in FairScheduler.
[~damagebo], [~MindTheGap], [~ilanas], [~umiron] - can any of you comment on
whether FairScheduler is preempting any containers at all? Is it possible that
containers are being preempted, but not all on the same node? I wonder if this
is just another manifestation of YARN-2154?
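Purely to illustrate that question: a minimal, self-contained Java sketch (made-up input data, not the RM's actual API) of the per-node tally that would show whether preempted capacity is being scattered across nodes rather than concentrated on one:

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PreemptionByNode {
    // Each entry: (nodeId, vcoresFreed) for one preempted container,
    // e.g. scraped from RM logs. Hypothetical data shape, for illustration only.
    static Map<String, Integer> vcoresFreedPerNode(List<String[]> preempted) {
        Map<String, Integer> freed = new HashMap<>();
        for (String[] p : preempted) {
            // Sum the vcores freed on each node across all preempted containers.
            freed.merge(p[0], Integer.parseInt(p[1]), Integer::sum);
        }
        return freed;
    }

    public static void main(String[] args) {
        // Suppose four 1-core containers were preempted, one per node.
        List<String[]> preempted = List.of(
            new String[]{"node1", "1"}, new String[]{"node2", "1"},
            new String[]{"node3", "1"}, new String[]{"node4", "1"});
        int need = 4; // vcores needed for one large container

        Map<String, Integer> freed = vcoresFreedPerNode(preempted);
        boolean anyNodeFits = freed.values().stream().anyMatch(v -> v >= need);
        System.out.println("vcores freed per node: " + freed);
        System.out.println("any single node can host the 4-core request: " + anyNodeFits);
        // Prints false: 4 vcores were freed in total, but only 1 per node,
        // so the 4-core request still cannot be placed anywhere.
    }
}
{code}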
> An Application requesting multiple core containers can't preempt running
> application made of single core containers
> -------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-3997
> URL: https://issues.apache.org/jira/browse/YARN-3997
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: fairscheduler
> Affects Versions: 2.7.1
> Environment: Ubuntu 14.04, Hadoop 2.7.1, Physical Machines
> Reporter: Dan Shechter
> Assignee: Arun Suresh
> Priority: Critical
>
> When our cluster is configured with preemption, and is fully loaded with an
> application consuming 1-core containers, it will not kill off these
> containers when a new application kicks in requesting containers with a size
> > 1, for example 4-core containers.
> When the "second" application attempts to use 1-core containers as well,
> preemption proceeds as planned and everything works properly.
> It is my assumption that the fair-scheduler, while recognizing it needs to
> kill off some containers to make room for the new application, fails to find
> a SINGLE container satisfying the request for a 4-core container (since all
> existing containers are 1-core containers), and isn't "smart" enough to
> realize it needs to kill off 4 single-core containers (in this case) on a
> single node for the new application to be able to proceed (a sketch of that
> missing node-local aggregation follows below).
> The exhibited effect is that the new application hangs indefinitely and
> never gets the resources it requires.
> This can easily be replicated with any YARN application.
> Our "go-to" scenario in this case is running pyspark with 1-core executors
> (containers) while trying to launch the h2o.ai framework, which INSISTS on
> having at least 4 cores per container.
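To make the reporter's hypothesis concrete, here is a minimal Java sketch (hypothetical types and a greedy selection strategy; not the actual FairScheduler code) of the node-local aggregation step the description suggests is missing: picking enough small containers on one node so that killing them frees room for the large request:

{code:java}
import java.util.ArrayList;
import java.util.List;

public class NodeLocalPreemption {
    /**
     * Greedily pick running containers on a single node until their combined
     * vcores cover the pending request. Returns null if even preempting every
     * container on the node would not free enough. Illustrative sketch only.
     */
    static List<Integer> pickVictims(List<Integer> containerVcores, int requestedVcores) {
        List<Integer> victims = new ArrayList<>();
        int freed = 0;
        for (int i = 0; i < containerVcores.size() && freed < requestedVcores; i++) {
            victims.add(i);                  // mark container i for preemption
            freed += containerVcores.get(i); // vcores released once it is killed
        }
        return freed >= requestedVcores ? victims : null;
    }

    public static void main(String[] args) {
        // One node fully loaded with 1-core containers (the reported scenario).
        List<Integer> node = List.of(1, 1, 1, 1, 1, 1, 1, 1);
        List<Integer> victims = pickVictims(node, 4);
        // Aggregating four 1-core containers on the same node satisfies the
        // 4-core request -- the step the reporter believes the scheduler skips
        // when no single running container is large enough on its own.
        System.out.println("containers to preempt on this node: " + victims);
    }
}
{code}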