[
https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268908#comment-15268908
]
Jason Lowe commented on YARN-4280:
----------------------------------
The proposed algorithm does not change how reserved containers work -- I'm not
proposing to add a reserved container where one does not exist today. The
algorithm also does not allocate a reserved container beyond the queue max cap,
so there should be no extra issues there.
There will be cases where resources can appear to be free but won't be
allocated due to the "lockdown" of one or more queues. We could do something
like the proposed increasing reservation container to take up the extra
resources so they don't appear to be free, but I think that will significantly
complicate things. We already have the free-but-unused problem today with
existing reservations that cannot be applied beyond the queue max cap if no
other app comes along to use the small leftovers. I think we should get the
proposed algorithm prototyped and focus on getting that in. That would be a
significant, incremental improvement over what we have today when a queue gets
close to full. We can then focus on how to better represent the behavior in
the UI in a followup JIRA.
> CapacityScheduler reservations may not prevent indefinite postponement on a
> busy cluster
> ----------------------------------------------------------------------------------------
>
> Key: YARN-4280
> URL: https://issues.apache.org/jira/browse/YARN-4280
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 2.6.1, 2.8.0, 2.7.1
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
>
> Consider the following scenario:
> There are 2 queues A(25% of the total capacity) and B(75%), both can run at
> total cluster capacity. There are 2 applications, appX that runs on Queue A,
> always asking for 1G containers(non-AM) and appY runs on Queue B asking for 2
> GB containers.
> The user limit is high enough for the application to reach 100% of the
> cluster resource.
> appX is running at total cluster capacity, full with 1G containers releasing
> only one container at a time. appY comes in with a request of 2GB container
> but only 1 GB is free. Ideally, since appY is in the underserved queue, it
> has higher priority and should reserve for its 2 GB request. Since this
> request puts the alloc+reserve above total capacity of the cluster,
> reservation is not made. appX comes in with a 1GB request and since 1GB is
> still available, the request is allocated.
> This can continue indefinitely causing priority inversion.
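
The scenario in the description can be traced with a toy model. This is only a sketch of the arithmetic, not CapacityScheduler code: the function name, the 8 GB cluster size, and the simplified reservation check (allocated + reserved + request must fit within total capacity) are all assumptions made for illustration.

```python
# Toy model of the quoted scenario; NOT actual CapacityScheduler logic.
# simulate(), the 8 GB cluster size and the log strings are hypothetical.

def simulate(total_gb=8, rounds=5):
    used_by_x = total_gb   # appX fills the cluster with 1 GB containers
    used_by_y = 0          # appY has nothing yet
    reserved_gb = 0        # outstanding reservations
    log = []
    for _ in range(rounds):
        used_by_x -= 1     # appX releases a single 1 GB container
        free = total_gb - used_by_x - used_by_y - reserved_gb

        # appY sits in the underserved queue, so it is considered first.
        if free >= 2:
            used_by_y += 2
            log.append("appY allocated 2 GB")
            continue

        # Not enough free space: try to reserve 2 GB for appY.
        # Assumed check: allocated + reserved + request <= total capacity.
        allocated = used_by_x + used_by_y
        if allocated + reserved_gb + 2 <= total_gb:
            reserved_gb += 2
            log.append("appY reserved 2 GB")
            continue

        # Reservation refused, so the free 1 GB goes back to appX.
        used_by_x += 1
        log.append("appX allocated 1 GB")
    return used_by_y, log


if __name__ == "__main__":
    y_gb, events = simulate()
    print(y_gb)
    for event in events:
        print(event)
```

In this model every round ends with appX taking the freed 1 GB, because 7 GB allocated plus a 2 GB reservation exceeds the 8 GB total, so appY never receives a container no matter how many rounds are run.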