[ https://issues.apache.org/jira/browse/YARN-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sandy Ryza resolved YARN-972.
-----------------------------
Resolution: Won't Fix
> Allow requests and scheduling for fractional virtual cores
> ----------------------------------------------------------
>
> Key: YARN-972
> URL: https://issues.apache.org/jira/browse/YARN-972
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: api, scheduler
> Affects Versions: 2.0.5-alpha
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
>
> As this idea sparked a fair amount of discussion on YARN-2, I'd like to go
> deeper into the reasoning.
> Currently the virtual core abstraction hides two orthogonal goals. The first
> is that a cluster might have heterogeneous hardware and that the processing
> power of different makes of cores can vary wildly. The second is that
> different workloads (or combinations of workloads) can require different
> levels of granularity. E.g. one admin might want every task on their cluster
> to use at least a core, while another might want applications to be able to
> request quarters of cores. The former would configure a single vcore per
> core. The latter would configure four vcores per core.
> I don't think that the abstraction is a good way of handling the second goal.
> Having virtual cores refer to different magnitudes of processing power on
> different clusters will make the difficult problem of deciding how many cores
> to request for a job even more confusing.
> Can we not handle this with dynamic oversubscription?
> Dynamic oversubscription, i.e. adjusting the number of cores offered by a
> machine based on measured CPU-consumption, should work as a complement to
> fine-granularity scheduling. Dynamic oversubscription is never going to be
> perfect, as the amount of CPU a process consumes can vary widely over its
> lifetime. A task that first loads a bunch of data over the network and then
> performs complex computations on it will suffer if additional CPU-heavy tasks
> are scheduled on the same node because its initial CPU-utilization was low.
> To guard against this, we will need to be conservative with how we
> dynamically oversubscribe. If a user wants to explicitly hint to the
> scheduler that their task will not use much CPU, the scheduler should be able
> to take this into account.
> On YARN-2, there are concerns that including floating point arithmetic in the
> scheduler will slow it down. I question this assumption, and it is perhaps
> worth debating, but I think we can sidestep the issue by multiplying
> CPU-quantities inside the scheduler by a decently sized number like 1000 and
> continuing to do the computations on integers.
> The relevant APIs are marked as evolving, so there's no need for the change
> to delay 2.1.0-beta.
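
The integer-scaling idea in the description (multiply CPU quantities by 1000 and stay on integers) could look roughly like the sketch below. It is illustrative only and not YARN code: the class name MilliCores, the constant UNITS_PER_CORE, and the methods parse and fits are all hypothetical names, and the factor 1000 is simply the example value mentioned above.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of any YARN API.
final class MilliCores {
    // Assumed scale factor: 1 core == 1000 internal scheduler units.
    static final int UNITS_PER_CORE = 1000;

    // Converts a decimal core count such as "0.25" into integer units.
    // BigDecimal is used only at the API boundary; no float/double is involved.
    static int parse(String cores) {
        return new BigDecimal(cores)
                .multiply(BigDecimal.valueOf(UNITS_PER_CORE))
                .intValueExact(); // throws ArithmeticException if finer than 1/1000 core
    }

    // Pure integer comparison, as a scheduler's inner loop would perform it.
    static boolean fits(int requestedUnits, int usedUnits, int capacityUnits) {
        return requestedUnits <= capacityUnits - usedUnits;
    }

    public static void main(String[] args) {
        int capacity = 8 * UNITS_PER_CORE; // an 8-core node
        int used = 0;
        int quarter = parse("0.25");       // 250 units
        int granted = 0;
        while (fits(quarter, used, capacity)) {
            used += quarter;
            granted++;
        }
        // Prints: 32 quarter-core containers fit on an 8-core node
        System.out.println(granted + " quarter-core containers fit on an 8-core node");
    }
}
```

With this approach the fractional request only exists at the point where it is parsed; every comparison and subtraction afterwards is on int values, which is the property the description is after.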