[
https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540540#comment-13540540
]
Sandy Ryza commented on YARN-2:
-------------------------------
The idea of virtual cores seems unintuitive to me. Choosing how much of each
resource to request is already a difficult and somewhat ill-defined task. If I
want to decide how much CPU to request for my job/task, the first thing I'd
think to do is run it locally and see what percentage of the CPU it uses. That
wouldn't be a perfect measure for a number of reasons, but it would probably
suffice most of the time. Having to look up how many virtual cores the
machines are assigned, and then translate my measurement into an integer
number of them, seems unnecessary and confusing.
This becomes even more difficult if I want to run a job against multiple
clusters, each with a different number of virtual cores per node. While the
vmem-to-pmem ratio can also vary across clusters, it is tied directly to
oversubscription as a knob, likely does not vary by orders of magnitude, and
has a clear meaning in terms of what the operating system is doing. Virtual
cores, on the other hand, conflate oversubscription with request granularity:
on one cluster, my request for a virtual core might mean a quarter of the CPU
that it does on another cluster, simply because the former wants to support
finer granularity.
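To make the granularity point concrete, here is a minimal sketch with made-up
numbers: two hypothetical clusters, each with 8 physical cores per node, one
configured with 8 virtual cores per node and the other with 32. It assumes a
vcore is a fixed, evenly divided share of a node's physical cores and that
requests are rounded up to whole vcores. The names `Cluster`, `vcores_for` and
`physical_share_of_one_vcore` are illustrative only, not YARN APIs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical description of a cluster's per-node CPU configuration."""
    name: str
    physical_cores_per_node: int
    vcores_per_node: int

    @property
    def vcores_per_physical_core(self) -> float:
        return self.vcores_per_node / self.physical_cores_per_node


def vcores_for(physical_cores_needed: float, cluster: Cluster) -> int:
    """Translate a locally measured CPU need (in physical cores, e.g. 0.5)
    into the whole number of vcores to request on `cluster`, rounding up."""
    return math.ceil(physical_cores_needed * cluster.vcores_per_physical_core)


def physical_share_of_one_vcore(cluster: Cluster) -> float:
    """Fraction of a physical core that a single vcore represents (assumed)."""
    return 1 / cluster.vcores_per_physical_core


if __name__ == "__main__":
    coarse = Cluster("coarse", physical_cores_per_node=8, vcores_per_node=8)
    fine = Cluster("fine", physical_cores_per_node=8, vcores_per_node=32)
    # The same measured need (half a core) becomes a different request on each.
    for c in (coarse, fine):
        print(c.name, vcores_for(0.5, c), physical_share_of_one_vcore(c))
    # prints:
    # coarse 1 1.0
    # fine 2 0.25
```

Under these assumptions, the same half-core task has to ask for 1 vcore on one
cluster and 2 on the other, and a single vcore on the "fine" cluster is a
quarter of one on the "coarse" cluster. Nothing in the vcore count alone tells
the user whether the 32 vcores per node mean finer granularity or
oversubscription.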
As Karthik says, we might be able to provide a different view for job
submission that translates a more intuitive measure into virtual cores, but
for consistency we would also need to report this measure wherever resource
requests and consumption are reported (web UI, metrics, command line). Once we
expect users to think about it in a certain way, is there a strong reason
to have a different model internally?
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
> Key: YARN-2
> URL: https://issues.apache.org/jira/browse/YARN-2
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: capacityscheduler, scheduler
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
> Fix For: 2.0.3-alpha
>
> Attachments: MAPREDUCE-4327.patch, MAPREDUCE-4327.patch,
> MAPREDUCE-4327.patch, MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch,
> MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, YARN-2-help.patch,
> YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch,
> YARN-2.patch, YARN-2.patch
>
>
> With YARN being a general purpose system, it would be useful for several
> applications (MPI et al) to specify not just memory but also CPU (cores) for
> their resource requirements. Thus, it would be useful to the
> CapacityScheduler to account for both.