[ https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540540#comment-13540540 ]

Sandy Ryza commented on YARN-2:
-------------------------------

The idea of virtual cores seems unintuitive to me.  Choosing how much of each 
resource to request is already a difficult, somewhat ill-defined task.  If I 
wanted to decide how much CPU to request for my job/task, the natural thing to 
do would be to run it locally and see what percentage of the CPU it takes up.  
That's not a perfect measure for a number of reasons, but it would probably 
suffice most of the time.  Having to look up how many virtual cores the 
machines are assigned and then translate my measurement into an integer 
fraction of that seems unnecessary and confusing.
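
To be concrete, the kind of local measurement I have in mind is nothing more 
than sampling the process's CPU share, e.g. with the JDK's 
OperatingSystemMXBean.  This is just a rough sketch (the class name and the 
busy loop standing in for real work are made up for illustration), not 
anything from the patch:

{code:java}
import java.lang.management.ManagementFactory;

public class CpuShareSample {
  public static void main(String[] args) {
    // HotSpot-specific subinterface that exposes per-process CPU load.
    com.sun.management.OperatingSystemMXBean os =
        (com.sun.management.OperatingSystemMXBean)
            ManagementFactory.getOperatingSystemMXBean();

    // Take an initial sample so the next call has a baseline to compare to.
    os.getProcessCpuLoad();

    // Burn some CPU for a couple of seconds; stand-in for the real task's work.
    long deadline = System.currentTimeMillis() + 2000;
    double sink = 0;
    while (System.currentTimeMillis() < deadline) {
      sink += Math.sqrt(sink + 1);
    }

    // Fraction of the whole machine's CPU this process used recently
    // (0.0-1.0; negative means no sample was available).
    double processShare = os.getProcessCpuLoad();
    int cores = os.getAvailableProcessors();

    System.out.printf("~%.2f of %d cores busy (sink=%.0f)%n",
        processShare * cores, cores, sink);
  }
}
{code}

In practice I'd just watch top, but either way the output is "fraction of a 
physical machine", which is the unit I'd then want to request in.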

This becomes even more difficult if I want to run a job against multiple 
clusters, each with a different number of virtual cores per node.  While the 
vmem-to-pmem ratio can also vary across clusters, it is a knob tied directly 
to oversubscription, is unlikely to vary by orders of magnitude, and has a 
clear meaning in terms of what is going on in the operating system.  Virtual 
cores, on the other hand, conflate oversubscription with request granularity - 
on one cluster my request for a virtual core might mean a quarter of the CPU 
that it does on another, because the former wants to support finer granularity.
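
To make the granularity point concrete, here is a toy calculation.  The node 
size and vcore settings below are invented for illustration (identical 8-core 
nodes, one admin configuring 8 vcores per node and another configuring 32):

{code:java}
public class VcoreGranularity {
  // Physical cores that a vcore request represents on a node, given how
  // many vcores the admin assigned to that node.
  static double physicalCores(int requestedVcores, int vcoresPerNode,
                              int physicalCoresPerNode) {
    return (double) requestedVcores / vcoresPerNode * physicalCoresPerNode;
  }

  public static void main(String[] args) {
    // Cluster A: 8 vcores on an 8-core node  -> 1 vcore == 1 physical core
    System.out.println(physicalCores(1, 8, 8));   // 1.0
    // Cluster B: 32 vcores on an 8-core node -> 1 vcore == 1/4 physical core
    System.out.println(physicalCores(1, 32, 8));  // 0.25
  }
}
{code}

The same request of "1 vcore" buys four times as much CPU on cluster A as on 
cluster B, and nothing in the request itself tells the user which situation 
they are in.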

As Karthik says, we might be able to provide a different view for job 
submission that translates some more intuitive measure into virtual cores, but 
for consistency we would also need to surface that measure wherever resource 
requests and consumption are reported (web UI, metrics, command line).  Once we 
expect the user to think about CPU in a certain way, is there a strong reason 
to keep a different model internally?
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: YARN-2
>                 URL: https://issues.apache.org/jira/browse/YARN-2
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: capacityscheduler, scheduler
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 2.0.3-alpha
>
>         Attachments: MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, 
> MAPREDUCE-4327.patch, MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, 
> MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, YARN-2-help.patch, 
> YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, 
> YARN-2.patch, YARN-2.patch
>
>
> With YARN being a general-purpose system, it would be useful for several 
> applications (MPI et al) to specify not just memory but also CPU (cores) for 
> their resource requirements. Thus, it would be useful for the 
> CapacityScheduler to account for both.

