[ 
https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477965#comment-13477965
 ] 

Robert Joseph Evans commented on YARN-2:
----------------------------------------

Arun, I still disagree with the #cores being an int.  

What does requesting 1 CPU really mean and how is it different from requesting 
1.8?  To me 1 CPU means that for this particular container I want to be 
guaranteed that it gets at least 1 full CPU core to itself for computation at 
any point in time it needs it, very similar to what requesting 3000MB of memory 
does.  It is a bit more ambiguous because 1 CPU on box A is not necessarily 
equivalent to 1 CPU on box B. But this JIRA already makes the assumption that 
they are close enough to being equivalent.  It gives me as a user of the 
container a chance to set a lower bound on the amount of resources that I am 
guaranteed to be able to use.  In practice this probably means that the kernel 
will give at least X% of the available CPU time to the processes running in 
that container, if those processes are runnable, where X = 100 * CPU requested /
total CPU cores on the box.
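
As a rough illustration of the lower bound described above, the share could be
computed as below. This is only a sketch: the function name
guaranteed_cpu_percent and its parameters are made up for this example and are
not part of YARN or of any patch attached to this JIRA.

```python
def guaranteed_cpu_percent(requested_cores: float, total_cores: int) -> float:
    """Minimum share of the box's CPU time, in percent (hypothetical helper)."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    if requested_cores < 0:
        raise ValueError("requested_cores must be non-negative")
    # X = requested / total, expressed as a percentage and capped at 100.
    return min(100.0, 100.0 * requested_cores / total_cores)
```

For example, a container asking for 1 core on an 8-core box would be guaranteed
at least 12.5% of the CPU time whenever its processes are runnable.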

1.8 CPUs to me means a few things.  First the person requesting this was either 
a machine or was overly ambitious in trying to get an exact value.  Second the 
container will probably get 2 CPU cores, because just like with memory I would 
expect the scheduler to round it up to the nearest multiple of a scheduling 
unit.  I proposed initially that quarter or even half CPU marks are probably 
sufficient.  We can always round up and remove precision with a float.  It is 
very hard to go back the other way though and add precision to an int.  I am 
fine if, for the first go-around, the CPU number is a float and the scheduling 
unit is 1 CPU. I just want the door left open so we can easily adjust things if 
we find a need to.
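
The rounding described above could be sketched as follows. The helper name
round_up_to_unit and the string-based arguments are assumptions made for this
example only; nothing here reflects how the scheduler actually normalizes
requests. Decimal is used instead of float so that units such as 0.25 divide
cleanly.

```python
from decimal import Decimal, ROUND_CEILING

def round_up_to_unit(requested_cores: str, unit: str = "1") -> Decimal:
    """Round a CPU request up to the next multiple of the scheduling unit.

    Hypothetical helper: both arguments are decimal strings, e.g. "1.8".
    """
    request = Decimal(requested_cores)
    step = Decimal(unit)
    if step <= 0:
        raise ValueError("unit must be positive")
    # Number of whole scheduling units needed to cover the request.
    units_needed = (request / step).to_integral_value(rounding=ROUND_CEILING)
    return units_needed * step
```

With a scheduling unit of 1 CPU, a request for 1.8 becomes 2. With a quarter-CPU
unit, 1.8 still becomes 2, while 1.3 becomes 1.5. Moving from the first setting
to the second only changes the unit argument, not the type of the request.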

Over-subscribing makes sense but it also has a lot of pitfalls.  You have to 
take into account that resource utilization is not constant.  A process can use 
very little of a resource and then all of a sudden start to use lots of 
that resource.  Is the Resource request a guarantee of those resources, or is 
it just a best effort to provide those resources?  I see situations where users 
would want both, and perhaps if we do support over-subscribing we need to 
support something like nice on POSIX.
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: YARN-2
>                 URL: https://issues.apache.org/jira/browse/YARN-2
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: capacityscheduler, scheduler
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 2.0.3-alpha
>
>         Attachments: MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, 
> MAPREDUCE-4327.patch, MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, 
> MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, YARN-2-help.patch, 
> YARN-2.patch, YARN-2.patch, YARN-2.patch
>
>
> With YARN being a general purpose system, it would be useful for several 
> applications (MPI et al) to specify not just memory but also CPU (cores) for 
> their resource requirements. Thus, it would be useful to the 
> CapacityScheduler to account for both.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
