[ 
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729895#comment-13729895
 ] 

Jason Lowe commented on YARN-1024:
----------------------------------

Agree that the example posed by [~sandyr] shows that a single unit in the 
request cannot properly convey the ask.  Chatted briefly about this offline 
with [~revans2] and [~nroberts] and we think in general there needs to be a way 
to show the parallelism needed along with some performance guarantee from those 
threads.  That basically leads us to a path where in the generalized case we're 
asking for a list of vcore units, where the number of entries in the list 
represents the desired hardware parallelism and the value of each entry 
represents the performance needed for that execution thread.

Using this with Sandy's example, asking for a single unit of 250 YVCs means it 
would not be allocated on the node with three cores each rated at 150 YVCs 
because none of the cores meets the single-threaded performance needed by the 
container.  If another job came along and asked for three cores each at 100 
YVCs, that could still run on a node that only has a single core rated at 500 
YVCs because that core likely has enough horsepower to multitask the three 
threads and get them each the required performance.

I understand where [[email protected]] is coming from re: dangers of 
developing "one unit to rule them all", but I also think there needs to be 
*some* way to convey performance requirements.  Sandy's example shows that just 
because a job ran fine with one core on some box doesn't mean the job is going 
to run fine with one core on another.  We will not be able to develop a metric 
that will cover all the hardware architecture differences, but if a metric 
works in the vast majority of cases then I think that's a net win over no 
metric.

The APIs are already set for 2.1, and I believe the common case will be jobs 
where a single thread dominates the overall CPU request of the container.  In 
that sense, we can map the existing API call to a single vcore ask and add 
another API where the ask can be a list/array of vcore asks.  This could get 
complicated in the scheduler for an architecture where the effective vcore 
rating of the processors is not homogenous (brings up the spectre of 
processor-pinning and per-processor scheduling), but I don't think this will be 
a common architecture.
                
> Define a virtual core unambigiously
> -----------------------------------
>
>                 Key: YARN-1024
>                 URL: https://issues.apache.org/jira/browse/YARN-1024
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>
> We need to clearly define the meaning of a virtual core unambiguously so that 
> it's easy to migrate applications between clusters.
> For e.g. here is Amazon EC2 definition of ECU: 
> http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
> Essentially we need to clearly define a YARN Virtual Core (YVC).
> Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the 
> equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to