[
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729895#comment-13729895
]
Jason Lowe commented on YARN-1024:
----------------------------------
Agree that the example posed by [~sandyr] shows that a single unit in the
request cannot properly convey the ask. Chatted briefly about this offline
with [~revans2] and [~nroberts] and we think in general there needs to be a way
to show the parallelism needed along with some performance guarantee from those
threads. That basically leads us to a path where in the generalized case we're
asking for a list of vcore units, where the number of entries in the list
represents the desired hardware parallelism and the value of each entry
represents the performance needed for that execution thread.
Using this with Sandy's example, asking for a single unit of 250 YVCs means it
would not be allocated on the node with three cores each rated at 150 YVCs
because none of the cores meets the single-threaded performance needed by the
container. If another job came along and asked for three cores each at 100
YVCs, that could still run on a node that only has a single core rated at 500
YVCs because that core likely has enough horsepower to multitask the three
threads and get them each the required performance.
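To make the two cases above concrete, here is a minimal sketch of how a scheduler might check a list of per-thread asks against a node's per-core ratings. Everything in it is an assumption for illustration: the name place_asks, the integer YVC values, and the rule that several threads may share one core as long as their asks sum to no more than that core's rating. The placement strategy (largest ask first, tightest core that fits) is a simple heuristic, not anything YARN implements, and it can reject some placements that a full search would find.

```python
from typing import List, Optional


def place_asks(asks: List[int], core_ratings: List[int]) -> Optional[List[int]]:
    """Try to place each per-thread ask (in YVCs) on a core of one node.

    Returns a list giving the core index chosen for each ask, or None if
    some ask cannot be placed. Hypothetical helper, not a YARN API.
    """
    remaining = list(core_ratings)  # unclaimed YVCs per core
    placement = [-1] * len(asks)

    # Place the most demanding threads first (best-fit decreasing heuristic).
    for i in sorted(range(len(asks)), key=lambda i: asks[i], reverse=True):
        # A thread runs on exactly one core, so that core alone must cover the ask.
        candidates = [c for c, cap in enumerate(remaining) if cap >= asks[i]]
        if not candidates:
            return None
        # Pick the tightest core that still fits, keeping big cores free.
        core = min(candidates, key=lambda c: remaining[c])
        remaining[core] -= asks[i]
        placement[i] = core
    return placement


# One thread needing 250 YVCs does not fit on three 150-YVC cores.
print(place_asks([250], [150, 150, 150]))
# Three threads at 100 YVCs each can share a single 500-YVC core.
print(place_asks([100, 100, 100], [500]))
```

Note that the first case is rejected even though the node has 450 YVCs in total, which is the point of asking per thread rather than with a single aggregate number.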
I understand where [[email protected]] is coming from re: dangers of
developing "one unit to rule them all", but I also think there needs to be
*some* way to convey performance requirements. Sandy's example shows that just
because a job ran fine with one core on some box doesn't mean the job is going
to run fine with one core on another. We will not be able to develop a metric
that will cover all the hardware architecture differences, but if a metric
works in the vast majority of cases then I think that's a net win over no
metric.
The APIs are already set for 2.1, and I believe the common case will be jobs
where a single thread dominates the overall CPU request of the container. In
that sense, we can map the existing API call to a single vcore ask and add
another API where the ask can be a list/array of vcore asks. This could get
complicated in the scheduler for an architecture where the effective vcore
rating of the processors is not homogenous (brings up the spectre of
processor-pinning and per-processor scheduling), but I don't think this will be
a common architecture.
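As a rough sketch of that API shape, the existing single-value call could be treated as a one-entry list, with a second entry point accepting the full list. The class and method names here (CpuAsk, from_single, from_list) are hypothetical and are not part of the YARN 2.1 API.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class CpuAsk:
    """Per-thread CPU ask, one YVC value per desired hardware thread."""
    per_thread: Tuple[int, ...]

    @classmethod
    def from_single(cls, vcores: int) -> "CpuAsk":
        # Existing-style call: one dominant thread, so a one-entry list.
        return cls.from_list([vcores])

    @classmethod
    def from_list(cls, asks: Sequence[int]) -> "CpuAsk":
        # New-style call: one entry per execution thread.
        if not asks or any(a <= 0 for a in asks):
            raise ValueError("each ask must be a positive YVC value")
        return cls(tuple(asks))

    @property
    def parallelism(self) -> int:
        # Number of entries is the desired hardware parallelism.
        return len(self.per_thread)


legacy = CpuAsk.from_single(250)
threaded = CpuAsk.from_list([100, 100, 100])
print(legacy.parallelism, threaded.parallelism)
```

Keeping both forms behind one representation means the scheduler only ever has to reason about the list form.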
> Define a virtual core unambiguously
> -----------------------------------
>
> Key: YARN-1024
> URL: https://issues.apache.org/jira/browse/YARN-1024
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
>
> We need to clearly define the meaning of a virtual core unambiguously so that
> it's easy to migrate applications between clusters.
> For example, here is the Amazon EC2 definition of an ECU:
> http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
> Essentially we need to clearly define a YARN Virtual Core (YVC).
> Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the
> equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*