[
https://issues.apache.org/jira/browse/YARN-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572374#comment-13572374
]
Tom White commented on YARN-371:
--------------------------------
Glad this turned into a good debate :)
On the task-centric vs resource-centric approach - I agree with Arun and
Bobby's points. Supporting two protocols is not to be done lightly, and we'd
need a strong motivating use case. Perhaps there is one, but the change may not
be something we need to do before 2.0 goes GA.
The restrictions on how you specify locations for resource requests are
interesting. You have to say "node 1, rack 1, *" - it's not sufficient to say
"node 1" or "rack 1" or even "node 1, rack 1". The schedulers use the * (ALL)
request to calculate the total resources for an application, but could they
support the ones without ALL with no protocol changes? E.g. calculate the total
resources for an app by summing the lower-level requests? Perhaps this could be
explored in another JIRA.
More generally, the request styles that Bobby mentions like gang scheduling,
any rack placement (where containers are on the same rack for locality, but it
doesn't matter which one), one-per node placement (for e.g. HBase) - are all
currently beyond the current system. How well YARN supports non-MR workloads
will be determined to some extent by how well it can support these request
styles. Perhaps we should simply make sure that the API is flexible enough to
accommodate changes. I see ResourceRequest is an abstract class so it would be
possible to add a method in the future in a compatible way to support optional
extra information that the scheduler might be able to use. Is that sufficient,
or should some of the YARN APIs be downgraded from @Stable to provide an option
to change them to support alternative request styles in the 2.x timeframe?
> Resource-centric compression in AM-RM protocol limits scheduling
> ----------------------------------------------------------------
>
> Key: YARN-371
> URL: https://issues.apache.org/jira/browse/YARN-371
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: api, resourcemanager, scheduler
> Affects Versions: 2.0.2-alpha
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
>
> Each AMRM heartbeat consists of a list of resource requests. Currently, each
> resource request consists of a container count, a resource vector, and a
> location, which may be a node, a rack, or "*". When an application wishes to
> request a task run in multiple localtions, it must issue a request for each
> location. This means that for a node-local task, it must issue three
> requests, one at the node-level, one at the rack-level, and one with * (any).
> These requests are not linked with each other, so when a container is
> allocated for one of them, the RM has no way of knowing which others to get
> rid of. When a node-local container is allocated, this is handled by
> decrementing the number of requests on that node's rack and in *. But when
> the scheduler allocates a task with a node-local request on its rack, the
> request on the node is left there. This can cause delay-scheduling to try to
> assign a container on a node that nobody cares about anymore.
> Additionally, unless I am missing something, the current model does not allow
> requests for containers only on a specific node or specific rack. While this
> is not a use case for MapReduce currently, it is conceivable that it might be
> something useful to support in the future, for example to schedule
> long-running services that persist state in a particular location, or for
> applications that generally care less about latency than data-locality.
> Lastly, the ability to understand which requests are for the same task will
> possibly allow future schedulers to make more intelligent scheduling
> decisions, as well as permit a more exact understanding of request load.
> I would propose the tweak of allowing a single ResourceRequest to encapsulate
> all the location information for a task. So instead of just a single
> location, a ResourceRequest would contain an array of locations, including
> nodes that it would be happy with, racks that it would be happy with, and
> possibly *. Side effects of this change would be a reduction in the amount
> of data that needs to be transferred in a heartbeat, as well in as the RM's
> memory footprint, becaused what used to be different requests for the same
> task are now able to share some common data.
> While this change breaks compatibility, if it is going to happen, it makes
> sense to do it now, before YARN becomes beta.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira