[
https://issues.apache.org/jira/browse/YARN-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570325#comment-13570325
]
Arun C Murthy commented on YARN-371:
------------------------------------
Sandy - please don't let this side discussion distract you, it's an individual
style thing. I use "-1" on grocery list discussions with my wife...
unfortunately I don't have the luxury of vetos in that context! *smile*
Anyway, there are other good discussion points such as 'allowing requests on a
specific node/rack' which I have pondered about for a long while too. Maybe we
can close this jira and open one for specific enhancements?
----
In future, it would help if jira descriptions are short and propose a specific
enhancement - this way we can debate solutions separately (maybe even on *-dev
list).
On the plus side, this way I can "-1" a specific implementation proposal rather
than the jira too... ;-)
> Consolidate resource requests in AM-RM heartbeat
> ------------------------------------------------
>
> Key: YARN-371
> URL: https://issues.apache.org/jira/browse/YARN-371
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: api, resourcemanager, scheduler
> Affects Versions: 2.0.2-alpha
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
>
> Each AMRM heartbeat consists of a list of resource requests. Currently, each
> resource request consists of a container count, a resource vector, and a
> location, which may be a node, a rack, or "*". When an application wishes to
> request a task run in multiple localtions, it must issue a request for each
> location. This means that for a node-local task, it must issue three
> requests, one at the node-level, one at the rack-level, and one with * (any).
> These requests are not linked with each other, so when a container is
> allocated for one of them, the RM has no way of knowing which others to get
> rid of. When a node-local container is allocated, this is handled by
> decrementing the number of requests on that node's rack and in *. But when
> the scheduler allocates a task with a node-local request on its rack, the
> request on the node is left there. This can cause delay-scheduling to try to
> assign a container on a node that nobody cares about anymore.
> Additionally, unless I am missing something, the current model does not allow
> requests for containers only on a specific node or specific rack. While this
> is not a use case for MapReduce currently, it is conceivable that it might be
> something useful to support in the future, for example to schedule
> long-running services that persist state in a particular location, or for
> applications that generally care less about latency than data-locality.
> Lastly, the ability to understand which requests are for the same task will
> possibly allow future schedulers to make more intelligent scheduling
> decisions, as well as permit a more exact understanding of request load.
> I would propose the tweak of allowing a single ResourceRequest to encapsulate
> all the location information for a task. So instead of just a single
> location, a ResourceRequest would contain an array of locations, including
> nodes that it would be happy with, racks that it would be happy with, and
> possibly *. Side effects of this change would be a reduction in the amount
> of data that needs to be transferred in a heartbeat, as well in as the RM's
> memory footprint, becaused what used to be different requests for the same
> task are now able to share some common data.
> While this change breaks compatibility, if it is going to happen, it makes
> sense to do it now, before YARN becomes beta.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira