[ https://issues.apache.org/jira/browse/YARN-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570595#comment-13570595 ]

Sandy Ryza commented on YARN-371:
---------------------------------

Arun,

No offense taken!  I realize this discussion isn't likely to reach a quick 
resolution, so I'd be happy to move it to yarn-dev if that would be a better 
forum.  I'll also change the name of the JIRA to reflect the more general issue 
and not a specific solution.

You make a very good point.  However, as Bobby points out, I don't think that
changing the protocol requires the shift that you describe inside the
ResourceManager.  The scheduler is perfectly able to translate requests as
described above into the current model.  The change would merely allow the
model to be determined by the scheduler, which is pluggable and can change in
future versions, instead of by the protocol, which must remain stable.  The
change would allow future flexibility on the tradeoff between the amount of
state that needs to be stored and the exactness of the scheduling.  For this
JIRA I am only suggesting changing the protocol, not making any structural
changes to the schedulers.
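
For concreteness, here is a rough sketch of what a request carrying all of a
task's acceptable locations might look like.  The class and field names are
hypothetical, not the actual YARN API:

    import java.util.List;

    // Rough sketch of a "task-centric" request that carries all of a task's
    // acceptable locations in one place; names are illustrative only.
    class MultiLocationResourceRequest {
      int priority;          // request priority, as in the current protocol
      int memoryMb;          // resource vector, simplified to memory only
      int numContainers;     // tasks that want exactly this set of locations
      List<String> nodes;    // nodes the task(s) would be happy with
      List<String> racks;    // racks the task(s) would be happy with
      boolean allowAny;      // whether "*" (run anywhere) is acceptable
    }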

To be clear, I am not suggesting that a separate ResourceRequest is required
for each task.  If a number of tasks are requested with only rack-level
locality, by all means they should be bundled into the same request.  That
said, it is true that in many cases the amount of data in a request will
increase.  Assuming a compact representation that uses 14 bytes per task, an
application requesting a hundred thousand tasks, each on a unique set of three
nodes, would be sending roughly 1.4 MB over the wire.  If this is a
prohibitively large amount, then we could possibly allow for different request
formats?  All schedulers would be required to accept requests in both the
"resource-centric" and "task-centric" formats, and, for schedulers that are
internally resource-centric, a utility would be provided for converting
task-centric requests to resource-centric ones.  I realize that this makes
things hairier, but lossless requests offer real advantages to present-day MR,
such as more accurate delay scheduling, and I believe that not supporting them
will, down the line, severely limit the types of applications that YARN can
handle.
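
To make the conversion utility concrete, here is a minimal sketch (again
hypothetical, building on the request sketched above) of how task-centric
requests could be folded into the per-location counts that a resource-centric
scheduler keeps today:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical utility: expand task-centric requests into the
    // per-location container counts of the current resource-centric model.
    class TaskCentricConverter {
      // location (node name, rack name, or "*") -> outstanding container count
      private final Map<String, Integer> outstanding = new HashMap<>();

      void add(MultiLocationResourceRequest req) {
        for (String node : req.nodes) bump(node, req.numContainers);
        for (String rack : req.racks) bump(rack, req.numContainers);
        if (req.allowAny) bump("*", req.numContainers);
      }

      private void bump(String location, int count) {
        outstanding.merge(location, count, Integer::sum);
      }
    }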
                
> Consolidate resource requests in AM-RM heartbeat
> ------------------------------------------------
>
>                 Key: YARN-371
>                 URL: https://issues.apache.org/jira/browse/YARN-371
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api, resourcemanager, scheduler
>    Affects Versions: 2.0.2-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> Each AMRM heartbeat consists of a list of resource requests. Currently, each 
> resource request consists of a container count, a resource vector, and a 
> location, which may be a node, a rack, or "*". When an application wishes to 
> request that a task run in any of multiple locations, it must issue a request 
> for each 
> location.  This means that for a node-local task, it must issue three 
> requests, one at the node-level, one at the rack-level, and one with * (any). 
> These requests are not linked with each other, so when a container is 
> allocated for one of them, the RM has no way of knowing which others to get 
> rid of. When a node-local container is allocated, this is handled by 
> decrementing the number of requests on that node's rack and in *. But when 
> the scheduler instead satisfies a node-local request at the rack level, the 
> request on the node is left there.  This can cause delay-scheduling to try to 
> assign a container on a node that nobody cares about anymore.
> Additionally, unless I am missing something, the current model does not allow 
> requests for containers only on a specific node or specific rack. While this 
> is not a use case for MapReduce currently, it is conceivable that it might be 
> something useful to support in the future, for example to schedule 
> long-running services that persist state in a particular location, or for 
> applications that generally care less about latency than data-locality.
> Lastly, the ability to understand which requests are for the same task will 
> possibly allow future schedulers to make more intelligent scheduling 
> decisions, as well as permit a more exact understanding of request load.
> I would propose the tweak of allowing a single ResourceRequest to encapsulate 
> all the location information for a task.  So instead of just a single 
> location, a ResourceRequest would contain an array of locations, including 
> nodes that it would be happy with, racks that it would be happy with, and 
> possibly *.  Side effects of this change would be a reduction in the amount 
> of data that needs to be transferred in a heartbeat, as well as in the RM's 
> memory footprint, because what used to be different requests for the same 
> task would now be able to share some common data.
> While this change breaks compatibility, if it is going to happen, it makes 
> sense to do it now, before YARN becomes beta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
