[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069208#comment-15069208 ]

Arun Suresh commented on YARN-3870:
-----------------------------------

Thank you for starting this discussion, [~grey].

Correct me if I am wrong, but I take it that what you are proposing is some way
for the Scheduler to correlate the expanded Resource Requests. I do feel this
would be genuinely useful, and not only from a scheduling perspective, e.g. for
making affinity / anti-affinity scheduling decisions (viz. YARN-1042). It would
also greatly help improve preemption decisions in the FairScheduler (viz.
YARN-2154).

This would be extremely useful for AMs too. Currently the MRAM does the
bookkeeping to match each allocated container to a ResourceRequest. AMs could
generally be relieved of this job if an allocated Container Token could easily
be matched against a Resource Request.

One possible approach would be to have the AMClient generate a unique ID for
each Resource Request and tag each of its expanded requests (Node, Rack and
ANY) with that ID. The ID could then be passed around in the
Container/ContainerTokenIdentifier.
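A minimal sketch of the tagging idea, assuming a simple in-memory AM-side tracker. `RequestTracker`, `TaggedRequest`, `add_request` and `on_allocated` are hypothetical names, not existing AMRMClient APIs, and the ID is handed to `on_allocated` directly rather than read from a real Container/ContainerTokenIdentifier. Only the "*" wildcard matches what YARN actually uses for `ResourceRequest.ANY`.

```python
import itertools
from dataclasses import dataclass

ANY = "*"  # same wildcard string YARN uses for ResourceRequest.ANY


@dataclass(frozen=True)
class TaggedRequest:
    """One expanded (Node, Rack or ANY) entry carrying its raw request's ID (hypothetical type)."""
    request_id: int
    location: str


class RequestTracker:
    """Hypothetical AM-side helper: tags every expansion of a raw request with
    one unique ID, then resolves an allocation back to the raw request by ID."""

    def __init__(self, rack_of):
        self.rack_of = rack_of              # host -> rack mapping (assumed known to the AM)
        self._next_id = itertools.count(1)  # unique, increasing request IDs
        self.outstanding = {}               # request_id -> raw host preference list

    def add_request(self, hosts):
        rid = next(self._next_id)
        self.outstanding[rid] = list(hosts)
        racks = sorted({self.rack_of[h] for h in hosts})  # each rack once per raw request
        locations = list(hosts) + racks + [ANY]
        return [TaggedRequest(rid, loc) for loc in locations]

    def on_allocated(self, request_id):
        # Under the proposal the ID would come back on the allocated Container;
        # here it is simply passed in, and the raw request is removed and returned.
        return self.outstanding.pop(request_id)


tracker = RequestTracker({"host1": "rack1", "host2": "rack1", "host3": "rack1"})
tagged = tracker.add_request(["host1", "host2", "host3"])
print([t.location for t in tagged])                # ['host1', 'host2', 'host3', 'rack1', '*']
print(tracker.on_allocated(tagged[0].request_id))  # ['host1', 'host2', 'host3']
```

Because every expansion of one raw request carries the same ID, matching an allocation back becomes a dictionary lookup instead of the per-AM bookkeeping described above.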

[~ka...@cloudera.com], [~vinodkv], [~leftnoteasy], thoughts?

> Providing raw container request information for fine scheduling
> ---------------------------------------------------------------
>
>                 Key: YARN-3870
>                 URL: https://issues.apache.org/jira/browse/YARN-3870
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, applications, capacityscheduler, fairscheduler, resourcemanager, scheduler, yarn
>            Reporter: Lei Guo
>
> Currently, when the AM sends container requests to the RM and scheduler, it
> expands each individual container request into host/rack/any form. For
> instance, if I ask for a container with preference "host1, host2, host3",
> all in the same rack rack1, then instead of sending one raw container request
> with the raw preference list to the RM/Scheduler, it expands it into 5
> different objects: host1, host2, host3, rack1 and any. By the time the
> scheduler receives this information, the raw request has already been lost.
> This is OK for a single container request, but it causes trouble when dealing
> with multiple container requests from the same application.
> Consider this case:
> 6 hosts, two racks:
> rack1 (host1, host2, host3) rack2 (host4, host5, host6)
> When the application requests two containers with different data-locality
> preferences:
> c1: host1, host2, host4
> c2: host2, host3, host5
> This results in the following container request list when the client sends
> the request to the RM/Scheduler:
> host1: 1 instance
> host2: 2 instances
> host3: 1 instance
> host4: 1 instance
> host5: 1 instance
> rack1: 2 instances
> rack2: 2 instances
> any: 2 instances
> Fundamentally, it is hard for the scheduler to make the right judgement
> without knowing the raw container requests. The situation gets worse with
> affinity, anti-affinity, or even gang scheduling.
> We need some way to provide raw container request information for
> fine-grained scheduling purposes.
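The expansion in the quoted description can be reproduced with a short sketch. `expand_all` and the plain host-to-rack dictionary are illustrative assumptions, not the AMRMClient implementation; the sketch simply adds, for every raw request, one count per preferred host, one per distinct rack, and one at the "any" level. The second pair of preferences (`alt`) is a hypothetical example that collapses to exactly the same counts, which is the information loss the issue describes.

```python
from collections import Counter

ANY = "any"  # label the issue text uses for the off-switch level

RACK_OF = {
    "host1": "rack1", "host2": "rack1", "host3": "rack1",
    "host4": "rack2", "host5": "rack2", "host6": "rack2",
}


def expand_all(raw_requests, rack_of):
    """Collapse raw per-container host preferences into per-location counts."""
    counts = Counter()
    for hosts in raw_requests:
        counts.update(hosts)                        # +1 for each preferred host
        counts.update({rack_of[h] for h in hosts})  # +1 for each distinct rack
        counts[ANY] += 1                            # +1 at the ANY level
    return counts


raw = {"c1": ["host1", "host2", "host4"], "c2": ["host2", "host3", "host5"]}
counts = expand_all(raw.values(), RACK_OF)

# Prints host1: 1, host2: 2, host3: 1, host4: 1, host5: 1, rack1: 2, rack2: 2, any: 2
for loc in ["host1", "host2", "host3", "host4", "host5", "rack1", "rack2", ANY]:
    print(f"{loc}: {counts[loc]}")

# A different pair of raw requests that the scheduler cannot tell apart.
alt = [["host1", "host2", "host5"], ["host2", "host3", "host4"]]
print(expand_all(alt, RACK_OF) == counts)  # True
```

Once only the aggregated counts reach the scheduler, c1/c2 and the alternative pair look identical, so any decision that depends on which hosts belong to which container (affinity, anti-affinity, gang scheduling) cannot be made reliably.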



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
