[ 
https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082312#comment-15082312
 ] 

Subru Krishnan commented on YARN-3870:
--------------------------------------

Regarding the ID, I am in principle fine with asking the AM to set it. We do 
have the option of reusing the _responseID_ of *AllocateRequest* which both the 
RM and AM maintain today. It would be good to also link the _responseID_ to the 
actual allocated container in *AllocateResponse* as this is a useful hint for 
the AMs. In fact has been requested by [~markus.weimer] to simplify certain 
bookkeeping for the [REEF | http://reef.apache.org/ ] AM.

> Providing raw container request information for fine scheduling
> ---------------------------------------------------------------
>
>                 Key: YARN-3870
>                 URL: https://issues.apache.org/jira/browse/YARN-3870
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, applications, capacityscheduler, fairscheduler, 
> resourcemanager, scheduler, yarn
>            Reporter: Lei Guo
>            Assignee: Karthik Kambatla
>
> Currently, when AM sends container requests to RM and scheduler, it expands 
> individual container requests into host/rack/any format. For instance, if I 
> am asking for container request with preference "host1, host2, host3", 
> assuming all are in the same rack rack1, instead of sending one raw container 
> request to RM/Scheduler with raw preference list, it basically expand it to 
> become 5 different objects with host1, host2, host3, rack1 and any in there. 
> When scheduler receives information, it basically already lost the raw 
> request. This is ok for single container request, but it will cause trouble 
> when dealing with multiple container requests from the same application. 
> Consider this case:
> 6 hosts, two racks:
> rack1 (host1, host2, host3) rack2 (host4, host5, host6)
> When application requests two containers with different data locality 
> preference:
> c1: host1, host2, host4
> c2: host2, host3, host5
> This will end up with following container request list when client sending 
> request to RM/Scheduler:
> host1: 1 instance
> host2: 2 instances
> host3: 1 instance
> host4: 1 instance
> host5: 1 instance
> rack1: 2 instances
> rack2: 2 instances
> any: 2 instances
> Fundamentally, it is hard for scheduler to make a right judgement without 
> knowing the raw container request. The situation will get worse when dealing 
> with affinity and anti-affinity or even gang scheduling etc.
> We need some way to provide raw container request information for fine 
> scheduling purpose.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to