[ 
https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074755#comment-15074755
 ] 

Xianyin Xin commented on YARN-3870:
-----------------------------------

+1 for adding a unique ID to a resource request, but I would suggest we 
consider this kind of problem in a more systematic way, taking YARN-314, 
YARN-1042, YARN-371, YARN-4485 and this issue together.

As I commented in YARN-314, the scheduler should naturally work like a 
factory: it receives orders and prepares for them. Once we accept this working 
philosophy, it becomes natural and necessary for a resource order to have the 
following dimensions:
1. order ID, which identifies an order; an order can become overdue or have a 
time limit;
2. priority;
3. a collection of request units, each specifying one kind of resource request 
with the coordinates <ResourceName/nodeLabels, Capability, NumOfContainers>;
4. relaxLocality;
5. canbeDecomposed, or ifGangScheduling;
6. ...
The scheduler schedules based on the order form, and should not swallow any 
information passed from the app.
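
A rough sketch of what such an order form might look like as a data structure. 
This is only an illustration of the dimensions listed above, not an existing 
YARN API: the class names, field names, the millisecond time limit and the 
simplification of Capability to memory plus vcores are all assumptions made 
for the example.

```java
import java.util.List;

public class ResourceOrderSketch {

    /** One request unit: <ResourceName/nodeLabels, Capability, NumOfContainers>. */
    public static final class RequestUnit {
        public final String resourceNameOrLabel; // host, rack, or node label (hypothetical field)
        public final long memoryMb;              // capability, simplified to two numbers (assumption)
        public final int vcores;
        public final int numContainers;

        public RequestUnit(String resourceNameOrLabel, long memoryMb, int vcores, int numContainers) {
            this.resourceNameOrLabel = resourceNameOrLabel;
            this.memoryMb = memoryMb;
            this.vcores = vcores;
            this.numContainers = numContainers;
        }
    }

    /** The order form: everything the app passed in is kept together under one ID. */
    public static final class ResourceOrder {
        public final String orderId;          // dimension 1: identifies the order
        public final long timeLimitMs;        // dimension 1: time limit (unit is an assumption)
        public final int priority;            // dimension 2
        public final List<RequestUnit> units; // dimension 3
        public final boolean relaxLocality;   // dimension 4
        public final boolean canBeDecomposed; // dimension 5

        public ResourceOrder(String orderId, long timeLimitMs, int priority,
                             List<RequestUnit> units, boolean relaxLocality,
                             boolean canBeDecomposed) {
            this.orderId = orderId;
            this.timeLimitMs = timeLimitMs;
            this.priority = priority;
            this.units = List.copyOf(units); // defensive, immutable copy
            this.relaxLocality = relaxLocality;
            this.canBeDecomposed = canBeDecomposed;
        }

        /** Total number of containers requested across all units of this order. */
        public int totalContainers() {
            int total = 0;
            for (RequestUnit unit : units) {
                total += unit.numContainers;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        ResourceOrder order = new ResourceOrder(
                "order-1", 60_000L, 1,
                List.of(new RequestUnit("host1", 1024, 1, 1),
                        new RequestUnit("rack1", 1024, 1, 2)),
                true, false);
        System.out.println(order.orderId + " requests " + order.totalContainers() + " containers");
    }
}
```

The point of the sketch is that the units stay attached to their order, so the 
scheduler can always tell which requests belong together.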

Any thoughts?

> Providing raw container request information for fine scheduling
> ---------------------------------------------------------------
>
>                 Key: YARN-3870
>                 URL: https://issues.apache.org/jira/browse/YARN-3870
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, applications, capacityscheduler, fairscheduler, 
> resourcemanager, scheduler, yarn
>            Reporter: Lei Guo
>
> Currently, when the AM sends container requests to the RM and scheduler, it 
> expands individual container requests into host/rack/any format. For 
> instance, if I ask for a container with preference "host1, host2, host3", 
> and all three hosts are in the same rack rack1, then instead of sending one 
> raw container request to the RM/scheduler with the raw preference list, it 
> expands the request into 5 different objects: host1, host2, host3, rack1 and 
> any. By the time the scheduler receives the information, the raw request is 
> already lost. This is OK for a single container request, but it causes 
> trouble when dealing with multiple container requests from the same 
> application. Consider this case:
> 6 hosts, two racks:
> rack1 (host1, host2, host3) rack2 (host4, host5, host6)
> When an application requests two containers with different data locality 
> preferences:
> c1: host1, host2, host4
> c2: host2, host3, host5
> This ends up as the following container request list when the client sends 
> the request to the RM/scheduler:
> host1: 1 instance
> host2: 2 instances
> host3: 1 instance
> host4: 1 instance
> host5: 1 instance
> rack1: 2 instances
> rack2: 2 instances
> any: 2 instances
> Fundamentally, it is hard for the scheduler to make the right judgement 
> without knowing the raw container requests. The situation gets worse when 
> dealing with affinity and anti-affinity, or even gang scheduling. 
> We need some way to provide raw container request information for 
> fine-grained scheduling.
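
To make the information loss in the quoted example concrete, here is a minimal 
sketch of the host/rack/any expansion. It is a simplified model of the behavior 
described in the issue, not the actual client code: the class name, the 
`expand` helper and the hard-coded rack map are hypothetical. It also shows 
that a second, different pair of raw requests (host5 and host4 swapped between 
the two containers, chosen here purely for illustration) expands to exactly the 
same table.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class RequestExpansion {

    // Hypothetical host-to-rack map matching the topology in the issue description.
    static final Map<String, String> RACK_OF = Map.of(
            "host1", "rack1", "host2", "rack1", "host3", "rack1",
            "host4", "rack2", "host5", "rack2", "host6", "rack2");

    /**
     * Expands raw per-container host preferences into aggregated
     * host/rack/any counts. Each container adds 1 to every preferred host,
     * 1 to every distinct rack those hosts live in, and 1 to "any".
     */
    public static Map<String, Integer> expand(List<List<String>> rawRequests) {
        Map<String, Integer> table = new TreeMap<>();
        for (List<String> hosts : rawRequests) {
            Set<String> racks = new TreeSet<>();
            for (String host : new LinkedHashSet<>(hosts)) { // ignore duplicate hosts
                table.merge(host, 1, Integer::sum);
                racks.add(RACK_OF.get(host));
            }
            for (String rack : racks) {
                table.merge(rack, 1, Integer::sum);
            }
            table.merge("any", 1, Integer::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        // c1 and c2 from the issue description.
        Map<String, Integer> original = expand(List.of(
                List.of("host1", "host2", "host4"),
                List.of("host2", "host3", "host5")));

        // A different pair of raw requests, made up for this example.
        Map<String, Integer> swapped = expand(List.of(
                List.of("host1", "host2", "host5"),
                List.of("host2", "host3", "host4")));

        System.out.println(original);
        // prints {any=2, host1=1, host2=2, host3=1, host4=1, host5=1, rack1=2, rack2=2}
        System.out.println(original.equals(swapped)); // prints true
    }
}
```

Since both inputs produce the same table, the scheduler cannot recover from the 
expanded form which hosts were meant to go together in one container request.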



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
