[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling

2016-01-02 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15076642#comment-15076642
 ] 

Arun Suresh commented on YARN-3870:
---

+1 to having the AM create the ID (which I guess is required if the AM is to 
correlate a ResourceRequest with a container response). Can we set the ID to 
something like a combination of {{app_attempt_id + seq_id}}, where seq_id is 
incremented monotonically per app attempt, to distinguish between different 
app attempts?
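
A minimal sketch of what such an ID could look like (the class below is 
illustrative, not an existing YARN API; only the {{app_attempt_id + seq_id}} 
packing comes from the suggestion above):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

/**
 * Illustrative sketch: compose a request ID from the app-attempt number and a
 * per-attempt, monotonically increasing sequence number, so IDs from
 * different app attempts can never collide.
 */
public class RequestIdGenerator {
  private final long appAttemptId;                   // attempt number of the current AM
  private final AtomicLong seqId = new AtomicLong(); // monotonically increasing per attempt

  public RequestIdGenerator(long appAttemptId) {
    this.appAttemptId = appAttemptId;
  }

  /** Upper 16 bits: attempt number; lower 48 bits: sequence number. */
  public long nextId() {
    return (appAttemptId << 48) | (seqId.incrementAndGet() & 0xFFFFFFFFFFFFL);
  }
}
{code}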

Also, I guess the outstanding requests for an app need to be stored in the 
state store, so that subsequent app attempts can be informed of unfulfilled 
requests.


> Providing raw container request information for fine scheduling
> ---
>
> Key: YARN-3870
> URL: https://issues.apache.org/jira/browse/YARN-3870
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, applications, capacityscheduler, fairscheduler, 
> resourcemanager, scheduler, yarn
>Reporter: Lei Guo
>Assignee: Karthik Kambatla
>
> Currently, when the AM sends container requests to the RM and scheduler, it 
> expands individual container requests into host/rack/any format. For 
> instance, if I ask for a container with the preference "host1, host2, 
> host3", all in the same rack rack1, then instead of sending one raw 
> container request with the raw preference list to the RM/scheduler, the AM 
> expands it into 5 different objects: host1, host2, host3, rack1 and any. By 
> the time the scheduler receives the information, the raw request is already 
> lost. This is fine for a single container request, but it causes trouble 
> when dealing with multiple container requests from the same application. 
> Consider this case:
> 6 hosts, two racks:
> rack1 (host1, host2, host3), rack2 (host4, host5, host6)
> When the application requests two containers with different data-locality 
> preferences:
> c1: host1, host2, host4
> c2: host2, host3, host5
> the client ends up sending the following container request list to the 
> RM/scheduler:
> host1: 1 instance
> host2: 2 instances
> host3: 1 instance
> host4: 1 instance
> host5: 1 instance
> rack1: 2 instances
> rack2: 2 instances
> any: 2 instances
> Fundamentally, it is hard for the scheduler to make the right judgment 
> without knowing the raw container requests. The situation gets worse when 
> dealing with affinity, anti-affinity, or even gang scheduling.
> We need some way to provide raw container request information for 
> fine-grained scheduling purposes.
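
To make the information loss concrete, here is a small, self-contained sketch 
(plain Java, not YARN code) that applies the host/rack/any expansion to c1 and 
c2 and prints the aggregated table above; the per-container preference lists 
cannot be reconstructed from its output:

{code:java}
import java.util.*;

/** Sketch: expand raw per-container preferences into host/rack/any counts. */
public class ExpansionDemo {
  public static void main(String[] args) {
    Map<String, String> rackOf = new HashMap<>();
    rackOf.put("host1", "rack1"); rackOf.put("host2", "rack1"); rackOf.put("host3", "rack1");
    rackOf.put("host4", "rack2"); rackOf.put("host5", "rack2"); rackOf.put("host6", "rack2");

    List<List<String>> rawRequests = Arrays.asList(
        Arrays.asList("host1", "host2", "host4"),   // c1
        Arrays.asList("host2", "host3", "host5"));  // c2

    // The expansion only aggregates counts per location; which hosts belonged
    // to the same container request is lost at this point.
    Map<String, Integer> expanded = new TreeMap<>();
    for (List<String> hosts : rawRequests) {
      Set<String> racks = new HashSet<>();
      for (String h : hosts) {
        expanded.merge(h, 1, Integer::sum);
        racks.add(rackOf.get(h));
      }
      for (String r : racks) {
        expanded.merge(r, 1, Integer::sum);
      }
      expanded.merge("any", 1, Integer::sum);
    }
    expanded.forEach((loc, n) -> System.out.println(loc + ": " + n + " instance(s)"));
  }
}
{code}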





[jira] [Assigned] (YARN-3870) Providing raw container request information for fine scheduling

2016-01-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-3870:
--

Assignee: Karthik Kambatla






[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling

2016-01-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15076629#comment-15076629
 ] 

Karthik Kambatla commented on YARN-3870:


bq. Karthik Kambatla, I am wondering whether we also need some update to the 
response part, to correlate it with the ResourceRequest ID. As the scheduling 
is asynchronous, the AM will also need to know the relation between response 
and request.

bq. If the AM doesn't add an ID, the RM could add one. Or, we could have the 
RM add the IDs and return them to the AM to help with bookkeeping.

Thought more about this. Since one AllocateRequest can carry multiple 
ResourceRequests, the protocol becomes quite complicated if the RM creates the 
IDs instead of the AM. How about we expect the AM to set this ID? If the AM 
doesn't set it, we treat the requests the same way we do today (ID = -1). In 
the AllocateResponse, the RM could send back the last received 
ResourceRequest; the AM could look at this ACK to see whether it has to resend 
any requests. The AMRMClient and the MR AM could be updated to do this.
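
A rough sketch of the AM-side bookkeeping this would enable (the names below 
are hypothetical, chosen only to illustrate the proposal; they are not the 
actual YARN API):

{code:java}
import java.util.Collection;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch: the AM stamps each ResourceRequest with an ID
 * (ID = -1 keeps today's semantics) and uses the last-received ID echoed
 * in the AllocateResponse as an ACK to decide what to resend.
 */
public class AmRequestBook {
  /** Sentinel meaning "no ID set"; such requests behave as they do today. */
  public static final long NO_ID = -1L;

  // requestId -> the ResourceRequest that was sent (Object here for brevity)
  private final NavigableMap<Long, Object> sent = new TreeMap<>();

  public void recordSent(long requestId, Object resourceRequest) {
    sent.put(requestId, resourceRequest);
  }

  /** Requests with IDs after the ACKed one were presumably lost; resend them. */
  public Collection<Object> toResend(long lastReceivedId) {
    return sent.tailMap(lastReceivedId, false).values();
  }
}
{code}

This assumes IDs increase monotonically, as in the {{app_attempt_id + seq_id}} 
scheme discussed above.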






[jira] [Commented] (YARN-397) RM Scheduler api enhancements

2016-01-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15076741#comment-15076741
 ] 

Karthik Kambatla commented on YARN-397:
---

I believe the intent of this umbrella JIRA was to keep track of all scheduler 
API changes before YARN went beta. Would it make more sense to go through the 
individual JIRAs listed here, convert them to standalone issues, and close 
this umbrella JIRA? [~acmurthy], [~vinodkv]?

> RM Scheduler api enhancements
> -
>
> Key: YARN-397
> URL: https://issues.apache.org/jira/browse/YARN-397
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun C Murthy
>
> Umbrella jira tracking enhancements to RM apis.





[jira] [Updated] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers

2016-01-02 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-2885:
--
Attachment: YARN-2885-yarn-2877.v5.patch

Updating the patch with javadocs and addressing some of [~leftnoteasy]'s 
suggestions.

> Create AMRMProxy request interceptor for distributed scheduling decisions for 
> queueable containers
> --
>
> Key: YARN-2885
> URL: https://issues.apache.org/jira/browse/YARN-2885
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2885-yarn-2877.001.patch, 
> YARN-2885-yarn-2877.002.patch, YARN-2885-yarn-2877.full-2.patch, 
> YARN-2885-yarn-2877.full-3.patch, YARN-2885-yarn-2877.full.patch, 
> YARN-2885-yarn-2877.v4.patch, YARN-2885-yarn-2877.v5.patch, 
> YARN-2885_api_changes.patch
>
>
> We propose to add a Local ResourceManager (LocalRM) to the NM in order to 
> support distributed scheduling decisions.
> Architecturally, we leverage the AMRMProxy introduced in YARN-2884.
> The LocalRM makes distributed decisions for queueable container requests; 
> guaranteed-start requests are still handled by the central RM.





[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling

2016-01-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15076743#comment-15076743
 ] 

Karthik Kambatla commented on YARN-3870:


Was fleshing this out further. The number of IDs, and hence ResourceRequests, 
could be O(number of outstanding containers), which could pose problems as 
outlined in YARN-371. In fact, this JIRA is a duplicate of YARN-371; maybe we 
should close this one and continue the discussion there.






[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling

2016-01-02 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15076746#comment-15076746
 ] 

Arun Suresh commented on YARN-3870:
---

Hmmm... I am not entirely sure the argument holds, or I might be missing 
something. For example, even without an ID, what happens today if an 
application makes each request with a different Priority?






[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15076750#comment-15076750
 ] 

Karthik Kambatla commented on YARN-1011:


Forgot to respond to one comment:

bq. when terminating opportunistic containers will the RM ask the AM about 
which containers to kill?
I don't think we should. NM --> RM --> AM --> NM is a long communication 
path; our preemption should kick in much faster than that. What do you think 
of preempting the last opportunistic container that was started, since it is 
likely the farthest away from promotion?
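
A minimal sketch of that last-started-first (LIFO) policy, assuming the NM 
tracks its opportunistic containers in start order (illustrative code, not 
actual NM classes):

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch: preempt the most recently started opportunistic container first. */
public class LifoOpportunisticPreemption {
  // Opportunistic containers in start order; the newest one is at the tail.
  private final Deque<String> running = new ArrayDeque<>();   // container IDs

  public void onStarted(String containerId) {
    running.addLast(containerId);
  }

  /** The last-started container has waited the least and is therefore
   *  likely the farthest from promotion, so it is the cheapest victim. */
  public String pickVictim() {
    return running.pollLast();  // null if there is nothing to preempt
  }
}
{code}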



> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf
>
>
> Currently, the RM allocates containers and assumes the allocated resources 
> are utilized.
> The RM can, and should, get to a point where it measures the utilization of 
> allocated containers and, if appropriate, allocates more (speculative?) 
> containers.





[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15076749#comment-15076749
 ] 

Karthik Kambatla commented on YARN-1011:


Thanks for chiming in, [~bikassaha]. 

bq. It is essential to run opportunistic tasks at lower OS cpu priority so that 
they never obstruct progress of normal tasks.
bq. In fact, this is the litmus test for opportunistic scheduling.
Good point. Guaranteed containers should get priority for resources; 
opportunistic containers should only use left-over resources. We should do 
this for CPU, disk, and network. I am not aware of the latest on disk and 
network isolation, but we should create sub-tasks for those too. /cc 
[~vvasudev]
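
For the CPU side, one way to express "guaranteed first, opportunistic only 
gets leftovers" is cgroups CPU shares. Below is a minimal sketch of the 
cgroups-v1 mechanism only; the paths and share values are illustrative 
assumptions, not what YARN's resource handlers actually write:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Sketch: give an opportunistic container's cgroup a tiny CPU weight so it
 *  only gets CPU time that guaranteed containers leave unused. */
public class CpuSharesSketch {
  private static final String CGROUP_ROOT = "/sys/fs/cgroup/cpu/yarn"; // assumed hierarchy

  static void setCpuShares(String containerId, long shares) throws IOException {
    // cgroups v1 cpu.shares is a relative weight (default 1024, minimum 2);
    // a container with weight 2 runs essentially only on an otherwise idle CPU.
    Files.write(Paths.get(CGROUP_ROOT, containerId, "cpu.shares"),
        Long.toString(shares).getBytes());
  }

  public static void main(String[] args) throws IOException {
    setCpuShares("container_guaranteed_01", 1024);
    setCpuShares("container_opportunistic_01", 2);
  }
}
{code}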

bq. Handling opportunistic tasks raises questions on the involvement of the AMs.
bq. In that sense it would be instructive to consider opportunistic scheduling 
in a similar light as preemption.

I am not sure the AM needs to know a container's execution type:

As you mention, this is very similar to preemption. From an AM's standpoint, 
the container would be preempted if those resources are not available to that 
application any more. With preemption, this can happen because other 
higher-priority queues have outstanding demand or because the cluster lost a 
couple of nodes; here, it would be because guaranteed containers actually 
need the resources. In that sense, the AM doesn't have to do anything 
different for guaranteed vs. opportunistic containers.

Predictability: allowing applications to specify "only guaranteed containers" 
vs. "guaranteed or opportunistic containers" should take care of this. 
However, between getting no resources and getting opportunistic resources, 
are there cases where applications prefer the latter? The applications 
"should" get guaranteed containers at the same point in time irrespective of 
whether they use opportunistic resources in the interim. Note that allowing 
applications to specify whether they are okay with getting opportunistic 
containers complicates the scheduling: the scheduler needs to look through 
the higher-priority apps that don't allow opportunistic containers before 
getting to those that do. And, when resources do become available on that 
node, the RM will need to schedule containers for the higher-priority apps 
first, prolonging the duration for which opportunistic containers stay 
opportunistic.

Given this complication, I would prefer we not involve AMs in the 
decision-making process. Based on the need and use cases, we could revisit 
this at a later time. Note that YARN-4335 adds this to ResourceRequest for 
distributed scheduling, and even there they are not entirely sure it needs to 
be part of the request.

bq. does the AM need to know that a newly allocated container was 
opportunistic. E.g. so that it does not schedule the highest priority work on 
that container.
Valid concern. Maybe we should inform the AM whether a container is 
opportunistic, and notify it later when the container gets promoted to 
guaranteed. That said, I am not sure this is essential for oversubscription 
to be useful. Thoughts on punting it to Phase 2?

bq. will opportunistic containers be given only for containers that are 
beyond queue capacity, such that we don't break any guarantees on their 
liveness? i.e. an AM will not expect to lose any container that is within its 
queue capacity, but opportunistic containers can be killed at any time.

Yes. This probably needs to be made clear in the doc; will update it.

bq. will conversion of opportunistic containers to regular containers be 
automatically done by the RM? 
By some combination of the RM and NM, definitely yes. Initially, I thought 
the RM could be the only one doing this: the RM could keep track of 
opportunistic containers in SchedulerNode (we already track 
launchedContainers there today), and the scheduler could go through this list 
and promote containers before allocating new ones.
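
A minimal sketch of that promote-before-allocate ordering, with resources 
collapsed to a single number for brevity (SchedulerNode and 
launchedContainers are real YARN names, but this loop is only an illustration 
of the idea, not actual scheduler code):

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

/** Sketch: on each node update, promote waiting opportunistic containers
 *  before spending the node's free capacity on brand-new allocations. */
public class PromoteBeforeAllocate {
  private final Queue<String> opportunistic = new ArrayDeque<>(); // oldest first

  public void onOpportunisticStarted(String containerId) {
    opportunistic.add(containerId);
  }

  public long onNodeUpdate(long availableResource, long perContainerResource) {
    // 1. Promote the oldest opportunistic containers that now fit as guaranteed.
    while (!opportunistic.isEmpty() && availableResource >= perContainerResource) {
      String promoted = opportunistic.poll();
      availableResource -= perContainerResource;
      System.out.println("promoted to guaranteed: " + promoted);
    }
    // 2. Only what is left is available for new container allocations.
    return availableResource;
  }
}
{code}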

Does this add an unnecessary delay to the promotion, though? If the scheduler 
allocated opportunistic containers using the same prioritization it uses for 
guaranteed containers, could the NM just promote the oldest opportunistic 
container running on that node and update the RM accordingly?

Another thing to consider: the promotion process here should work with the 
one in YARN-2877. [~subru], [~kkaranasos], [~asuresh] - is it okay for the NM 
to automatically promote some opportunistic containers? Maybe we could add a 
flag to the launch context to differentiate the opportunistic containers that 
can be automatically promoted from those that cannot.


[jira] [Created] (YARN-4531) org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted java.lang.InterruptedException

2016-01-02 Thread SHYAM RAMATH (JIRA)
SHYAM RAMATH created YARN-4531:
--

 Summary: org.apache.hadoop.yarn.event.AsyncDispatcher: 
AsyncDispatcher thread interrupted java.lang.InterruptedException
 Key: YARN-4531
 URL: https://issues.apache.org/jira/browse/YARN-4531
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 2.7.1
 Environment: Windows environment
Reporter: SHYAM RAMATH
Priority: Trivial


The following error occurred while executing a Hadoop job:
org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread 
interrupted java.lang.InterruptedException





[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling

2016-01-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15079334#comment-15079334
 ] 

Karthik Kambatla commented on YARN-3870:


[~leftnoteasy] - agreed that YARN-4485 doesn't necessarily need IDs on every 
request corresponding to one task.

The two JIRAs are related only through the data structures in 
AppSchedulingInfo:
# If we add IDs as discussed here, we don't need any other data-structure 
changes except adding a timestamp to each ResourceRequest.
# If we don't add IDs, we might need to store the resource-requests as a Map





[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling

2016-01-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15076775#comment-15076775
 ] 

Wangda Tan commented on YARN-3870:
--

[~kasha],

bq. Was fleshing this out further. The number of IDs, and hence 
ResourceRequests, could be O(number of outstanding containers), which could 
pose problems as outlined in YARN-371. In fact, this JIRA is a duplicate of 
YARN-371; maybe we should close this one and continue the discussion there.

As I mentioned above, I think we shouldn't combine YARN-371 and YARN-4485: 
YARN-4485 looks more like an internal change to the scheduler to me.

Let's say an AM originally requests 1000 containers (T1), then the AM 
requests 1200 containers (T2); then, after the scheduler has allocated 100 
containers, the AM requests 1200 containers again (T3).

For the original request, the scheduler records: T1, 1000.
After T2, the scheduler records: T1, 1000; T2, 200.
After T3, the scheduler records: T1, 900 (the scheduler allocated 100 
containers); T2, 200; T3, 100.

Instead of recording a timestamp on every individual resource request, we 
only need to record a timestamp -> #pending-requests mapping, and the 
scheduler will "dequeue" from this mapping (sorted by time) as containers are 
allocated.
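
A minimal sketch of that bookkeeping, assuming a single priority/locality 
bucket for simplicity (illustrative code, not an actual AppSchedulingInfo 
change):

{code:java}
import java.util.Map;
import java.util.TreeMap;

/** Sketch: track pending container counts per ask-timestamp and "dequeue"
 *  from the oldest timestamp as containers get allocated. */
public class TimestampedPending {
  private final TreeMap<Long, Integer> pending = new TreeMap<>(); // timestamp -> #pending

  /** The AM re-sent its total ask; record only the increase under the new timestamp. */
  public void onAsk(long timestamp, int totalRequested) {
    int current = pending.values().stream().mapToInt(Integer::intValue).sum();
    if (totalRequested > current) {
      pending.put(timestamp, totalRequested - current);
    }
  }

  /** One container was allocated; consume it from the earliest outstanding ask. */
  public void onAllocated() {
    Map.Entry<Long, Integer> oldest = pending.firstEntry();
    if (oldest == null) {
      return;                                       // nothing pending
    } else if (oldest.getValue() > 1) {
      pending.put(oldest.getKey(), oldest.getValue() - 1);
    } else {
      pending.remove(oldest.getKey());              // this ask is fully served
    }
  }
}
{code}

Feeding the T1/T2/T3 sequence above through this sketch reproduces exactly 
the recorded states {T1: 1000}, then {T1: 1000, T2: 200}, then {T1: 900, 
T2: 200, T3: 100}.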

As you said, it will be hard to ask the AM to set the ID, but the scheduler 
can easily set it. However, this solution needs more work if we want to 
preserve these timestamps across RM restarts.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)