[ 
https://issues.apache.org/jira/browse/YARN-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1969:
-----------------------------------

    Description: 
What we are observing is that some big jobs with many allocated containers are 
waiting for a few containers to finish. Under *fair-share scheduling*, however, 
they have a low priority because other jobs (usually much smaller newcomers) 
are using resources far below their fair share, so newly released containers 
are not offered to the big job that is close to finishing. Nevertheless, 
everybody would benefit from an "unfair" scheduling that offers the resource 
to the big job: the sooner the big job finishes, the sooner it releases its 
many allocated resources for other jobs to use. In other words, we need a 
relaxed version of *Earliest Endtime First scheduling* that takes into account 
the number of already-allocated resources and the estimated time to finish.

For example, if a job is using MEM GB of memory and is expected to finish in 
TIME minutes, the priority in scheduling would be a function p of (MEM, TIME). 
The expected time to finish can be estimated by the AppMaster using 
TaskRuntimeEstimator#estimatedRuntime and supplied to the RM in the resource 
request messages. To make it harder for apps to game the system, we can limit 
this scheduling to leaf queues, i.e. the queues that hold applications.
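
As a rough illustration of the idea above, the sketch below assumes one 
possible form of the priority function, p(MEM, TIME) = MEM / TIME, i.e. the 
memory a job would release per minute of remaining runtime. The issue does not 
fix the form of p, so this choice is an assumption. The App class, the 
priority method, and the one-minute floor on TIME are likewise hypothetical 
and are not part of YARN or the Fair Scheduler.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class EndtimeFirstSketch {

    /** Hypothetical view of a running application; not a YARN class. */
    static final class App {
        final String name;
        final double allocatedMemGb;   // MEM: memory currently held by the app
        final double minutesToFinish;  // TIME: estimate reported by the AppMaster

        App(String name, double allocatedMemGb, double minutesToFinish) {
            this.name = name;
            this.allocatedMemGb = allocatedMemGb;
            this.minutesToFinish = minutesToFinish;
        }
    }

    /** Assumed p(MEM, TIME): memory released per minute of remaining runtime. */
    static double priority(App app) {
        // Assumption: clamp the estimate to one minute so that a zero or
        // negative estimate cannot produce an unbounded score.
        double time = Math.max(app.minutesToFinish, 1.0);
        return app.allocatedMemGb / time;
    }

    /** Orders apps so that the highest score is offered resources first. */
    static final Comparator<App> ENDTIME_FIRST =
            Comparator.<App>comparingDouble(EndtimeFirstSketch::priority).reversed();

    /** Returns a new list sorted by the assumed priority, highest first. */
    static List<App> offerOrder(List<App> apps) {
        List<App> sorted = new ArrayList<>(apps);
        sorted.sort(ENDTIME_FIRST);
        return sorted;
    }

    public static void main(String[] args) {
        List<App> apps = new ArrayList<>();
        apps.add(new App("small-newcomer", 4, 30));
        apps.add(new App("big-almost-done", 400, 2));
        apps.add(new App("medium", 100, 50));
        for (App app : offerOrder(apps)) {
            System.out.println(app.name + " score=" + priority(app));
        }
    }
}
```

With this choice of p, a large job that is about to finish outranks a small 
newcomer even though the newcomer is far below its fair share. A real policy 
would also need to bound how long smaller jobs can be starved.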

  was:
What we are observing is that some big jobs with many allocated containers are 
waiting for a few containers to finish. Under *fair-share scheduling*, however, 
they have a low priority because other jobs (usually much smaller newcomers) 
are using resources far below their fair share, so newly released containers 
are not offered to the big job that is close to finishing. Nevertheless, 
everybody would benefit from an "unfair" scheduling that offers the resource 
to the big job: the sooner the big job finishes, the sooner it releases its 
many allocated resources for other jobs to use. In other words, what we 
require is a variation of *Earliest Deadline First scheduling* that takes into 
account the number of already-allocated resources and the estimated time to 
finish.
http://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling

For example, if a job is using MEM GB of memory and is expected to finish in 
TIME minutes, the priority in scheduling would be a function p of (MEM, TIME). 
The expected time to finish can be estimated by the AppMaster using 
TaskRuntimeEstimator#estimatedRuntime and supplied to the RM in the resource 
request messages. To make it harder for apps to game the system, we can limit 
this scheduling to *only within a queue*: i.e., add an EarliestDeadlinePolicy 
that extends SchedulingPolicy and let queues use it by setting the 
"schedulingPolicy" field.


> Fair Scheduler: Add policy for Earliest Endtime First
> -----------------------------------------------------
>
>                 Key: YARN-1969
>                 URL: https://issues.apache.org/jira/browse/YARN-1969
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Maysam Yabandeh
>            Assignee: Maysam Yabandeh
>
> What we are observing is that some big jobs with many allocated containers 
> are waiting for a few containers to finish. Under *fair-share scheduling*, 
> however, they have a low priority because other jobs (usually much smaller 
> newcomers) are using resources far below their fair share, so newly released 
> containers are not offered to the big job that is close to finishing. 
> Nevertheless, everybody would benefit from an "unfair" scheduling that offers 
> the resource to the big job: the sooner the big job finishes, the sooner it 
> releases its many allocated resources for other jobs to use. In other words, 
> we need a relaxed version of *Earliest Endtime First scheduling* that takes 
> into account the number of already-allocated resources and the estimated 
> time to finish.
> For example, if a job is using MEM GB of memory and is expected to finish in 
> TIME minutes, the priority in scheduling would be a function p of (MEM, 
> TIME). The expected time to finish can be estimated by the AppMaster using 
> TaskRuntimeEstimator#estimatedRuntime and supplied to the RM in the resource 
> request messages. To make it harder for apps to game the system, we can 
> limit this scheduling to leaf queues, i.e. the queues that hold 
> applications.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
