[
https://issues.apache.org/jira/browse/YARN-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Karthik Kambatla updated YARN-1969:
---
Description:
We are observing that some big jobs with many allocated containers are
waiting for a few containers to finish. Under *fair-share scheduling*, however,
they have low priority because other jobs (usually much smaller newcomers)
are using resources well below their fair share, so newly released
containers are not offered to the big, close-to-finishing job.
Nevertheless, everybody would benefit from an unfair scheduling that offers
the resource to the big job, since the sooner the big job finishes, the sooner
it releases its many allocated resources for other jobs to use. In other
words, we need a relaxed version of *Earliest Endtime First scheduling* that
takes into account the amount of already-allocated resources and the estimated
time to finish.
For example, if a job is using MEM GB of memory and is expected to finish in
TIME minutes, its scheduling priority would be a function p of (MEM, TIME).
The expected time to finish can be estimated by the AppMaster using
TaskRuntimeEstimator#estimatedRuntime and supplied to the RM in the resource
request messages. To be less susceptible to apps gaming the system, we can
limit this scheduling to leaf queues that have applications.
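As a rough illustration of the priority function p of (MEM, TIME), here is a
minimal Java sketch. The issue does not define p, so the formula p = MEM / TIME
(memory released per minute of waiting) is purely an assumption, and the names
EndtimeFirstSketch, App, priority and order are hypothetical; none of them are
YARN classes or APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: these types are NOT part of YARN.
final class EndtimeFirstSketch {

    /** Snapshot of one running application as a scheduler might see it. */
    static final class App {
        final String name;
        final double memGb;            // memory currently allocated (MEM)
        final double minutesToFinish;  // estimate supplied by the AppMaster (TIME)

        App(String name, double memGb, double minutesToFinish) {
            this.name = name;
            this.memGb = memGb;
            this.minutesToFinish = minutesToFinish;
        }
    }

    /** Assumed priority p(MEM, TIME) = MEM / TIME; a higher value is served first. */
    static double priority(App app) {
        // Clamp TIME to at least one minute so a zero or negative estimate
        // cannot yield an infinite or negative priority.
        double time = Math.max(app.minutesToFinish, 1.0);
        return app.memGb / time;
    }

    /** Returns a copy of the apps sorted by descending priority. */
    static List<App> order(List<App> apps) {
        List<App> sorted = new ArrayList<>(apps);
        sorted.sort((a, b) -> Double.compare(priority(b), priority(a)));
        return sorted;
    }
}
```

Under this assumed p, a job holding 400 GB that expects to finish in 2 minutes
is ordered ahead of a 2 GB job that expects to run for another 30 minutes. A
real policy would also need to bound how much trust it places in the
AppMaster's estimate.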
was:
We are observing that some big jobs with many allocated containers are
waiting for a few containers to finish. Under *fair-share scheduling*, however,
they have low priority because other jobs (usually much smaller newcomers)
are using resources well below their fair share, so newly released
containers are not offered to the big, close-to-finishing job.
Nevertheless, everybody would benefit from an unfair scheduling that offers
the resource to the big job, since the sooner the big job finishes, the sooner
it releases its many allocated resources for other jobs to use. In other
words, what we require is a variation of *Earliest Deadline First
scheduling* that takes into account the amount of already-allocated resources
and the estimated time to finish.
http://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling
For example, if a job is using MEM GB of memory and is expected to finish in
TIME minutes, the priority in scheduling would be a function p of (MEM, TIME).
The expected time to finish can be estimated by the AppMaster using
TaskRuntimeEstimator#estimatedRuntime and be supplied to RM in the resource
request messages. To be less susceptible to apps gaming the system, we can
limit this scheduling to *only within a queue*: i.e., add an
EarliestDeadlinePolicy that extends SchedulingPolicy and let queues use it
by setting the schedulingPolicy field.
Fair Scheduler: Add policy for Earliest Endtime First
-
Key: YARN-1969
URL: https://issues.apache.org/jira/browse/YARN-1969
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh