[
https://issues.apache.org/jira/browse/YARN-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291868#comment-16291868
]
Konstantinos Karanasos commented on YARN-7457:
----------------------------------------------
Hi guys, I was just looking into this JIRA and I wanted to share some thoughts.
First, I agree that the way we specify delay should be decoupled from the
scheduler implementation. So, there should be a pluggable policy that specifies
whether the scheduler relaxes locality based on, say, missed opportunities or
on time.
I also see the need for each application to specify how "urgent" its requests
are. That is, one app might want to relax locality after 1 sec and another
might want to relax after 1 min.
What I do not see is a use case where one application wants to wait for 5 missed
opportunities and another wants to wait for 10 secs.
I am not even sure what the semantics of such a placement would be for the scheduler.
More importantly, missed opportunities is an implementation detail of the way
our schedulers work today, and is something that I think should not be surfaced
to applications. Missed opportunities do not mean anything to apps, and in
fact, like you say, the capacity scheduler interprets them differently from the
fair scheduler, which is yet another reason it should not be surfaced. On the
other hand, time is much more tangible and we could surface that.
To sum up, I think each application should specify its urgency of getting a
specific container, but there should be a common way to specify that urgency
across all applications, rather than having some specify missed opportunities
and others specify time. We can have a scheduler-wide pluggable policy
determining if the "urgency" will be time or missed opportunities, and then
have applications give a number, specifying how urgent their demand is. In a
first cut, this number can be directly translated to seconds or missed
opportunities by our schedulers.
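To make the idea concrete, here is a rough sketch of such a policy. All names here (DelayPolicy, TimeBasedPolicy, MissedOpportunityPolicy, should_relax, urgency) are hypothetical and are not existing YARN APIs. It assumes the first-cut translation described above, where the number an application supplies is read directly as seconds or as missed opportunities, so a smaller number means locality is relaxed sooner.

```python
from typing import Protocol


class DelayPolicy(Protocol):
    """Hypothetical scheduler-wide policy deciding when locality may be relaxed."""

    def should_relax(self, urgency: int, waited_seconds: float,
                     missed_opportunities: int) -> bool:
        ...


class TimeBasedPolicy:
    """Reads the application's urgency number directly as seconds."""

    def should_relax(self, urgency: int, waited_seconds: float,
                     missed_opportunities: int) -> bool:
        return waited_seconds >= urgency


class MissedOpportunityPolicy:
    """Reads the application's urgency number directly as missed opportunities."""

    def should_relax(self, urgency: int, waited_seconds: float,
                     missed_opportunities: int) -> bool:
        return missed_opportunities >= urgency


def relax_locality(policy: DelayPolicy, urgency: int,
                   waited_seconds: float, missed_opportunities: int) -> bool:
    # The application only supplies `urgency`; the cluster-wide policy
    # decides which unit that number is measured in.
    return policy.should_relax(urgency, waited_seconds, missed_opportunities)
```

With this shape, an application that passes urgency=1 under TimeBasedPolicy relaxes locality after 1 sec, and the same request under MissedOpportunityPolicy relaxes after 1 missed opportunity, without the application ever knowing which unit is in use.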
Looping in [~subru] -- I just ran the above ideas by him. He also said
that we should not surface missed opportunities any further. Moreover, he took
the notion of urgency I mentioned above a step further, saying that ultimately we
want applications to specify the utility of their requests as a cost function.
Thoughts?
> Delay scheduling should be an individual policy instead of part of scheduler
> implementation
> -------------------------------------------------------------------------------------------
>
> Key: YARN-7457
> URL: https://issues.apache.org/jira/browse/YARN-7457
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Tao Yang
>
> Currently, different schedulers have slightly different delay scheduling
> implementations. Ideally we should make delay scheduling independent from
> scheduler implementation. Benefits of doing this:
> 1) Applications can choose which delay scheduling policy to use; it could be
> time-based, missed-opportunity-based, or whatever new delay scheduling
> policy the cluster supports. Today it is a global scheduler config.
> 2) Make scheduler implementations simpler and reusable.
> Running design doc:
> https://docs.google.com/document/d/1rY-CJPLbGk3Xj_8sxre61y2YkHJFK8oqKOshro1ZY3A/edit#heading=h.xnzvh9nn283a