[
https://issues.apache.org/jira/browse/YARN-6808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087782#comment-16087782
]
Wangda Tan edited comment on YARN-6808 at 7/14/17 6:49 PM:
-----------------------------------------------------------
Thanks [~asuresh] for the additional explanations.
Frankly speaking, I'm a little bit worried about this direction.
Currently, opportunistic container allocation is too simple to follow what
existing schedulers can do, for example delayed scheduling and user limits. You
might say that there's no guarantee for opportunistic containers, so we
don't have to follow all of them. That is true in the context of YARN-2877,
which targets short-lived containers that don't need any guarantees.
However, if we make all containers beyond a queue's configured capacity
opportunistic, it is a big deal. I'm not sure if you have any plans regarding
this.
One related topic is YARN-1011: opportunistic containers will be used when a
node is overcommitted. YARN-1013/YARN-1015 were opened to track changes on the
scheduler side. To me it's better to reuse existing scheduler code instead of
duplicating code to achieve this. I'm not sure whether there are any concrete
plans for this. cc: [~haibo.chen], [~kasha].
If, as you said, this approach is meant to replace existing preemption, I think
this patch is far away from the goal. Even with the latest patch, it can only
mark which containers were allocated below the queue limit and which were
allocated above it (and that mark could be wrong if the queue's configured
capacity changes). In that case, I don't know how it is possible to support
existing preemption features such as preemption of large containers,
intra-queue preemption for app priority / user limit, etc.
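To illustrate the point about capacity changes, here is a toy Python model. It
is not YARN code: Container, allocate and recompute_marks are hypothetical
names invented for this sketch, and the sizes and capacities are made up. It
assumes a container is marked once, at allocation time, by comparing queue
usage with the configured capacity, and shows how that mark can go stale after
the capacity is reconfigured.

```python
from dataclasses import dataclass


@dataclass
class Container:
    cid: int
    size: int          # resource units, e.g. GB of memory (illustrative)
    over_limit: bool   # mark assigned once, at allocation time


def allocate(containers, size, capacity):
    """Append a container, marking it against the capacity in force right now."""
    used = sum(c.size for c in containers)
    containers.append(Container(len(containers) + 1, size, used + size > capacity))


def recompute_marks(containers, capacity):
    """Marks a fresh pass over the same containers would assign for `capacity`."""
    used, marks = 0, []
    for c in containers:
        used += c.size
        marks.append(used > capacity)
    return marks


queue = []
for _ in range(4):
    allocate(queue, size=2, capacity=6)  # with capacity 6, only the 4th is over

stored = [c.over_limit for c in queue]
# Hypothetical reconfiguration: the queue's capacity shrinks from 6 to 4.
after_shrink = recompute_marks(queue, capacity=4)
# Containers whose stored mark no longer matches the new capacity.
stale = [c.cid for c, m in zip(queue, after_shrink) if c.over_limit != m]
print(stored, after_shrink, stale)
```

In this toy run the third container keeps its "below limit" mark even though,
under the smaller capacity, it now sits above the limit. Any preemption logic
built only on the stored mark would have to re-derive it after every capacity
change.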
bq. Essentially, the extra delay will look just like localization delay to the
AM. We have verified this is fine for MapReduce and Spark.
To me this is too simple to solve the problem. Yes, existing MR/Spark are able
to tolerate delays while containers are localizing, but in most cases
localization happens only once on each node, because the files that need to be
localized are mostly common between containers from the same app (or the same
set of tasks, like mappers). It looks like in a large cluster, delays need to
be expected for every OC launch. This is not acceptable for SLA-sensitive jobs.
I don't want to stop or slow down this innovation, but from what I can see,
the existing assumptions and plans need a lot of time to be verified. In this
case, as with YARN-1011, do you think it would be better to create an umbrella
JIRA (use opportunistic containers to do preemption) and move the related work
to a branch?
Edit #1:
Probably the new umbrella should be: Use OC for normal containers. (YARN-2877
does not target using it for normal containers.)
> Allow Schedulers to return OPPORTUNISTIC containers when queues go over
> configured capacity
> -------------------------------------------------------------------------------------------
>
> Key: YARN-6808
> URL: https://issues.apache.org/jira/browse/YARN-6808
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Arun Suresh
> Assignee: Arun Suresh
> Attachments: YARN-6808.001.patch, YARN-6808.002.patch
>
>
> This is based on discussions with [~kasha] and [~kkaranasos].
> Currently, when a Queue goes over capacity, apps on starved queues must wait
> either for containers to complete or for them to be pre-empted by the
> scheduler to get resources.
> This JIRA proposes to allow Schedulers to:
> # Allocate all containers over the configured queue capacity/weight as
> OPPORTUNISTIC.
> # Auto-promote running OPPORTUNISTIC containers of apps as and when their
> GUARANTEED containers complete.