[
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508382#comment-16508382
]
Haibo Chen commented on YARN-8250:
----------------------------------
My apologies, Arun. I did not mean to suggest in any way that you are trying
to 'block'.
{quote}what would 3) accomplish ? Again, point of opportunistic containers is
to have it start up as fast as possible
{quote}
We want to tune how fast opportunistic containers are launched in order to
minimize opportunistic container failures, and therefore task failures.
Because the node utilization is only updated every few seconds, opportunistic
containers can be launched based on stale data and then killed shortly
afterwards.
YARN-1011 offers another option when there are no unallocated resources left.
Users can either wait for guaranteed containers to run their tasks at some
point in the future, or start running in opportunistic containers early and be
automatically promoted to guaranteed containers at that same point. But if
users experience more task failures because the opportunistic containers in
which their tasks are running are launched fast and killed fast, they'll be
less likely to adopt YARN-1011. In that sense, being able to minimize
opportunistic container failures is critical.
That said, this behavior is only useful for YARN-1011. I totally agree that
when there is no over-allocation and there is capacity when a container
completes, there is no point in not starting opportunistic containers right
away, as you said.
{quote}If there is capacity at the time a container completes AND there are no
G containers waiting to start, why not start the first O container in queue ?
{quote}
For that reason, we initially proposed another implementation of the container
scheduler. The proposal to change the existing container scheduler, as
discussed with [~leftnoteasy], was to explore the possibility of converging on
the behaviors and therefore avoiding two container scheduler implementations,
to address his previous concerns. But this does not seem like a good thing to
do, given the discussions we've had so far.
{quote}I am guessing the point of the JIRA is to ensure G container startup
time is not impacted right ?
{quote}
The longer G container startup time is another consequence of too many
opportunistic containers being launched under over-allocation. We don't know
how much the opportunistic containers are actually consuming, so when we need
to launch a G container and there are no unallocated resources left, we kill
some opportunistic containers and then check, once they finish, whether more
need to be killed. We may end up with a few rounds like that in some cases.
Also, because the node utilization is stale, many opportunistic containers may
get killed unnecessarily. However, this is NOT an issue at all if there is no
over-allocation.
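To illustrate why several kill rounds can be needed, here is a toy model (not
code from ContainerScheduler or from the patches). It assumes that victims are
picked by their allocation, most recently launched first, and that the memory
actually freed only becomes known after the victims finish. The class
`KillRoundsSketch`, the `OppContainer` type and all field names are
hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class KillRoundsSketch {
    /** Hypothetical opportunistic container: allocation vs. real usage. */
    static final class OppContainer {
        final int allocatedMb;
        final int actualUsedMb; // not known when victims are chosen

        OppContainer(int allocatedMb, int actualUsedMb) {
            this.allocatedMb = allocatedMb;
            this.actualUsedMb = actualUsedMb;
        }
    }

    /**
     * Counts the kill rounds needed to free neededMb of actually used memory.
     * Each round picks victims until their allocations cover what is still
     * needed, then observes how much memory was really freed.
     */
    static int roundsToFree(Deque<OppContainer> running, int neededMb) {
        int rounds = 0;
        int stillNeededMb = neededMb;
        while (stillNeededMb > 0 && !running.isEmpty()) {
            rounds++;
            int estimatedFreedMb = 0;
            int actuallyFreedMb = 0;
            // Assumption: most recently launched containers are killed first.
            while (estimatedFreedMb < stillNeededMb && !running.isEmpty()) {
                OppContainer victim = running.pollLast();
                estimatedFreedMb += victim.allocatedMb;
                actuallyFreedMb += victim.actualUsedMb;
            }
            // Only now do we learn what the victims were really using.
            stillNeededMb -= actuallyFreedMb;
        }
        return rounds;
    }

    public static void main(String[] args) {
        Deque<OppContainer> running = new ArrayDeque<>();
        for (int i = 0; i < 8; i++) {
            // Each container is allocated 1024 MB but uses only 256 MB.
            running.addLast(new OppContainer(1024, 256));
        }
        System.out.println("rounds = " + roundsToFree(running, 1024));
    }
}
```

In this model, the further actual usage falls below the allocation, the more
rounds it takes before the G container can start.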
Hope that explains the thinking behind our proposal of a different container
scheduler.
Can you please elaborate on this, [~asuresh]? I don't quite understand it.
{quote}Wouldnt a simple approach be: Check if container is opportunistic, and
if container is to be killed and if over-allocation is turned on, assume
{{sleep-delay-before-sigkill.ms}} == 0
{quote}
> Create another implementation of ContainerScheduler to support NM
> overallocation
> --------------------------------------------------------------------------------
>
> Key: YARN-8250
> URL: https://issues.apache.org/jira/browse/YARN-8250
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Priority: Major
> Attachments: YARN-8250-YARN-1011.00.patch,
> YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch
>
>
> YARN-6675 adds NM over-allocation support by modifying the existing
> ContainerScheduler and providing a utilizationBased resource tracker.
> However, the implementation adds a lot of complexity to ContainerScheduler,
> and future tweaks of the over-allocation strategy based on how many
> containers have been launched would be even more complicated.
> As such, this Jira proposes a new ContainerScheduler that always launches
> guaranteed containers immediately and queues opportunistic containers. It
> relies on a periodic check to launch opportunistic containers.
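
The policy described above can be sketched roughly as follows. This is not the
code from the attached patches: the class `OverAllocationSchedulerSketch`, the
`Container` type and the `availableMemoryMb` argument are hypothetical, memory
stands in for all resources, and the strict FIFO order of the queue is an
assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class OverAllocationSchedulerSketch {
    enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

    /** Hypothetical container request; only memory is modeled. */
    static final class Container {
        final String id;
        final ExecutionType type;
        final int memoryMb;

        Container(String id, ExecutionType type, int memoryMb) {
            this.id = id;
            this.type = type;
            this.memoryMb = memoryMb;
        }
    }

    private final Queue<Container> queuedOpportunistic = new ArrayDeque<>();
    private final List<Container> launched = new ArrayList<>();

    /** Guaranteed containers launch at once; opportunistic ones are queued. */
    void schedule(Container c) {
        if (c.type == ExecutionType.GUARANTEED) {
            launched.add(c);
        } else {
            queuedOpportunistic.add(c);
        }
    }

    /**
     * Periodic check: launch queued opportunistic containers, in FIFO order,
     * while the head of the queue fits into the available memory. The caller
     * is assumed to derive availableMemoryMb from the latest node utilization.
     */
    void periodicCheck(int availableMemoryMb) {
        int remainingMb = availableMemoryMb;
        while (!queuedOpportunistic.isEmpty()
                && queuedOpportunistic.peek().memoryMb <= remainingMb) {
            Container c = queuedOpportunistic.poll();
            remainingMb -= c.memoryMb;
            launched.add(c);
        }
    }

    List<String> launchedIds() {
        List<String> ids = new ArrayList<>();
        for (Container c : launched) {
            ids.add(c.id);
        }
        return ids;
    }

    int queuedCount() {
        return queuedOpportunistic.size();
    }

    public static void main(String[] args) {
        OverAllocationSchedulerSketch s = new OverAllocationSchedulerSketch();
        s.schedule(new Container("g1", ExecutionType.GUARANTEED, 2048));
        s.schedule(new Container("o1", ExecutionType.OPPORTUNISTIC, 1024));
        s.schedule(new Container("o2", ExecutionType.OPPORTUNISTIC, 1024));
        System.out.println("before check: " + s.launchedIds());
        s.periodicCheck(1536);
        System.out.println("after check:  " + s.launchedIds());
    }
}
```

Because opportunistic containers are only launched from the periodic check,
the launch rate can be tuned by changing how often the check runs and how much
headroom it is given.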