[
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476376#comment-16476376
]
Haibo Chen commented on YARN-8250:
----------------------------------
Thanks [~asuresh] for the response.
{quote} the whole point of opportunistic containers is to start them as fast as
possible
{quote}
I agree in general that we should start OPPORTUNISTIC containers as fast as
possible, given that they are more likely to be preempted anyway. The resulting
container churn, however, can discourage many users from adopting
oversubscription. One design goal of oversubscription is to make it as seamless
as possible, so that users are willing to turn it on without worrying too much
about the possibility of many container/job failures. If we launch OPPORTUNISTIC
containers aggressively, many of them can be preempted shortly afterwards, and
framework AMs then need to be conscious of this and handle those containers
differently. (From an AM's perspective, opting in to the feature means being
prepared to handle much more frequent failures, so it is natural for an AM to
think twice before opting in.)
To an AM, opting in to oversubscription is in many ways like saying: I'm willing
to start a task eagerly in an OPPORTUNISTIC container, but when the time comes
that I would have been given a GUARANTEED container had I not started early, I
expect to be running the task in a GUARANTEED container from then on (the
scheduler automatically promotes it at that point). For the experience to be
smooth, the implication is that OPPORTUNISTIC container failures do not occur so
often that they become a significant downside of opting in. We can never
completely avoid OPPORTUNISTIC container failures, but in cases like this we
could be less aggressive in order to minimize them. This specific goal of
YARN-1011 also makes it less suitable to turn on in clusters where utilization
is already very high, IMO. I hope that makes sense.
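To make the opt-in contract concrete, here is a minimal sketch (not tied to the
patches here, and assuming the AM speaks the standard allocate protocol) of how
an AM would observe a scheduler-driven promotion; handleRunningTask() is a
hypothetical AM-side hook, not a YARN API:
{code:java}
import java.util.List;

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ExecutionType;
import org.apache.hadoop.yarn.api.records.UpdatedContainer;

public class PromotionAwareAM {
  void onAllocateResponse(AllocateResponse response) {
    // Containers whose attributes changed since the last heartbeat,
    // including execution-type promotions performed by the scheduler.
    List<UpdatedContainer> updated = response.getUpdatedContainers();
    for (UpdatedContainer uc : updated) {
      Container container = uc.getContainer();
      if (container.getExecutionType() == ExecutionType.GUARANTEED) {
        // The task keeps running in place; from the AM's point of view
        // it is now backed by a GUARANTEED container and no longer at
        // risk of preemption due to over-allocation.
        handleRunningTask(container);
      }
    }
  }

  // Hypothetical bookkeeping hook, not a YARN API.
  private void handleRunningTask(Container container) {
    // e.g. clear any "may be preempted" flag for this task.
  }
}
{code}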
The pause/resume feature does sound useful for avoiding lost work. Does the AM
need to be aware of it, and what does the AM do if a container stays paused for
a while?
I believe a container kill is done with a soft kill (SIGTERM) followed by a
kill -9 (SIGKILL). When over-allocating, we don't know exactly how many
OPPORTUNISTIC containers to kill, because we only have the resource request for
any given container, not how much of its resources a running OPPORTUNISTIC
container is actually using. This is not a concern when over-allocation is off.
On one hand we'd launch OPPORTUNISTIC containers aggressively; on the other
hand we want to avoid preempting them as much as possible, as described
previously. What we end up doing is killing one OPPORTUNISTIC container at a
time, so a large GUARANTEED container can sit in the queue for more than a few
seconds if multiple containers need to be killed one by one. Again, the
pause/resume feature sounds useful here if we preempt OPPORTUNISTIC containers
aggressively.
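For illustration, a minimal sketch of that one-at-a-time preemption loop,
re-reading measured node utilization between kills; every type and method below
is a hypothetical stand-in, not the real NM classes:
{code:java}
import java.util.Deque;

public class ReclaimSketch {
  interface NodeUtilizationTracker {
    // Resources actually in use, as measured, not as requested.
    long getUsedMemoryBytes();
  }

  interface OContainer {
    // Soft kill (SIGTERM); the NM escalates to SIGKILL after a delay.
    void signalToStop();
  }

  void reclaimFor(long neededBytes, long nodeCapacityBytes,
      Deque<OContainer> runningOpportunistic,
      NodeUtilizationTracker tracker) throws InterruptedException {
    // Kill one container at a time: since we only know each container's
    // *request*, not its actual usage, killing in bulk could reclaim far
    // more than needed and cause unnecessary churn.
    while (nodeCapacityBytes - tracker.getUsedMemoryBytes() < neededBytes
        && !runningOpportunistic.isEmpty()) {
      runningOpportunistic.removeLast().signalToStop();
      // Wait for the kill to take effect and for the monitor to publish
      // fresh utilization before deciding whether another kill is needed.
      Thread.sleep(1000L);
    }
  }
}
{code}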
> Create another implementation of ContainerScheduler to support NM
> overallocation
> --------------------------------------------------------------------------------
>
> Key: YARN-8250
> URL: https://issues.apache.org/jira/browse/YARN-8250
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Priority: Major
> Attachments: YARN-8250-YARN-1011.00.patch,
> YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch
>
>
> YARN-6675 adds NM over-allocation support by modifying the existing
> ContainerScheduler and providing a utilization-based resource tracker.
> However, the implementation adds a lot of complexity to ContainerScheduler,
> and future tweaks of the over-allocation strategy based on how many
> containers have been launched would be even more complicated.
> As such, this Jira proposes a new ContainerScheduler that always launches
> guaranteed containers immediately and queues opportunistic containers,
> relying on a periodic check to launch the queued opportunistic containers.
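As a rough illustration of that proposed design (the attached patches are the
authoritative version; all names below are hypothetical), the scheduler would
launch GUARANTEED containers inline and drain the OPPORTUNISTIC queue from a
periodic check:
{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class OverAllocationSchedulerSketch {
  private final Queue<Runnable> queuedOpportunistic =
      new ConcurrentLinkedQueue<>();
  private final ScheduledExecutorService checker =
      Executors.newSingleThreadScheduledExecutor();

  public OverAllocationSchedulerSketch(long checkIntervalMs) {
    // Periodic check: launch queued OPPORTUNISTIC containers only when
    // the node's measured utilization leaves room for them.
    checker.scheduleWithFixedDelay(this::launchQueuedIfRoom,
        checkIntervalMs, checkIntervalMs, TimeUnit.MILLISECONDS);
  }

  public void schedule(Runnable launchContainer, boolean guaranteed) {
    if (guaranteed) {
      launchContainer.run(); // GUARANTEED: launch immediately, always.
    } else {
      queuedOpportunistic.add(launchContainer); // OPPORTUNISTIC: queue.
    }
  }

  private void launchQueuedIfRoom() {
    while (hasRoomForMore() && !queuedOpportunistic.isEmpty()) {
      queuedOpportunistic.poll().run();
    }
  }

  // Hypothetical: would consult the node's actual utilization.
  private boolean hasRoomForMore() {
    return false;
  }
}
{code}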