[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

Haibo Chen (JIRA) Mon, 11 Jun 2018 10:04:23 -0700


    [ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508382#comment-16508382
 ]


Haibo Chen commented on YARN-8250:
----------------------------------

My apologies, Arun. I did not mean to suggest in any way that you are trying 
to 'block'.
{quote}what would 3) accomplish ? Again, point of opportunistic containers is 
to have it start up as fast as possible
{quote}
We want to tune how fast opportunistic containers are launched in order to 
minimize opportunistic container failures, and therefore task failures. 
Because node utilization is only updated every few seconds, opportunistic 
containers launched against a stale utilization reading can be killed 
shortly after they start.

YARN-1011 gives users another option when there are no unallocated resources 
left. They can either wait for guaranteed containers to run their tasks at 
some point in the future, or start running in opportunistic containers early 
and be automatically promoted to guaranteed containers at that same point. 
But if users experience more task failures because the opportunistic 
containers running their tasks are launched fast and killed fast, they'll be 
less likely to adopt YARN-1011. In that sense, being able to minimize 
opportunistic container failures is critical.

That said, this behavior is only useful for YARN-1011. I totally agree that 
when there is no over-allocation and there is capacity when a container 
completes, there is no point in delaying the start of opportunistic 
containers, as you said.
{quote}If there is capacity at the time a container completes AND there are no 
G containers waiting to start, why not start the first O container in queue ?
{quote}
For that reason, we initially proposed another implementation of the 
container scheduler. The proposal to change the existing container scheduler 
instead, as discussed with [~leftnoteasy], was meant to explore converging on 
a single behavior and thereby avoid two container scheduler implementations, 
addressing his earlier concerns. But given the discussions we've had so far, 
that does not seem like a good idea.
{quote}I am guessing the point of the JIRA is to ensure G container startup 
time is not impacted right ?
{quote}
The longer G container startup time is another consequence of launching too 
many opportunistic containers under over-allocation. We don't know how much 
the opportunistic containers are actually consuming, so when we need to 
launch a G container and there are no unallocated resources left, we kill 
some opportunistic containers and, once they finish, check whether more need 
to be killed. We may end up with a few rounds like that in some cases. Also, 
because the node utilization is stale, many opportunistic containers may get 
killed unnecessarily. However, this is NOT an issue at all if there is no 
over-allocation.
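
To make the "few rounds" point concrete, here is a rough toy model, not the 
actual NM code. The class name, method name and the {{killsPerRound}} knob are 
all made up for illustration. It assumes the real usage of each opportunistic 
container only becomes known once that container has been killed.

```java
// Hypothetical toy model; nothing here is a real YARN class or API.
final class PreemptionRoundsSketch {

    /**
     * Counts how many kill-then-recheck rounds are needed before a guaranteed
     * container that needs neededMb can start.
     *
     * @param neededMb      memory the guaranteed container needs
     * @param actualUsageMb real usage of each opportunistic container, in kill
     *                      order (assumed unknown until the container exits)
     * @param killsPerRound how many opportunistic containers are killed per round
     * @return number of rounds, or -1 if killing everything is still not enough
     */
    static int roundsToFree(int neededMb, int[] actualUsageMb, int killsPerRound) {
        if (killsPerRound < 1) {
            throw new IllegalArgumentException("killsPerRound must be >= 1");
        }
        int freedMb = 0;
        int next = 0;
        int rounds = 0;
        while (freedMb < neededMb && next < actualUsageMb.length) {
            rounds++;
            // Kill a batch, then observe how much was really freed.
            for (int k = 0; k < killsPerRound && next < actualUsageMb.length; k++) {
                freedMb += actualUsageMb[next++];
            }
        }
        return freedMb >= neededMb ? rounds : -1;
    }
}
```

With real usages of 256, 256, 256 and 512 MB and a guaranteed container that 
needs 1024 MB, killing one container per round takes 4 rounds, while killing 
two per round takes 2. The more small opportunistic containers are running, 
the more rounds the G container may wait.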

Hope that explains the thinking behind our proposal of a different container 
scheduler.
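
For reference, the behavior proposed in this JIRA (guaranteed containers are 
launched immediately, opportunistic containers are queued and only launched by 
a periodic check) could be sketched roughly as below. This is a simplified 
illustration only: the class, its methods and the single memory dimension are 
assumptions, not the code in the attached patches.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch; these are not the real YARN ContainerScheduler types.
final class QueueingSchedulerSketch {

    enum ExecType { GUARANTEED, OPPORTUNISTIC }

    static final class Container {
        final String id;
        final ExecType type;
        final int memoryMb;

        Container(String id, ExecType type, int memoryMb) {
            this.id = id;
            this.type = type;
            this.memoryMb = memoryMb;
        }
    }

    private final Queue<Container> queuedOpportunistic = new ArrayDeque<>();
    private final List<String> launched = new ArrayList<>();

    /** Guaranteed containers start right away; opportunistic ones are only queued. */
    void schedule(Container c) {
        if (c.type == ExecType.GUARANTEED) {
            launched.add(c.id);
        } else {
            queuedOpportunistic.add(c);
        }
    }

    /**
     * Periodic check: launch queued opportunistic containers in FIFO order
     * for as long as the head of the queue fits into availableMb.
     *
     * @param availableMb memory judged usable at the time of this check
     * @return number of opportunistic containers launched by this check
     */
    int periodicCheck(int availableMb) {
        int started = 0;
        while (!queuedOpportunistic.isEmpty()
                && queuedOpportunistic.peek().memoryMb <= availableMb) {
            Container c = queuedOpportunistic.poll();
            availableMb -= c.memoryMb;
            launched.add(c.id);
            started++;
        }
        return started;
    }

    List<String> launched() {
        return launched;
    }

    int queuedCount() {
        return queuedOpportunistic.size();
    }
}
```

Because opportunistic containers are launched only from the periodic check, 
the check interval and the {{availableMb}} figure passed in become the place 
to control how aggressively they are started.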

Can you please elaborate on this, [~asuresh] ? I don't quite understand this.
{quote}Wouldnt a simple approach be: Check if container is opportunistic, and 
if container is to be killed and if over-allocation is turned on, assume 
{{sleep-delay-before-sigkill.ms}} == 0
{quote}
 

 

 

> Create another implementation of ContainerScheduler to support NM 
> overallocation
> --------------------------------------------------------------------------------
>
>                 Key: YARN-8250
>                 URL: https://issues.apache.org/jira/browse/YARN-8250
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Major
>         Attachments: YARN-8250-YARN-1011.00.patch, 
> YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch
>
>
> YARN-6675 adds NM over-allocation support by modifying the existing 
> ContainerScheduler and providing a utilizationBased resource tracker.
> However, the implementation adds a lot of complexity to ContainerScheduler, 
> and future tweaks of the over-allocation strategy based on how many 
> containers have been launched would be even more complicated.
> As such, this Jira proposes a new ContainerScheduler that always launches 
> guaranteed containers immediately and queues opportunistic containers. It 
> relies on a periodic check to launch opportunistic containers. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
