[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-06-14 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512924#comment-16512924
 ] 

Haibo Chen commented on YARN-8250:
--

Thanks [~kkaranasos] for your comments and suggestions!
{quote}I understand that this might lead to more opportunistic containers 
killed than might be needed, but for sure it will be a good first version (and 
much better than what we have today). At the same time, if we are not very 
aggressive with starting opportunistic containers, it should not lead to too 
many containers killed.
{quote}
Indeed. This is less of an issue if we can be less aggressive about starting 
opportunistic containers.
{quote}Why not keep existing behavior and add the new one too?
{quote}
Sure. Starting opportunistic containers on a container-schedule event is very 
similar to starting them on a container-finish event, so I'll make the change 
you suggested for both the container-finish and schedule events. This lets us 
tune how opportunistic containers are launched without introducing a different 
implementation (rough sketch below).
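
To make "tune" concrete, here is a rough, purely illustrative sketch of the kind 
of knob I have in mind; the class and the cap parameter below are made up for 
the sketch and are not part of any posted patch:
{code:java}
import java.util.Queue;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical sketch: cap how many OPPORTUNISTIC containers are started per
// scheduling pass (container-finish event, schedule event, or periodic check),
// so launch aggressiveness becomes a tunable knob.
final class OpportunisticLaunchThrottle<C> {
  private final int maxLaunchesPerPass; // hypothetical tuning parameter

  OpportunisticLaunchThrottle(int maxLaunchesPerPass) {
    this.maxLaunchesPerPass = maxLaunchesPerPass;
  }

  /** Launches queued OPPORTUNISTIC containers until the cap or the room runs out. */
  void launchQueued(Queue<C> queued, Predicate<C> hasRoomFor, Consumer<C> launch) {
    int launched = 0;
    while (launched < maxLaunchesPerPass && !queued.isEmpty()
        && hasRoomFor.test(queued.peek())) {
      launch.accept(queued.poll());
      launched++;
    }
  }
}
{code}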

We'll proceed to experiment with this idea in YARN-8427, and open JIRAs in the 
future for any issues left unaddressed.

Thanks [~asuresh], [~leftnoteasy], [~kkaranasos] for sharing your thoughts, 
inputs and reviews!

> Create another implementation of ContainerScheduler to support NM 
> overallocation
> 
>
> Key: YARN-8250
> URL: https://issues.apache.org/jira/browse/YARN-8250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8250-YARN-1011.00.patch, 
> YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch
>
>
> YARN-6675 adds NM over-allocation support by modifying the existing 
> ContainerScheduler and providing a utilizationBased resource tracker.
> However, the implementation adds a lot of complexity to ContainerScheduler, 
> and future tweaks of the over-allocation strategy based on how many containers 
> have been launched would be even more complicated.
> As such, this Jira proposes a new ContainerScheduler that always launches 
> guaranteed containers immediately and queues opportunistic containers, 
> relying on a periodic check to launch the opportunistic containers.






[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-06-12 Thread Konstantinos Karanasos (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510299#comment-16510299
 ] 

Konstantinos Karanasos commented on YARN-8250:
--

Just went through your comments. I am skeptical about adding a new 
implementation of the Container Scheduler, as [~asuresh] and [~leftnoteasy] 
also pointed out.

It seems we can achieve the required behavior just with some new 
policies/functions rather than a new implementation.

From what I understand, the main points are (1) how to start guaranteed 
containers (do we start them and then kill opportunistic, or do we kill 
opportunistic and then start them) and (2) how/when to start opportunistic 
containers (at regular intervals or when another container finishes).

*How to start guaranteed:*

For (1), just to make sure I understand: why can't we know the resources 
utilized by each opportunistic container? But even if we don't know, can't we 
just use the allocated resources of the container in our calculation? I 
understand that this might lead to more opportunistic containers killed than 
might be needed, but for sure it will be a good first version (and much better 
than what we have today). At the same time, if we are not very aggressive with 
starting opportunistic containers, it should not lead to too many containers 
killed.

So my take is to have a first version that does the starting as it is today 
(first queue, then kill), even if that leads to more opportunistic containers 
killed, and to be less aggressive in starting opportunistic containers for that 
matter.

*How to start opportunistic:*

Why not keep existing behavior and add the new one too? That is, we still let 
opportunistic start when there is a container finish event, but do so not with 
utilized but allocated resources. This will keep the current behavior. On top 
of that, if over-allocation is enabled, we can have a thread that every so 
often checks utilized resources and starts opportunistic containers on 
over-allocated resources.
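
If it helps, a self-contained sketch of the two launch paths described above; 
every name and type here is invented for illustration and none of this is the 
actual NM code:
{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative only: container-finish events launch against *allocated*
// resources (existing behavior), while a periodic over-allocation check
// launches against *utilized* resources.
class OpportunisticLaunchSketch {
  static final class OContainer {
    final long requestedMb;
    OContainer(long requestedMb) { this.requestedMb = requestedMb; }
  }

  private final Queue<OContainer> queued = new ArrayDeque<>();
  private final long capacityMb;            // node's advertised capacity
  private final long overAllocationLimitMb; // capacity scaled by an over-allocation factor

  OpportunisticLaunchSketch(long capacityMb, long overAllocationLimitMb) {
    this.capacityMb = capacityMb;
    this.overAllocationLimitMb = overAllocationLimitMb;
  }

  /** Existing path: when a container finishes, launch based on allocated resources. */
  void onContainerFinished(long allocatedMb) {
    while (!queued.isEmpty()
        && allocatedMb + queued.peek().requestedMb <= capacityMb) {
      allocatedMb += queued.peek().requestedMb;
      launch(queued.poll());
    }
  }

  /** Over-allocation path: a periodic thread launches based on utilized resources. */
  void onUtilizationSnapshot(long utilizedMb) {
    while (!queued.isEmpty()
        && utilizedMb + queued.peek().requestedMb <= overAllocationLimitMb) {
      utilizedMb += queued.peek().requestedMb;
      launch(queued.poll());
    }
  }

  void queueOpportunistic(OContainer c) { queued.add(c); }

  private void launch(OContainer c) { /* hand off to the container launcher */ }
}
{code}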

 

 




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-06-11 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508385#comment-16508385
 ] 

Haibo Chen commented on YARN-8250:
--

In the meantime, I'll try to see how well YARN-6675 works with some testing.




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-06-11 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508382#comment-16508382
 ] 

Haibo Chen commented on YARN-8250:
--

My apologies, Arun. I did not mean to indicate in any way you are trying to 
'block'.
{quote}what would 3) accomplish ? Again, point of opportunistic containers is 
to have it start up as fast as possible
{quote}
The reason we want to tune how fast opportunistic containers are launched is to 
minimize opportunistic container failures, and therefore task failures. Because 
the node utilization is only updated every few seconds, opportunistic 
containers can be launched and then killed quickly afterwards.

YARN-1011 gives users another option when there are no resources left 
unallocated. They can either wait for guaranteed containers to run their tasks 
at some point in the future, or start running in opportunistic containers early 
and be automatically promoted to guaranteed containers at that same point. But 
if users experience more task failures, because the opportunistic containers in 
which their tasks are running are launched quickly and killed quickly, they'll 
be less likely to adopt YARN-1011. In that sense, being able to minimize 
opportunistic container failures is critical.

That said, this behavior is only useful for YARN-1011. I totally agree that 
when there is no over-allocation and there is capacity when a container 
completes, there is no point in not starting opportunistic containers right 
away, as you said.
{quote}If there is capacity at the time a container completes AND there are no 
G containers waiting to start, why not start the first O container in queue ?
{quote}
For that reason, we initially proposed another implementation of the container 
scheduler. The proposal to change the existing container scheduler, as 
discussed with [~leftnoteasy], was to explore the possibility of converging on 
the behaviors and thereby avoiding two container scheduler implementations, to 
address his previous concerns. But that does not seem like a good thing to do, 
given the discussions we've had so far.
{quote}I am guessing the point of the JIRA is to ensure G container startup 
time is not impacted right ?
{quote}
Longer G container startup time is another consequence of too many 
opportunistic containers being launched in the case of over-allocation. We 
don't know how much the opportunistic containers are actually consuming, so 
when we need to launch a G container and there are no unallocated resources 
left, we kill some opportunistic containers and check whether more need to be 
killed once those finish. We may end up with a few rounds like that in some 
cases. Also, because the node utilization is stale, many opportunistic 
containers may get killed unnecessarily. However, this is NOT an issue at all 
if there is no over-allocation.

Hope that explains the thinking behind our proposal of a different container 
scheduler.

Can you please elaborate on this, [~asuresh]? I don't quite understand it.
{quote}Wouldn't a simple approach be: check if the container is opportunistic, 
and if the container is to be killed and over-allocation is turned on, assume 
{{sleep-delay-before-sigkill.ms}} == 0
{quote}
 

 

 




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-06-08 Thread Arun Suresh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506607#comment-16506607
 ] 

Arun Suresh commented on YARN-8250:
---

Apologies for the late reply.

{quote}
3) Upon any container completed or finished event, do not try to launch any 
container.

4) Introduce a periodic check (in ContainersMonitor thread) that launches 
OPPORTUNISTIC container. Ideally, the period is configurable so that the 
latency to launch OPPORTUNISTIC containers can be reduced.
{quote}

My only issue is: what would 3) accomplish? Again, the point of opportunistic 
containers is to have them start up as fast as possible. If there is capacity 
at the time a container completes AND there are no G containers waiting to 
start, why not start the first O container in the queue?

Forgive me if my understanding is a bit off, but I am guessing the point of the 
JIRA is to ensure G container startup time is not impacted, right? Wouldn't a 
simple approach be: check if the container is opportunistic, and if the 
container is to be killed and over-allocation is turned on, assume 
{{sleep-delay-before-sigkill.ms}} == 0. This will ensure 'kill -9' is called 
immediately.
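
If I'm reading the suggestion right, it boils down to something like the sketch 
below; the class is made up for illustration, and only the idea of treating the 
sigkill delay as 0 for opportunistic kills comes from the comment above:
{code:java}
// Sketch of the suggestion above (not actual NM code): when over-allocation is
// on and the container being preempted is OPPORTUNISTIC, skip the usual grace
// period between SIGTERM and SIGKILL so 'kill -9' goes out immediately.
final class KillDelaySketch {
  private final long configuredDelayMs;        // the NM's sleep-delay-before-sigkill setting
  private final boolean overAllocationEnabled;

  KillDelaySketch(long configuredDelayMs, boolean overAllocationEnabled) {
    this.configuredDelayMs = configuredDelayMs;
    this.overAllocationEnabled = overAllocationEnabled;
  }

  /** Delay to wait before escalating to SIGKILL for a container being killed. */
  long sigkillDelayMs(boolean isOpportunistic) {
    if (overAllocationEnabled && isOpportunistic) {
      return 0L; // behave as if sleep-delay-before-sigkill.ms == 0
    }
    return configuredDelayMs;
  }
}
{code}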

Please do not assume I am trying to 'block' via arbitrary argument :) I am just 
saying we probably need a more data-driven approach while making changes. 
[~haibochen], can you provide some numbers to demonstrate the container 
start-time deterioration? I just want to know how much of an issue it is. In 
our clusters, we pause containers and we've never had much of a problem with G 
container startup times, because of the queuing.

[~kkaranasos], thoughts ?




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-06-08 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506296#comment-16506296
 ] 

Haibo Chen commented on YARN-8250:
--

Ping [~asuresh]. 




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-31 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496872#comment-16496872
 ] 

Haibo Chen commented on YARN-8250:
--

[~asuresh], [~leftnoteasy] and I had an offline discussion about this again. 

We think one alternative that avoids two different implementations of the 
container scheduler is to modify the behavior of the existing 
ContainerScheduler to accommodate the requirements of NM over-allocation. 
Specifically, the behavior changes to the current ContainerScheduler would 
include:

Before: 

1) Upon a GUARANTEED container scheduling event, always queue the GUARANTEED 
container first and then check if any OPPORTUNISTIC container needs to be 
preempted. If so, wait for the OPPORTUNISTIC container(s) to be killed. 
Otherwise, launch the GUARANTEED container.

2) Upon an OPPORTUNISTIC container scheduling event, queue the container first 
and only launch the OPPORTUNISTIC container if there is enough room.

3) Upon any container completed or finished event that signals that resources 
have been released, check whether any container (GUARANTEED containers first, 
then OPPORTUNISTIC containers) can be launched.

After:

1) Upon a GUARANTEED container scheduling event, launch the GUARANTEED 
container immediately (without queuing). Rely on cgroups OOM control 
(YARN-6677) to preempt OPPORTUNISTIC containers as necessary.

2) Upon an OPPORTUNISTIC container scheduling event, simply queue the 
OPPORTUNISTIC container. 

3) Upon any container completed or finished event, do not try to launch any 
container.

4) Introduce a periodic check (in the ContainersMonitor thread) that launches 
OPPORTUNISTIC containers (see the sketch below). Ideally, the period is 
configurable so that the latency to launch OPPORTUNISTIC containers can be 
reduced.
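
A minimal sketch of what the periodic check in 4) could look like, assuming a 
configurable period; the class and names are placeholders, not the actual 
ContainersMonitor code:
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: a periodic task that asks the container scheduler to try
// launching queued OPPORTUNISTIC containers; the period is configurable so the
// launch latency can be tuned down.
final class PeriodicOpportunisticLauncher implements AutoCloseable {
  private final ScheduledExecutorService timer =
      Executors.newSingleThreadScheduledExecutor();

  PeriodicOpportunisticLauncher(long periodMs, Runnable scheduleOpportunistic) {
    // scheduleOpportunistic would dispatch a "schedule containers" event to the
    // container scheduler on every tick.
    timer.scheduleWithFixedDelay(scheduleOpportunistic, periodMs, periodMs,
        TimeUnit.MILLISECONDS);
  }

  @Override
  public void close() {
    timer.shutdownNow();
  }
}
{code}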

As we have discussed in previous comments, this reduces the latency to launch 
GUARANTEED containers and allows us to control how aggressively OPPORTUNISTIC 
containers are launched, which is especially important for reliability when 
over-allocation is turned on. The code can be a lot simpler as well.

*But it does increase the latency to launch OPPORTUNISTIC containers in cases 
where over-allocation is not on, because we give up the opportunity to launch 
them when containers finish or are paused.* In addition, it adds a dependency 
on cgroups OOM control to preempt OPPORTUNISTIC containers, even though I'd 
argue it's best to turn on cgroups isolation anyway to ensure GUARANTEED 
containers are not adversely impacted by running OPPORTUNISTIC containers.

Let us know your thoughts and whether the workloads you are running are okay 
with the change. [~leftnoteasy], please add anything that I may have missed.




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-15 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476384#comment-16476384
 ] 

Haibo Chen commented on YARN-8250:
--

[~leftnoteasy]

I can introduce a pluggable policy that encapsulates the different behaviors. 
But the behavioral difference is not just deciding whether and when to launch a 
container X, but also when to start checking and what to do with containers of 
different types. The new ContainerScheduler would be much like an 
AbstractContainerScheduler forwarding calls to the policy, and each policy 
would be similar to the DefaultContainerScheduler and 
OpportunisticContainerScheduler, respectively (sketched below).
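
Roughly the shape I have in mind, sketched with invented names (the real 
classes would obviously differ):
{code:java}
// Sketch of the pluggable-policy shape described above; all names are illustrative.
interface ContainerSchedulingPolicy<C> {
  void onGuaranteedScheduled(C container);     // launch immediately vs. queue and preempt
  void onOpportunisticScheduled(C container);  // queue vs. launch if there is room
  void onContainerFinished(C container);       // start queued containers, or do nothing
  void onPeriodicCheck();                      // utilization-driven launches
}

// The scheduler itself mostly forwards events to the configured policy, much
// like the AbstractContainerScheduler mentioned above; each policy then plays
// the role of the default or the opportunistic scheduler.
final class ForwardingContainerScheduler<C> {
  private final ContainerSchedulingPolicy<C> policy;

  ForwardingContainerScheduler(ContainerSchedulingPolicy<C> policy) {
    this.policy = policy;
  }

  void handleGuaranteed(C c)    { policy.onGuaranteedScheduled(c); }
  void handleOpportunistic(C c) { policy.onOpportunisticScheduled(c); }
  void handleFinished(C c)      { policy.onContainerFinished(c); }
  void handleMonitorTick()      { policy.onPeriodicCheck(); }
}
{code}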




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-15 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476376#comment-16476376
 ] 

Haibo Chen commented on YARN-8250:
--

Thanks [~asuresh] for the response.
{quote} the whole point of opportunistic containers is to start them as fast as 
possible
{quote}
I agree in general that we should start opportunistic containers as fast as 
possible, given they are more likely to be preempted anyway. The resulting 
container churn, however, can discourage many users from adopting 
oversubscription. One design goal of oversubscription is to make it as seamless 
as possible, so that users are willing to turn it on without worrying too much 
about the possibility of many container/job failures. If we launch 
OPPORTUNISTIC containers aggressively, many of them can be preempted shortly 
after, and framework AMs then need to be conscious of it and handle them 
differently (from an AM's perspective, opting into the feature means being 
prepared to handle much more frequent failures, so it is natural to think twice 
about whether to opt in).

Opting into oversubscription is, for AMs, in many ways like saying: I'm willing 
to start a task eagerly in an opportunistic container, but when the time comes 
that I would have gotten a guaranteed container had I not started early, I want 
the task running in a GUARANTEED container from then on (the scheduler 
automatically promotes it at that point). To provide a smooth experience, the 
implication is that opportunistic container failures do not occur so often that 
they become a significant downside of opting into oversubscription. We can 
never entirely avoid opportunistic container failures, but in cases like this 
we can be less aggressive to minimize them. This specific goal of YARN-1011 
also makes it less suitable to turn on in clusters where utilization is already 
very high, IMO. I hope that makes sense.

The pause/resume feature does sound useful to avoid losing work. Does the AM 
need to be aware of it, and what does the AM do if a container is paused for a 
while?

 

I believe a container kill is done with a soft kill followed by a kill -9. In 
the over-allocation case, we don't know exactly how many O containers to kill, 
because we only have the resource request info for any given container, not 
how many resources a running O container is actually using. This is not a 
concern when over-allocation is off. On one hand we'd launch O containers 
aggressively; on the other hand we want to avoid preempting O containers as 
much as possible, as described previously. What we end up doing is killing one 
O container at a time, so a large G container can sit in the queue for more 
than a few seconds if multiple containers need to be killed one by one. Again, 
it sounds like the pause/resume feature would be useful here if we aggressively 
preempt O containers.
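
To make the one-round-at-a-time point concrete, a deliberately simplified 
illustration (not the actual preemption code; it ignores the re-check against 
fresh utilization that would happen after each kill):
{code:java}
import java.util.Iterator;
import java.util.List;

// Simplified model of the multi-round preemption described above: without
// knowing what each O container actually uses, the NM frees resources one
// O container at a time and re-checks after each kill completes, so a large
// G container may wait through several rounds before it can start.
final class PreemptionRoundsSketch {
  static final class OContainer {
    final long allocatedMb;
    OContainer(long allocatedMb) { this.allocatedMb = allocatedMb; }
  }

  /** Returns how many sequential kill rounds it takes before neededMb fits. */
  static int roundsToFree(long neededMb, long freeMb, List<OContainer> running) {
    int rounds = 0;
    Iterator<OContainer> it = running.iterator();
    while (freeMb < neededMb && it.hasNext()) {
      OContainer victim = it.next();
      freeMb += victim.allocatedMb; // resources reclaimed only once the victim exits
      it.remove();
      rounds++;                     // each round waits for the previous kill to finish
    }
    return rounds;
  }
}
{code}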

 

 

 

 




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-15 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476312#comment-16476312
 ] 

Arun Suresh commented on YARN-8250:
---

Also with regard to this:
{quote}
Minimize impact on GUARANTEED containers from over-allocating node with 
OPPORTUNISTIC containers. Queuing time of GUARANTEED containers would increase 
with more running OPPORTUNISTIC containers, which is the case with 
over-allocating.
{quote}
I would like to understand why this is so. When a G container comes in and 
resources are currently being used by a number of O containers, the 
ContainerScheduler (CS) first queues the G container and then requests that the 
appropriate number of O containers be killed (or paused). Once the CS receives 
the event that the O containers are killed/paused, it starts the queued G 
containers. If the kill signals are kill -9, the events should be received 
almost immediately; I don't expect more than a second or two for the queued G 
containers to start.
Given that, in a decently utilized cluster, it is possible for the RM to take a 
couple of seconds to return container tokens, do you think the added complexity 
is justified just to shave a second or two off the container startup times?
I agree that for extremely short tasks (whose lifetime is on the order of a few 
seconds) it may be justified, but in our experience, for many of those tasks 
localization time dominates runtime, and localization happens before the 
container is even sent to the scheduler.




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-15 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476292#comment-16476292
 ] 

Arun Suresh commented on YARN-8250:
---

[~haibochen], with regard to this:
{quote}
avoid aggressive OPPORTUNISTIC container launching. One thing to note is that 
in case of over-allocation, we'd rely on the resource utilization metrics to 
decide how much resources we can to launch OPPORTUNISTIC containers. The 
resource utilization metrics in NM is unfortunately only updated every few 
seconds. This can be problematic in that NM could end up with launching too 
many OPPORTUNISTIC containers before the metric is updated. The current default 
container scheduler launches containers aggressively, which could cause 
containers to be launched and killed shortly after.  The new container 
scheduler only schedule OPPORTUNISTIC containers once whenever the utilization 
metric is updated.
{quote}
IMHO, the whole point of opportunistic containers is to start them as fast as 
possible. Even if the resource utilization is updated only once every few 
seconds (I am guessing this should be configurable, and therefore can be 
reduced), it shouldn't matter if opportunistic containers are launched 
aggressively. The scheduler cannot launch more than the available resources 
anyway. I agree there can be churn, but think of it this way: these O 
containers were scheduled on the node with the assumption that they have a 
higher likelihood of being preempted anyway, so why not just start them and let 
the containers get preempted when resources are low. Furthermore, once pause 
and resume is turned on, there will technically be no lost work either.




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476088#comment-16476088
 ] 

Wangda Tan commented on YARN-8250:
--

Thanks [~haibochen] for the detailed explanation.

bq. Not sure what can be done here to unify the two, as they fundamentally have 
issues with the other one's approach. Hence, the proposal to have two 
implementations.
Is it possible to make a pluggable policy that checks whether it is possible to 
launch a container X, returning true or false? For the existing container 
scheduler, it is always true. For the over-allocation case, it talks to the 
policy and decides. Related code could be pulled into the separate policy if 
possible.

Maybe I didn't get the full picture, but from what I can see, there's still no 
fundamental issue that blocks us from making a single implementation (with 
pluggable policies) for the two scenarios.
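
Something along these lines, I assume; the interface and both policies below 
are invented purely to illustrate the suggestion:
{code:java}
// Sketch of the pluggable launch check suggested above; names are hypothetical.
interface ContainerLaunchPolicy {
  /** Decides whether a container requesting requestedMb may be launched now. */
  boolean canLaunch(long requestedMb, long nodeCapacityMb,
                    long allocatedMb, long utilizedMb);
}

// Existing container scheduler: the check is effectively always true.
final class AlwaysLaunchPolicy implements ContainerLaunchPolicy {
  @Override
  public boolean canLaunch(long requestedMb, long nodeCapacityMb,
                           long allocatedMb, long utilizedMb) {
    return true;
  }
}

// Over-allocation case: consult measured utilization against an over-allocation
// limit (capacity scaled by some factor) before allowing the launch.
final class UtilizationBasedPolicy implements ContainerLaunchPolicy {
  private final double overAllocationFactor;

  UtilizationBasedPolicy(double overAllocationFactor) {
    this.overAllocationFactor = overAllocationFactor;
  }

  @Override
  public boolean canLaunch(long requestedMb, long nodeCapacityMb,
                           long allocatedMb, long utilizedMb) {
    return utilizedMb + requestedMb <= nodeCapacityMb * overAllocationFactor;
  }
}
{code}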




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-14 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475100#comment-16475100
 ] 

Haibo Chen commented on YARN-8250:
--

[~leftnoteasy] Thanks for your comments.

I agree that we should avoid the CS vs. FS issue if possible. As I have 
mentioned, the rationale is to do things that are suitable for oversubscription 
but do not break or destabilize existing functionality.

Can you elaborate on what you mean by an issue we need to fix? I was describing 
a behavior that is fine except under over-allocation. Today the container 
scheduler tries to launch opportunistic containers whenever there is a 
container scheduling request, or whenever a container finishes. That is not an 
issue today. But in the case of over-allocation, because the utilization 
metrics are stale, we could end up with the following case: a few containers 
finish, the container monitor checks the node utilization, which is low, and 
then the container scheduler gets the container finish events and aggressively 
tries to start opportunistic containers. Only later does the NM realize that 
opportunistic containers need to be preempted.

Not sure what can be done here to unify the two, as they fundamentally have 
issues with the other one's approach. Hence, the proposal to have two 
implementations.
{quote}) I'm not sure if we should give all the decisions to CGroups
{quote}
One key thing to note is that we want to ensure GUARANTEED containers are not 
slowed down by OPPORTUNISTIC containers, so cgroups has been a requirement for 
over-allocation from day one to ensure isolation. Unless the Docker container 
executor has similar mechanisms, it is hard to make over-allocation work 
properly with Docker without downsides that render the feature unusable.

 

I am open to suggestions to make things simpler and more maintainable, but as 
noted here, there are fundamental behavior changes. I'll take a look at whether 
there are more behaviors we could extract into the base container scheduler.




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475062#comment-16475062
 ] 

Wangda Tan commented on YARN-8250:
--

[~haibochen], 

I took a very brief look at the implemented code, since I haven't had a chance 
to read through the implementation in full.

My thoughts:
- To me it is important to have a single implementation with different 
policies, or to just fix it correctly. Otherwise we will end up in the CS vs. 
FS situation shortly after this.
- For 2), it looks like an issue we need to fix: why would we want to keep the 
logic that aggressively launches O containers and lets them be killed by the 
framework shortly after launch?
- For 1), I'm not sure we should give all the decisions to cgroups. In some 
cases killing a container cannot be done immediately by the system IIRC (like 
Docker containers), so it's better to look at the current status of running 
containers before launching a container.




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-14 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475011#comment-16475011
 ] 

Haibo Chen commented on YARN-8250:
--

My understanding of SHED_QUEUED_CONTAINERS is that it notifies the container 
scheduler to get rid of some opportunistic containers that are queued. The 
intent of the new SCHEDULER_CONTAINERS event is to let the container scheduler 
try to launch opportunistic containers that are currently queued. A follow-up 
would be to also reuse SCHEDULER_CONTAINERS to preempt running opportunistic 
containers. I am not sure how to best align the two.

The main reasons why we'd like to introduce a new container scheduler are

1) Minimize the impact on GUARANTEED containers of over-allocating the node 
with OPPORTUNISTIC containers. The queuing time of GUARANTEED containers would 
increase with more running OPPORTUNISTIC containers, which is the case with 
over-allocation, and the code as in YARN-6675 gets complicated. Alternatively, 
we could launch GUARANTEED containers immediately and rely on the cgroups 
mechanism for preemption.

2) Avoid aggressive OPPORTUNISTIC container launching. One thing to note is 
that in the case of over-allocation, we rely on the resource utilization 
metrics to decide how many resources we have for launching OPPORTUNISTIC 
containers. The resource utilization metrics in the NM are unfortunately only 
updated every few seconds. This can be problematic in that the NM could end up 
launching too many OPPORTUNISTIC containers before the metric is updated. The 
current default container scheduler launches containers aggressively, which 
could cause containers to be launched and then killed shortly after. The new 
container scheduler only schedules OPPORTUNISTIC containers once whenever the 
utilization metric is updated (see the sketch below).
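
For what it's worth, the "only once per utilization update" part could be as 
simple as the following sketch (illustrative only, with invented names):
{code:java}
// Sketch only: schedule OPPORTUNISTIC containers at most once per utilization
// snapshot, so the same stale reading is never used for two scheduling passes.
final class SnapshotGatedScheduler {
  private long lastSeenSnapshotId = -1;

  /**
   * @param snapshotId monotonically increasing id of the utilization reading
   * @param scheduleOpportunistic launches queued O containers against the fresh reading
   * @return true if this call triggered a scheduling pass
   */
  synchronized boolean maybeSchedule(long snapshotId, Runnable scheduleOpportunistic) {
    if (snapshotId == lastSeenSnapshotId) {
      return false; // the metric has not been refreshed yet; skip this pass
    }
    lastSeenSnapshotId = snapshotId;
    scheduleOpportunistic.run();
    return true;
  }
}
{code}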

It is my understanding that removing GUARANTEED container queuing would 
destabilize cases like yours where nodes are running at high utilization, and 
that scheduling OPPORTUNISTIC containers only every few seconds would delay 
launch time in distributed scheduling.

Hence, we created a pluggable container scheduler so that we can choose to do 
things differently without causing issues for existing use cases. The new 
container scheduler should probably be named or documented so that it is only 
used when over-allocation is enabled.

 

 




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-14 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474808#comment-16474808
 ] 

Arun Suresh commented on YARN-8250:
---

[~haibochen], I am not entirely convinced we really need to make the 
ContainerScheduler pluggable. Maybe if you could provide a code snippet of how 
the new ContainerScheduler used for over-allocation needs to be different, I 
would have better context.

You have created a new {{SCHEDULE_CONTAINERS}} event. Wondering if 
{{SHED_QUEUED_CONTAINERS}} should be re-used here?





[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-14 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474715#comment-16474715
 ] 

Haibo Chen commented on YARN-8250:
--

[~asuresh] Did you get a chance to look at the patch?




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-09 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469628#comment-16469628
 ] 

Arun Suresh commented on YARN-8250:
---

[~haibochen], [~miklos.szeg...@cloudera.com].. do give me a day or so to take a 
look..





[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-09 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469577#comment-16469577
 ] 

Haibo Chen commented on YARN-8250:
--

Not sure how to address the last checkstyle issue, because we'd ignore 
container pause event  that is supported by the new 
OpportunisticContainerScheduler.

I can update the patch to take care of the first issue.




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-09 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469553#comment-16469553
 ] 

Miklos Szegedi commented on YARN-8250:
--

I think the first and the last checkstyle comments could be addressed.




[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-09 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469548#comment-16469548
 ] 

genericqa commented on YARN-8250:
-

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 15 new or modified test files. |
|| || || || YARN-1011 Compile Tests ||
| 0 | mvndep | 3m 48s | Maven dependency ordering for branch |
| +1 | mvninstall | 28m 34s | YARN-1011 passed |
| +1 | compile | 8m 3s | YARN-1011 passed |
| +1 | checkstyle | 1m 29s | YARN-1011 passed |
| +1 | mvnsite | 2m 12s | YARN-1011 passed |
| +1 | shadedclient | 14m 5s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 41s | YARN-1011 passed |
| +1 | javadoc | 1m 49s | YARN-1011 passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 11s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 42s | the patch passed |
| +1 | compile | 6m 49s | the patch passed |
| +1 | javac | 6m 49s | the patch passed |
| -0 | checkstyle | 1m 28s | hadoop-yarn-project/hadoop-yarn: The patch generated 6 new + 585 unchanged - 1 fixed = 591 total (was 586) |
| +1 | mvnsite | 2m 3s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 11m 20s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 4m 8s | the patch passed |
| +1 | javadoc | 1m 46s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 44s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 3m 16s | hadoop-yarn-common in the patch passed. |
| +1 | unit | 25m 4s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 37s | The patch does not generate ASF License warnings. |
| | | 121m 40s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8250 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12922708/YARN-8250-YARN-1011.02.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux a1df197a9f6a 3.13.0-141-generic #190-Ubuntu SMP

[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-09 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469398#comment-16469398
 ] 

Haibo Chen commented on YARN-8250:
--

This is based on UpdateContainerTokenEvent. There are four possible combinations.

If isResourceChange() returns true, then it is a container resizing request. It 
is a container resource increase if isIncrease() returns true.

If isExecTypeUpdate() returns true, then it is a container promotion/demotion 
request. It is a promotion if isIncrease() returns true.

isDecrease() is implied to be true if isIncrease() returns false. See 
UpdateContainerTokenEvent.java. Hence, there is not much we could do here.

While I agree with you that the code can be improved to avoid such confusion, 
I'd leave that to a different jira.
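
For illustration, a minimal sketch of the four combinations using only the 
predicates mentioned above (the interface and helper names are stand-ins for 
this sketch, not the actual event class):
{code:java}
// Sketch only: a minimal model of the four combinations discussed above.
interface ContainerUpdate {        // hypothetical stand-in for the update event
  boolean isResourceChange();      // true => container resizing request
  boolean isExecTypeUpdate();      // true => promotion/demotion request
  boolean isIncrease();            // direction; false implies decrease/demotion
}

static String classify(ContainerUpdate update) {
  if (update.isResourceChange()) {
    return update.isIncrease() ? "resource increase" : "resource decrease";
  }
  if (update.isExecTypeUpdate()) {
    return update.isIncrease() ? "promotion" : "demotion";
  }
  return "no-op";
}
{code}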

> Create another implementation of ContainerScheduler to support NM 
> overallocation
> 
>
> Key: YARN-8250
> URL: https://issues.apache.org/jira/browse/YARN-8250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8250-YARN-1011.00.patch, 
> YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch
>
>
> YARN-6675 adds NM over-allocation support by modifying the existing 
> ContainerScheduler and providing a utilizationBased resource tracker.
> However, the implementation adds a lot of complexity to ContainerScheduler, 
> and future tweak of over-allocation strategy based on how much containers 
> have been launched is even more complicated.
> As such, this Jira proposes a new ContainerScheduler that always launch 
> guaranteed containers immediately and queues opportunistic containers. It 
> relies on a periodical check to launch opportunistic containers. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-09 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469378#comment-16469378
 ] 

Miklos Szegedi commented on YARN-8250:
--

Thank you for the updated patch [~haibochen].
{code:java}
if (updateEvent.isIncrease()){code}
The else branch of this one should check for isDecrease(); otherwise a 
non-change would trigger a decrease.
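
The shape being suggested, as a sketch (assuming an explicit isDecrease() check 
is available; per the comment above it may only be implied today):
{code:java}
// Sketch, not patch code: make the decrease branch explicit so an update
// that is neither an increase nor a decrease falls through as a no-op.
if (updateEvent.isIncrease()) {
  // handle resource increase / promotion
} else if (updateEvent.isDecrease()) {
  // handle resource decrease / demotion
}
// else: no direction change, nothing to do
{code}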

> Create another implementation of ContainerScheduler to support NM 
> overallocation
> 
>
> Key: YARN-8250
> URL: https://issues.apache.org/jira/browse/YARN-8250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8250-YARN-1011.00.patch, 
> YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch
>
>
> YARN-6675 adds NM over-allocation support by modifying the existing 
> ContainerScheduler and providing a utilizationBased resource tracker.
> However, the implementation adds a lot of complexity to ContainerScheduler, 
> and future tweak of over-allocation strategy based on how much containers 
> have been launched is even more complicated.
> As such, this Jira proposes a new ContainerScheduler that always launch 
> guaranteed containers immediately and queues opportunistic containers. It 
> relies on a periodical check to launch opportunistic containers. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-09 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469356#comment-16469356
 ] 

Haibo Chen commented on YARN-8250:
--

The unit test failures are unrelated. The ones in 
TestDefaultContainerSchedulerQueuing are tracked in YARN-8244.

> Create another implementation of ContainerScheduler to support NM 
> overallocation
> 
>
> Key: YARN-8250
> URL: https://issues.apache.org/jira/browse/YARN-8250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8250-YARN-1011.00.patch, 
> YARN-8250-YARN-1011.01.patch
>
>
> YARN-6675 adds NM over-allocation support by modifying the existing 
> ContainerScheduler and providing a utilizationBased resource tracker.
> However, the implementation adds a lot of complexity to ContainerScheduler, 
> and future tweak of over-allocation strategy based on how much containers 
> have been launched is even more complicated.
> As such, this Jira proposes a new ContainerScheduler that always launch 
> guaranteed containers immediately and queues opportunistic containers. It 
> relies on a periodical check to launch opportunistic containers. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-09 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469315#comment-16469315
 ] 

genericqa commented on YARN-8250:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 15 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-1011 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
9s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
41s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m  
1s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
30s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
10s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
38s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
50s{color} | {color:green} YARN-1011 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
16s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 27s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 17 new + 585 unchanged - 1 fixed = 602 total (was 586) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
44s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m 15s{color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 26m 13s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}120m 20s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.api.impl.TestTimelineClientV2Impl |
|   | 
hadoop.yarn.server.nodemanager.containermanager.scheduler.TestDefaultContainerSchedulerQueuing
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8250 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12922659/YARN-8250-YARN-1011.01.patch
 

[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-09 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469001#comment-16469001
 ] 

Haibo Chen commented on YARN-8250:
--

Thanks for the review, [~miklos.szeg...@cloudera.com]!
{quote}getContainersUtilization and updateContainersUtilization might need to 
be synchronized or sampled (cloned).
{quote}
Most things in the container scheduler are not synchronized, on the assumption 
that almost everything is handled by the single event dispatcher thread; we only 
synchronize state that is accessed by multiple threads. getContainersMonitor() 
is also executed only by the dispatcher thread, so I'd tend to leave it as is.

shedQueuedOpportunisticContainers actually sheds in LIFO order: it walks from 
the beginning of the queue, keeps the allowed number of containers, and then 
kills the remaining queued containers through to the end of the queue.
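
A minimal sketch of that walk (the queue type and kill helper are illustrative 
assumptions, not the patch code):
{code:java}
// Sketch only: keep the first `allowed` queued opportunistic containers,
// shed everything queued after them, i.e. the most recently queued ones.
private void shedQueuedOpportunisticContainers(
    Deque<Container> queuedOpportunisticContainers, int allowed) {
  Iterator<Container> it = queuedOpportunisticContainers.iterator();
  int kept = 0;
  while (it.hasNext()) {
    Container container = it.next();
    if (kept < allowed) {
      kept++;                                 // head of the queue survives
    } else {
      it.remove();                            // later arrivals are shed
      killOpportunisticContainer(container);  // hypothetical helper
    }
  }
}
{code}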

I'll update the patch with the rest of your comments.

> Create another implementation of ContainerScheduler to support NM 
> overallocation
> 
>
> Key: YARN-8250
> URL: https://issues.apache.org/jira/browse/YARN-8250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8250-YARN-1011.00.patch
>
>
> YARN-6675 adds NM over-allocation support by modifying the existing 
> ContainerScheduler and providing a utilizationBased resource tracker.
> However, the implementation adds a lot of complexity to ContainerScheduler, 
> and future tweak of over-allocation strategy based on how much containers 
> have been launched is even more complicated.
> As such, this Jira proposes a new ContainerScheduler that always launch 
> guaranteed containers immediately and queues opportunistic containers. It 
> relies on a periodical check to launch opportunistic containers. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468120#comment-16468120
 ] 

genericqa commented on YARN-8250:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 11 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-1011 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m 
50s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
36s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
22s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
23s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
23s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
8s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} YARN-1011 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
36s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 58 new + 527 unchanged - 1 fixed = 585 total (was 528) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
4s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 45s{color} 
| {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 27m  4s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}108m 33s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
|  |  Unchecked/unconfirmed cast from 
org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerSchedulerEvent
 to 
org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.UpdateContainerSchedulerEvent
 in 
org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.AbstractContainerScheduler.handle(ContainerSchedulerEvent)
  At 
AbstractContainerScheduler.java:org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.UpdateContainerSchedulerEvent
 in 

[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-08 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468020#comment-16468020
 ] 

Miklos Szegedi commented on YARN-8250:
--

Thanks for the patch, [~haibochen].

ContainerScheduler could be renamed DefaultContainerScheduler to leave room for 
extensions later.

You could use conf.getClass in createContainerScheduler to automatically verify 
the parent class.
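
For example, something along these lines (the configuration key below is a 
hypothetical name, not an existing constant):
{code:java}
// Sketch: Configuration.getClass rejects any configured class that is not
// assignable to ContainerScheduler, so createContainerScheduler gets the
// parent-class check for free.
Class<? extends ContainerScheduler> schedulerClass = conf.getClass(
    "yarn.nodemanager.container-scheduler.class",  // hypothetical key
    ContainerScheduler.class,                      // default implementation
    ContainerScheduler.class);                     // required parent type
// ... then instantiate schedulerClass with the constructor the NM expects.
{code}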

getContainersUtilization and updateContainersUtilization might need to be 
synchronized or sampled (cloned).
{code:java}
public ContainersMonitor getContainersMonitor() {
  return nmContext.getContainerManager().getContainersMonitor();
}{code}
Usually it is considered a better practice to return nmContext and rely on the 
caller to retrieve the rest.
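
As a sketch of that alternative (the getter name is illustrative):
{code:java}
// Expose the context and let callers drill down to what they need.
public Context getNMContext() {
  return nmContext;
}
// caller side:
// ContainersMonitor monitor =
//     scheduler.getNMContext().getContainerManager().getContainersMonitor();
{code}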

shedQueuedOpportunisticContainers does a FIFO; it might make sense to do a LIFO.

> Create another implementation of ContainerScheduler to support NM 
> overallocation
> 
>
> Key: YARN-8250
> URL: https://issues.apache.org/jira/browse/YARN-8250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8250-YARN-1011.00.patch
>
>
> YARN-6675 adds NM over-allocation support by modifying the existing 
> ContainerScheduler and providing a utilizationBased resource tracker.
> However, the implementation adds a lot of complexity to ContainerScheduler, 
> and future tweak of over-allocation strategy based on how much containers 
> have been launched is even more complicated.
> As such, this Jira proposes a new ContainerScheduler that always launch 
> guaranteed containers immediately and queues opportunistic containers. It 
> relies on a periodical check to launch opportunistic containers. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org