[jira] [Commented] (YARN-1040) De-link container life cycle from an Allocation

2016-03-25 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212248#comment-15212248
 ] 

Bikas Saha commented on YARN-1040:
--

bq. Hmmm.. Given that launching multiple processes, being a new feature, I feel 
that it should be fine to mandate the app to use new APIs, no ?
For Tez/Spark, using the ability to launch multiple processes in containers 
will clearly require new APIs on the NM. That could be an optional feature, 
local to that part of the code, that can be safely added and then turned 
on/off in an isolated manner by users. That is fine. But if, in order to use 
the new APIs for this one optional feature, we have to change Tez/Spark to 
redo their AM-RM implementations and update all of their internals around the 
concepts of allocations and containers (where what the entire codebase used to 
consider containers are now allocations), then I hope we appreciate how 
destabilizing that change would be to those projects.

> De-link container life cycle from an Allocation
> ---
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
> Attachments: YARN-1040-rough-design.pdf
>
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.





[jira] [Commented] (YARN-1040) De-link container life cycle from an Allocation

2016-03-23 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208906#comment-15208906
 ] 

Bikas Saha commented on YARN-1040:
--

It would be great if existing apps could use the changes in YARN-1040 to run 
more than a single process (sequentially or concurrently). If we use YARN-1040 
to build the primitives here, then those primitives could be used for the 
broader work designed for services (which seems to be indicated in the design 
doc). Without YARN-1040, existing Java-based apps cannot use features like 
increasing container memory, because the JVM has to be restarted before it can 
grow to a larger size. I can see the argument for asking users to use new APIs 
for new features, but requiring existing apps to change their AM/RM 
implementations (which have been stabilized with much effort) just to be able 
to launch multiple processes does not seem empathetic.

Separately from this, I have not been actively involved in the project for a 
while. Hence my understanding of the scope and semantic changes proposed here 
may be stale, and I may be inaccurate in thinking that they are fundamental 
enough to deserve a dedicated jira and a wider discussion. You guys can make a 
call on that.

> De-link container life cycle from an Allocation
> ---
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
> Attachments: YARN-1040-rough-design.pdf
>
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.





[jira] [Commented] (YARN-1040) De-link container life cycle from an Allocation

2016-03-18 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202391#comment-15202391
 ] 

Bikas Saha commented on YARN-1040:
--

This design doc effectively looks like a redesign of almost all of the core 
semantics of YARN. It probably deserves a wider discussion on the dev mailing 
list and under its own jira. Although it covers YARN-1040 and YARN-4726, the 
scope looks much wider, and careful thinking about backwards compatibility is 
needed. Conceptually, it changes the current semantic understanding of 
allocation and container that is widely held externally. I am afraid that this 
jira, or just the folks on this thread, are not enough to make a decision on 
the given proposal.

As far as this jira is concerned, both the previous (say, a) and new (say, b) 
proposals sound similar, with startContainer in (a) renamed to startAllocation 
in (b), and startProcess in (a) renamed to startContainer in (b). So we may be 
fine on that restricted part, minus the renamings.

> De-link container life cycle from an Allocation
> ---
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
> Attachments: YARN-1040-rough-design.pdf
>
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.





[jira] [Updated] (YARN-4758) Enable discovery of AMs by containers

2016-03-03 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-4758:
-
Description: 
{color:red}
This is already discussed on the umbrella JIRA YARN-1489.

Copying some of my condensed summary from the design doc (section 3.2.10.3) of 
YARN-4692.
{color}

Even after the existing work on work-preserving AM restart (Section 3.1.2 / 
YARN-1489), we still haven't solved the problem of old running containers not 
knowing where the new AM starts running after the previous AM crashes. This is 
an especially important problem to solve for long-running services, where we'd 
like to avoid killing service containers when AMs fail over. So far, we have 
left this as a task for the apps, but solving it in YARN is highly desirable. 
(Task) This looks very much like the service registry (YARN-913), but for app 
containers to discover their own AMs.

Combining this requirement (of any container being able to find its AM across 
failovers) with those of services (being able to find through DNS where a 
service container is running - YARN-4757) will push our registry scalability 
needs much higher than those of just service endpoints. This calls for a more 
distributed solution for registry readers - something that is discussed in the 
comments sections of YARN-1489 and MAPREDUCE-6608.
See comment 
https://issues.apache.org/jira/browse/YARN-1489?focusedCommentId=13862359&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13862359

  was:
{color:red}
This is already discussed on the umbrella JIRA YARN-1489.

Copying some of my condensed summary from the design doc (section 3.2.10.3) of 
YARN-4692.
{color}

Even after the existing work on work-preserving AM restart (Section 3.1.2 / 
YARN-1489), we still haven't solved the problem of old running containers not 
knowing where the new AM starts running after the previous AM crashes. This is 
an especially important problem to solve for long-running services, where we'd 
like to avoid killing service containers when AMs fail over. So far, we have 
left this as a task for the apps, but solving it in YARN is highly desirable. 
(Task) This looks very much like the service registry (YARN-913), but for app 
containers to discover their own AMs.

Combining this requirement (of any container being able to find its AM across 
failovers) with those of services (being able to find through DNS where a 
service container is running - YARN-4757) will push our registry scalability 
needs much higher than those of just service endpoints. This calls for a more 
distributed solution for registry readers - something that is discussed in the 
comments sections of YARN-1489 and MAPREDUCE-6608.


> Enable discovery of AMs by containers
> -
>
> Key: YARN-4758
> URL: https://issues.apache.org/jira/browse/YARN-4758
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>
> {color:red}
> This is already discussed on the umbrella JIRA YARN-1489.
> Copying some of my condensed summary from the design doc (section 3.2.10.3) 
> of YARN-4692.
> {color}
> Even after the existing work on work-preserving AM restart (Section 3.1.2 / 
> YARN-1489), we still haven't solved the problem of old running containers not 
> knowing where the new AM starts running after the previous AM crashes. This 
> is an especially important problem to solve for long-running services, where 
> we'd like to avoid killing service containers when AMs fail over. So far, we 
> have left this as a task for the apps, but solving it in YARN is highly 
> desirable. (Task) This looks very much like the service registry (YARN-913), 
> but for app containers to discover their own AMs.
> Combining this requirement (of any container being able to find its AM 
> across failovers) with those of services (being able to find through DNS 
> where a service container is running - YARN-4757) will push our registry 
> scalability needs much higher than those of just service endpoints. This 
> calls for a more distributed solution for registry readers - something that 
> is discussed in the comments sections of YARN-1489 and MAPREDUCE-6608.
> See comment 
> https://issues.apache.org/jira/browse/YARN-1489?focusedCommentId=13862359&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13862359





[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-02-28 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171244#comment-15171244
 ] 

Bikas Saha commented on YARN-1011:
--

bq. If it absolutely wants a guaranteed container, we should allocate a 
guaranteed container and kill the opportunistic one. If it does not want, we 
can let the opportunistic container continue to run.
I get that. The question is: which of the two is the default behavior?
Next question to consider: if we let the opportunistic container run and count 
it as part of guaranteed capacity, then what prevents the node from killing it 
when the node's resources actually become over-used and something needs to be 
killed by the node? And how does the rest of the system (YARN + app) react to 
losing a guaranteed container?

bq. The overall cluster utilization is an implementation detail, its sole 
purpose is to reduce the chances of running into cases that need cross-node 
promotion.
I am sorry, but I could not understand how that is so.


> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: patch-for-yarn-1011.patch, yarn-1011-design-v0.pdf, 
> yarn-1011-design-v1.pdf, yarn-1011-design-v2.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.





[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2016-02-25 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167889#comment-15167889
 ] 

Bikas Saha commented on YARN-1040:
--

Vinod, the plan you are suggesting has merits. But my initial impression is 
that reworking allocations and containers is a much bigger change than what 
was proposed earlier in this jira, not only internally in YARN but also 
externally, in terms of how users of YARN think about the whole larger flow of 
allocations and containers.
The proposal discussed earlier is of much smaller scope and, I believe, 
sufficient to take us where we need to go. And it does not require reworking 
the RM-related flow of allocations and containers. E.g., it may not be 
necessary for the RM to understand single-use vs. multi-use vs. concurrent-use 
allocations. But for the RM-level changes you are suggesting, we may be on the 
path of convergence.

At this point, the discussion is complex enough that we may want to gather 
interested people, have it as a group outside of jira comments, and then post 
the outcome back.

> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.





[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2016-02-25 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167738#comment-15167738
 ] 

Bikas Saha commented on YARN-1040:
--

My guess is that YARN-4725 may be redundant after we do this work, because we 
would then have exposed primitives that let apps make that happen themselves. 
The arguments for YARN not doing it by itself would be the same: if it can be 
done easily by the app, and is very likely app-dependent with no 
one-size-fits-all, then let the app do it.

Coming back to this jira: yes, let's please track any first-class support for 
the notion of upgrades separately; it can be done as a follow-up.

Perhaps we can put the design in a document and look at the next level of 
detail. We can send email to the dev list after adding a more detailed 
document to this jira. Then, based on positive feedback, we could go ahead 
with jiras/code. The devil is in the details :) This would be a significant 
change, and we could use more eyes for reviews.

For the startProcess identifier, it may be useful for the app to provide the 
identifier in startProcess and then use it later to refer to the process 
(e.g., in stopProcess), versus YARN trying to come up with identifiers. This 
may make the app's life easier, because it could use meaningful names based on 
its own logic. We can discuss such details in the design document.


> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.





[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2016-02-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166633#comment-15166633
 ] 

Bikas Saha commented on YARN-1040:
--

I am sorry if I caused a digression by mentioning Slider etc.

I am not sure the upgrade scenario is the only one for this jira, since this 
jira covers a broader set. Even without upgrades, apps can change the 
processes they run in a container without having to lose the container 
allocation. Identical calls to the primitives could be used without any notion 
of upgrade, e.g. start a Java process first for a Java task, then launch a 
Python process for a Python task. To the NM this is identical to starting v1 
and then starting v2. So while it makes sense for the second case to use an 
API called upgrade, it may not for the first.

(Unrelated to this jira: IMO, YARN should allow upgrades of app code without 
losing containers, but not necessarily understand them deeply. E.g., YARN need 
not assume that an upgrade will need additional resources, or try to acquire 
them transparently for the application.)

For the purpose of this jira, here are my thoughts from when I opened 
YARN-1292 to delink the process lifecycle from the container:
1) New API - acquireContainer - ask for the allocated resource. The API has a 
flag specifying whether process exit implies releaseContainer. This is for 
backwards compatibility, with a default of true. Apps that want to keep that 
behavior can explicitly pass true when using the new API; this is mainly for 
reducing the number of RPCs for apps like MR/Tez.
2) New API - startProcess - start the remote process.
3) New API - stopProcess - stop the remote process.
4) New API - releaseContainer - release the allocated resource.
5) Potentially a new API for localization, though in theory this could be 
separate.

Since this fine-grained control makes the protocol chatty, we can reduce RPC 
traffic with a new NM RPC, say NMCommand, that takes a sequence of API 
primitives in a single RPC.
The current startContainer API then effectively becomes NMCommand(1, 2), and 
stopContainer becomes NMCommand(3, 4). This can be leveraged for backwards 
compatibility and rolling upgrades.
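
To make the shape of this concrete, here is a minimal sketch of the primitives 
and the batched NMCommand RPC. All names and signatures are hypothetical 
illustrations of the proposal above, not actual YARN APIs.

{code:java}
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed NM primitives; not an actual YARN API.
interface ContainerPrimitives {
  // 1) Acquire the allocated resource. If releaseOnProcessExit is true,
  //    process exit implies releaseContainer (the backwards-compatible default).
  void acquireContainer(String containerId, boolean releaseOnProcessExit);

  // 2) Start the remote process.
  void startProcess(String containerId, String processId, List<String> command);

  // 3) Stop the remote process.
  void stopProcess(String containerId, String processId);

  // 4) Release the allocated resource.
  void releaseContainer(String containerId);

  // 5) Potentially separate: localize resources without starting a process.
  void localize(String containerId, List<String> resources);
}

// A single RPC carrying a sequence of primitives, so that fine-grained
// control does not make the protocol chatty.
final class NMCommand {
  enum Op { ACQUIRE, START, STOP, RELEASE, LOCALIZE }

  final List<Op> ops;

  NMCommand(Op... ops) {
    this.ops = Arrays.asList(ops);
  }

  // Today's coarse-grained APIs expressed as batches, for backwards
  // compatibility: startContainer == NMCommand(1, 2), stopContainer == NMCommand(3, 4).
  static NMCommand startContainer() {
    return new NMCommand(Op.ACQUIRE, Op.START);
  }

  static NMCommand stopContainer() {
    return new NMCommand(Op.STOP, Op.RELEASE);
  }
}
{code}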

The above items would effectively delink the process and container lifecycles 
and close out this jira.

This provides fine-grained control in core YARN that can be used for various 
scenarios, e.g. upgrades, without YARN having to understand those scenarios. 
If we need to add higher-level notions for upgrades etc., those could be done 
as separate items.

I hope that helps make my thoughts concrete within the scope of this jira.


> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.





[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2016-02-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15165668#comment-15165668
 ] 

Bikas Saha commented on YARN-1040:
--

Agree with your scenarios. 

I am trying to figure out a way by which this does not become a YARN problem 
(both the initial work and the ongoing maintenance). E.g., we don't know for 
sure whether the resource needs to be x, 2x or 3x. That is an allocation 
decision and cannot be made without the RM's blessing. And increasing 
container resources is already work in progress and may become another NM 
primitive. Next, what is the ordering of the tasks during an upgrade? We could 
implement one of many possibilities, but we would then be stuck with 
bug-fixing or improving it, and potentially using it as a precedent to 
implement yet another upgrade policy.

Hence my suggestion of creating composable primitives that can be used to 
easily implement these flows, leaving it to the apps to determine the exact 
upgrade paths. Perhaps Slider is a better place to wrap different upgrade 
possibilities around the composable primitives, e.g. 
SliderStopAllUpgradePolicy or SliderConcurrentUpgradePolicy. Or they could be 
provided as helper libs in YARN/NMClient so apps don't have to compose the 
primitives from scratch. The main aim is to keep core YARN/NM simple by 
creating primitives and layering complexity on top. This approach may be 
simpler and more incremental to develop, test and deploy. Of course, these are 
my personal design views :)
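
To illustrate the layering, a stop-all upgrade policy could be composed from 
those primitives roughly as below. This is a minimal sketch assuming the 
hypothetical primitive names above; none of these classes are actual Slider or 
YARN APIs.

{code:java}
import java.util.List;

// Hypothetical helper composing NM primitives into a stop-all upgrade flow;
// all names are illustrative only.
final class StopAllUpgradePolicy {
  // Minimal subset of the proposed NM primitives used by this policy.
  interface Primitives {
    void stopProcess(String containerId, String processId);
    void localize(String containerId, List<String> resources);
    void startProcess(String containerId, String processId, List<String> command);
  }

  // The app (or a helper lib), not core YARN, decides this ordering.
  void upgrade(Primitives nm, List<String> containers,
               List<String> newResources, List<String> newCommand) {
    // First stop the old version everywhere...
    for (String c : containers) {
      nm.stopProcess(c, "v1");
    }
    // ...then localize the new bits and start the new version.
    for (String c : containers) {
      nm.localize(c, newResources);
      nm.startProcess(c, "v2", newCommand);
    }
  }
}
{code}

A concurrent policy would reuse the same primitives with a {start, stop} 
ordering per container instead, which is exactly the point of layering 
complexity on top of simple primitives.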

Thoughts?


> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.





[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2016-02-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163708#comment-15163708
 ] 

Bikas Saha commented on YARN-1040:
--

I am not sure we need to place (somewhat artificial) constraints on the app 
when it is not clear that they practically affect YARN:

1) A container with no process should be allowed. Apps could terminate all 
running tasks of version A, then start running tasks of version B, when the 
two are not backwards compatible.
2) A container should be allowed to run multiple processes. This is similar to 
an existing process spawning more processes; it differs in that the NM has to 
add the new process to the existing monitoring/cgroups etc.
3) startProcess should be allowed with no process actually started. This will 
allow apps to localize new resources to an existing container. Alternatively, 
we could create a new localization API that is delinked from starting the 
process. But re-localization is an important related feature that we should 
look at supporting via this work, because it currently does not work, being 
tied to starting a process.
4) Most current apps already communicate directly with their tasks and hence 
can shut them down when they are not needed. However, as suggested above, it 
may be useful for the NM to provide a feature whereby the previous task is 
shut down when a new task request is received. Alternatively, the NM could 
provide a stopProcess API to make that explicit.

IMO all of this should be allowed. The timeline could differ, with some items 
allowed earlier and some later, based on implementation effort.

Thinking ahead, it may be useful for the NM to accept a series of API calls 
within the same RPC (with the current mechanism supported as a single-command 
entity for backwards compatibility). Then we will not have to build a lot of 
logic into the NM; the app can get all features by composing a multi-command 
entity. E.g.:
Current start process = {acquire, localize, start} // where acquire means 
start container
Current shutdown process = {stop, release} // where release means give up 
container
Only localize = {localize}
Start another process = {localize, start}
Start another process after shutting down the first = {stop, start} or 
{stop, localize, start}
Start another process and then shut down the first = {start, stop}
New container shutdown = {release} // at this point there may be 0 or more 
processes running, which will be stopped


> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.





[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-02-22 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157482#comment-15157482
 ] 

Bikas Saha commented on YARN-1011:
--

2.3 The relationship between overall cluster utilization and node 
over-allocation is not clear. As you say, in an under-allocated cluster it 
would likely be easy to find a guaranteed container. So I am not sure we 
should make this tenuous link formal by adding it as a config in the code. 
Regardless of the cluster's allocation state, a node could be over-allocated.

3 So we are going to add a promotion notification to the AM-RM protocol, 
right? By corollary, we would add a flag to the initial allocation indicating 
whether it was guaranteed or opportunistic, right?

3.2 I agree that cross-node promotion is complex. But I am afraid it does not 
look like something that can be deferred until later, because deferral is 
likely not possible. It is very likely that an app will have both a guaranteed 
and an opportunistic container, and when it gives up the guaranteed container 
we will need to allocate another one to it. That may be the opportunistic 
container or a new one. So it is OK to defer any advanced handling at this 
time, but at a minimum, for the sake of a complete logic definition, we will 
need some default behavior. The obvious default would be to ignore the 
opportunistic container and let it run until it finishes or is preempted 
(because the node becomes busy). This aligns with the philosophy of 
opportunistic allocation being a secondary scheduling loop. Perhaps this is 
what we had in mind; if so, my request is to call it out for the sake of 
completeness.

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf, 
> yarn-1011-design-v2.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.





[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-02-20 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155867#comment-15155867
 ] 

Bikas Saha commented on YARN-1011:
--

2.3. What is the reasoning behind this? Over-allocating a node seems to be a 
local decision based on the node's expected and actual utilization. So I would 
expect the logic to be something like: 1) the node is already 100% allocated; 
2) actual utilization is < 80%; 3) over-allocate to bring actual utilization 
up to ~80%.
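
A minimal sketch of that node-local decision, with 80% as a purely 
illustrative target (my reading of the heuristic, not an actual YARN 
implementation):

{code:java}
// Illustrative sketch of the node-local over-allocation heuristic above.
final class OverAllocationHeuristic {
  static final double TARGET_UTILIZATION = 0.80; // illustrative threshold

  /** Fraction of node capacity to offer opportunistically; 0 if none. */
  static double opportunisticHeadroom(double allocatedFraction,
                                      double actualUtilization) {
    // 1) Only consider a node that is already 100% allocated.
    if (allocatedFraction < 1.0) {
      return 0.0;
    }
    // 2) Only over-allocate when actual utilization is below the target.
    if (actualUtilization >= TARGET_UTILIZATION) {
      return 0.0;
    }
    // 3) Over-allocate just enough to bring actual utilization up to ~80%.
    return TARGET_UTILIZATION - actualUtilization;
  }
}
{code}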

3. What is the AM/RM interaction in this promotion?
3.2. It is not clear what actually happens here. Will a new container be 
allocated, with the opportunistic container allowed to continue until it exits 
or is preempted?

How does all this interact with preemption?

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf, 
> yarn-1011-design-v2.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.





[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2016-01-26 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118116#comment-15118116
 ] 

Bikas Saha commented on YARN-2019:
--

Does this now mean that during a failover the new RM could forget about the 
jobs that failed to get stored by the previous RM?

> Retrospect on decision of making RM crashed if any exception throw in 
> ZKRMStateStore
> 
>
> Key: YARN-2019
> URL: https://issues.apache.org/jira/browse/YARN-2019
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Jian He
>Priority: Critical
>  Labels: ha
> Fix For: 2.7.2, 2.6.2
>
> Attachments: YARN-2019.1-wip.patch, YARN-2019.patch, YARN-2019.patch
>
>
> Currently, if anything abnormal happens in ZKRMStateStore, it throws a fatal 
> exception that crashes the RM. As shown in YARN-1924, the cause could be an 
> internal RM HA bug itself, not a fatal condition. We should revisit some of 
> the decisions here, as the HA feature is designed to protect the key 
> component, not disturb it.





[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-06 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086326#comment-15086326
 ] 

Bikas Saha commented on YARN-1011:
--

Some of what I am saying emanates from prior experience with a different 
Hadoop-like system. You can read more about it here: 
http://research.microsoft.com/pubs/232978/osdi14-paper-boutin.pdf

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.





[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-06 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086325#comment-15086325
 ] 

Bikas Saha commented on YARN-1011:
--

bq. At this point what happens to the opportunistic container. It is clearly 
running at lower priority on the node and as such we are not giving the job 
its guaranteed capacity.
Yes. The difference is that the opportunistic container may not be convertible 
into a normal container, because that node is still over-allocated. So at this 
point, what should be done? Should this container be terminated and re-run 
somewhere else as a normal container (because capacity is now available)? 
Should some other container on this node be preempted to make this container 
normal? Should the RM allocate a normal container and give it to the app in 
addition to the running opportunistic container, in case the app can do the 
transfer internally?

Also, with this feature in place, should we run all containers beyond 
guaranteed capacity as opportunistic containers? This would ensure that any 
excess containers we give to a job do not affect the performance of the 
guaranteed containers of other jobs. It would also make scheduling and 
allocation more consistent, in that guaranteed containers always run at normal 
priority and extra containers run at lower priority. The extra container could 
be extra over capacity (but without over-subscription) or extra 
over-subscription. Because of this, I feel that running tasks at lower 
priority could be an independent (but related) work item.

Staying on this topic, and on adding configuration to it: it may make sense to 
add some way for an application to say that nodes should not be oversubscribed 
while its containers run on them. Putting cgroups or Docker in this context, 
would those mechanisms support over-allocating resources like CPU or memory?

bq. When space frees up on nodes, NMs send candidate containers for promotion 
on the heartbeat.
That shouldn't be necessary, since the RM will learn about the free capacity 
and run its scheduling cycle for that node, at which point it can take an 
action such as allocating a new container or upgrading an existing one. There 
isn't anything the NM can tell the RM (that the RM does not already know) 
except for the current utilization of the node.

Some of what I am saying emanates from prior experience with a different 
Hadoop-like system. You can read more about it here: 
http://research.microsoft.com/pubs/232978/osdi14-paper-boutin.pdf


> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.





[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-05 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083957#comment-15083957
 ] 

Bikas Saha commented on YARN-1011:
--

I agree with preferring natural container churn over preemption to avoid lost 
work, though the issue of clearly defining the scheduler policy still remains.

bq.  If we were oversubscribing 10X then I'd probably want it for sure, but if 
it's at most 2X capacity then worst case is a container only gets 50% of the 
resource it had requested. Obviously for something like memory this has to be 
closely controlled because going over the physical capabilities of the machine 
has very significant consequences. But for CPU, I'd definitely be inclined to 
live with the occasional 50% worst case for all containers, in order to avoid 
the 1/1024th worst case for OPPORTUNISTIC containers on a busy node.
I did not understand this. Does it mean it is OK for normal containers to run 
50% slower in the presence of opportunistic containers? If yes, then there are 
scenarios where this may not be a valid choice, e.g. when a cluster is running 
a mix of SLA and non-SLA jobs. Non-SLA jobs are fine with their containers 
being slowed down to increase cluster utilization by running opportunistic 
containers, because we get higher overall throughput. But SLA jobs are not 
fine with missing deadlines because their tasks ran 50% slower.

IMO, the litmus test for a feature like this would be to take an existing 
cluster (with low utilization because tasks ask for more resources than they 
need 100% of the time), turn the feature on, and get better cluster 
utilization and throughput without affecting the existing workload, whatever 
the internal implementation details. Agree?

bq. 50% of maximum-under-utilized resource of past 30 min for each NM can be 
used to allocate opportunistic containers.
These are heuristics and may all be valid under different circumstances. What 
we should step back and consider is the source of this optimization.
Observation: the cluster is under-utilized despite being fully allocated.
Possible reasons:
1) Tasks are incorrectly over-allocated. They will never use the resources 
they ask for, and hence we can safely run additional opportunistic containers. 
So this feature compensates for poorly configured applications. Probably a 
valid scenario, but is it common?
2) Tasks are correctly allocated but do not use their capacity to the limit 
all the time. E.g., Terasort will use high CPU only during the sorting, not 
during the entire length of the job, but its containers will ask for enough 
CPU to run the sort in the desired time. This is typical application behavior, 
where resource usage varies over time. So this feature soaks up the fallow 
resources in the cluster while tasks are not using their quoted capacity.

The arguments and assumptions we make need to be considered in light of which 
of 1 or 2 is the common case and where this feature will be useful.

While it is useful to have configuration knobs, for a complex, dynamic feature 
like this, which is basically reacting to runtime observations, it may be 
quite hard to configure things statically by hand. While some limits, such as 
a max over-allocation limit, are easy and probably required to configure, we 
should aim to make this feature work by itself instead of relying exclusively 
on configuration (hell :P) for users to make it usable.

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.





[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-05 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083683#comment-15083683
 ] 

Bikas Saha commented on YARN-1011:
--

Good points, but let me play the devil's advocate to get some more clarity :)
bq. As soon as we realize the perf is slower because the node has higher usage 
than we had anticipated, we preempt the container and retry allocation 
(guaranteed or opportunistic depending on the new cluster state). So, it 
shouldn't run slower for longer than our monitoring interval. Is this 
assumption okay?
How do we determine that the perf is slower? The CPU would never exceed 100%, 
even under over-allocation. And is preempting always necessary? If we are sure 
that the OS is going to starve the opportunistic containers, then can we 
assume that when the node is fully utilized, only our guaranteed containers 
are using resources? If so, we can leave the opportunistic containers alone, 
so that they can start soaking up excess capacity again after the normal 
containers have stopped spiking. Perhaps some experiments will shed light on 
this.

bq. The opportunistic container will continue to run on this node so long as it 
is getting the resources it needs. If there is any sort of resource contention, 
it is preempted and is up for allocation on one of the free nodes.
Let's say the job's capacity is 1 container and the job asks for 2. It gets 1 
normal container and 1 opportunistic container. Now it releases its 1 normal 
container. At this point, what happens to the opportunistic container? It is 
clearly running at lower priority on the node, and as such we are not giving 
the job its guaranteed capacity. The question is not about finding an optimal 
solution to this problem (there may not be one). The issue is to crisply 
define the scheduling semantics in the design. Whatever the semantics are, we 
should clearly know what they are. IMO, the exact scheduling semantics should 
be in the docs.

bq. The RM schedules the next highest priority "task" for which it couldn't 
find a guaranteed container as an opportunistic container. This task continues 
to run as long as it is not getting enough resources. If there is no resource 
contention, the task continues to run. If guaranteed resources free up on the 
node it is running, isn't it fair to promote the container to Guaranteed.
Sure. And that is why the system should upgrade opportunistic containers in 
the order in which they were allocated. However, the decision must be made at 
the RM and not the NM, since the NMs do not know about total capacity, and 
multiple NMs locally upgrading their opportunistic containers might end up 
over-allocating for a job. Further, the queue-sharing state may have changed 
since the opportunistic allocation, so assuming that the opportunistic 
container "would have" gotten that allocation anyway, at a later point in 
time, may not be valid.

In summary, what we need in the document is a clear definition of the 
scheduling policy around this - whatever that policy may be.


> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.





[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081950#comment-15081950
 ] 

Bikas Saha commented on YARN-1011:
--

In Tez we always try to assign the most important work to the next allocated 
container. So doing opportunistic containers without giving the AM the ability 
to know about them and use them judiciously may not be something that can be 
delayed to a second phase.

Being able to choose only guaranteed or non-guaranteed containers covers only 
half the problem (and probably the less relevant half): an application that 
should always finish in 1 min using guaranteed capacity may sometimes finish 
in 30s because it got opportunistic containers. The other side is probably 
more important, where a regression is caused by opportunistic containers:
1) The app got opportunistic containers whose perf was not the same as normal 
containers, so it ran slower. This may be mitigated by the system guaranteeing 
that only excess containers beyond guaranteed capacity are opportunistic. That 
would require the system to upgrade opportunistic containers in the same order 
as it would allocate containers. However, things get complicated, because a 
node with an opportunistic container may continue to run its normal containers 
while space frees up for guaranteed capacity on other nodes. At that point, 
which container becomes guaranteed: the new one on a free node, or the 
opportunistic one that is already doing work? And which one should be 
preempted?
2) The app suffered because its guaranteed containers were slowed down by 
competition from opportunistic containers. This needs strong support for 
lower-priority resource consumption by opportunistic containers.

IMO, the NM cannot make a local choice about upgrading its opportunistic 
containers, because that is effectively a resource allocation decision and 
only the RM has the information to make it. The NM does not know whether the 
upgrade would exceed guaranteed capacity, and in aggregate, a bunch of NMs 
making this choice locally can lead to excessive over-allocation of guaranteed 
resources.



> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.





[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2015-12-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072279#comment-15072279
 ] 

Bikas Saha commented on YARN-1011:
--

In my prior experience, something like this is not practical without proactive 
CPU management (which has been delegated to future work in the document). It 
is essential to run opportunistic tasks at lower OS CPU priority so that they 
never obstruct the progress of normal tasks. Typically we will find that the 
machine is under-utilized the most in CPU, since most processing has bursty 
CPU usage. When a normal task has a CPU burst, it should not have to contend 
with an opportunistic task, since that would be detrimental to the expected 
performance of the normal task. Without this, jobs will not run predictably in 
the cluster. From what I have seen, users prefer predictability over most 
other things, i.e. having a 1 min job run in 1 min all the time, versus making 
that job run in 30s 85% of the time but in 2 min 5% of the time, because the 
latter makes it really hard to establish SLAs. In fact, this is the litmus 
test for opportunistic scheduling: it should be able to raise the utilization 
of a cluster from (say) 50% to (say) 75% without affecting the latency of the 
jobs compared to when the cluster was running at 50%.

For memory, in fact, it is OK to share and reach 100% capacity, but it is 
important to check that the machine does not start thrashing. Most well-written 
tasks will run within their memory limits and start spilling etc. 
Opportunistic tasks are trying to occupy the memory that these tasks thought 
they could use but are not using (or that these tasks are keeping as buffer on 
purpose). The crucial thing to consider here is to look for stats that signify 
the onset of memory paging activity (or overall memory over-subscription at 
the OS level). At that point, even normal tasks that are within their limits 
will be adversely affected, because the OS will start paging memory to disk. 
So we need to start proactively killing opportunistic tasks before such paging 
activity gets triggered.
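
As a sketch of what "proactively killing before paging" could look like on the 
node: the stat source, threshold and kill ordering below are all assumptions 
for illustration, not an actual NM mechanism.

{code:java}
import java.util.List;

// Illustrative sketch of an NM-side check that kills opportunistic tasks at
// the onset of paging; names, stats and thresholds are hypothetical.
final class PagingGuard {
  interface Node {
    long majorPageFaultsPerSec();            // e.g. sampled from /proc/vmstat
    List<String> opportunisticContainers();  // ordered, most recent first
    void kill(String containerId);
  }

  static final long PAGING_ONSET_THRESHOLD = 100; // illustrative value

  // Called on every monitoring tick, before paging hurts normal tasks.
  static void check(Node node) {
    if (node.majorPageFaultsPerSec() < PAGING_ONSET_THRESHOLD) {
      return; // no sign of thrashing yet
    }
    // Kill the most recently started opportunistic container first, one per
    // tick, and re-evaluate on the next interval.
    List<String> opportunistic = node.opportunisticContainers();
    if (!opportunistic.isEmpty()) {
      node.kill(opportunistic.get(0));
    }
  }
}
{code}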

Handling opportunistic tasks raises questions about the involvement of the 
AMs. Unless I missed something, this is not called out clearly in the doc. In 
that sense, it would be instructive to consider opportunistic scheduling in a 
similar light as preemption: the app got a container that it would not have 
gotten at that time had we been strict, but got it because we decided to 
loosen the strings (of queue capacity or machine capacity, respectively).
- Will opportunistic containers be given only for containers that are beyond 
queue capacity, so that we do not break any guarantees on their liveness? 
I.e., an AM will not expect to lose any container that is within its queue 
capacity, but opportunistic containers can be killed at any time.
- Does the AM need to know that a newly allocated container is opportunistic, 
e.g. so that it does not schedule its highest-priority work on that container?
- Will conversion of opportunistic containers to regular containers be done 
automatically by the RM? Will the RM notify the AM about such conversions?
- When terminating opportunistic containers, will the RM ask the AM which 
containers to kill? Given the perf-related scenarios above, this may not be a 
viable option.

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.





[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-12-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060559#comment-15060559
 ] 

Bikas Saha commented on YARN-1197:
--

The API supports it, but the backend implementation does not. So in the 
future, based on need, this could be supported compatibly. Do you have a 
scenario where this is essential?

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, graceful, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, 
> YARN-1197_Design.2015.08.21.pdf, YARN-1197_Design.pdf
>
>
> The current YARN resource management logic assumes that the resource 
> allocated to a container is fixed during its lifetime. When users want to 
> change the resource of an allocated container, the only way is to release it 
> and allocate a new container with the expected size.
> Allowing run-time changes to the resources of an allocated container will 
> give us better control of resource usage on the application side.





[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2015-11-25 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15028104#comment-15028104
 ] 

Bikas Saha commented on YARN-4108:
--

These problems will be hard to solve without involving the scheduler in the 
decision cycle. The preemption policy can determine how much to preempt from a 
queue at a macro level, but the actual containers to preempt should be 
selected by the scheduler.
That is where using the global node picture will help: for a given container 
request, we can scan its nodes (if any) and make either an allocation or a 
preemption decision.
Otherwise, if we are doing container allocation on node heartbeat, then just 
like the delay-scheduling logic, we can mark a node for preemption but not 
preempt it, and associate that node with the container request for which 
preemption is needed (request.nodeToPreempt). We can then cycle through all 
nodes like this and change the request->node association whenever we find a 
better node to preempt. After cycling through all nodes, when we again reach a 
node that matches request.nodeToPreempt, we can execute the decision and 
actually preempt on that node. If no node can satisfy the request (e.g. the 
request wants node A but the preempted queue has no containers on node A), the 
scheduler should be able to call back to the preemption module and notify it, 
so that some other queue can be picked for preemption.
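
A sketch of that marking scheme, in the spirit of delay scheduling; the types 
and the request.nodeToPreempt field are hypothetical illustrations of the 
idea, not scheduler code.

{code:java}
// Illustrative sketch of delay-scheduling-style preemption marking.
final class PreemptionMarker {
  static final class Request {
    Node nodeToPreempt; // best preemption candidate seen so far
  }

  interface Node {
    // Does this node have preemptable containers that would satisfy r?
    boolean canSatisfyByPreemption(Request r);
    boolean isBetterCandidateThan(Node other, Request r);
    void preemptFor(Request r);
  }

  /** Called per node heartbeat; returns true once preemption is executed. */
  static boolean onNodeHeartbeat(Node node, Request request) {
    if (request.nodeToPreempt == node) {
      // We cycled through all nodes and reached our marked node again with
      // no better candidate found, so execute the preemption now.
      node.preemptFor(request);
      request.nodeToPreempt = null;
      return true;
    }
    if (node.canSatisfyByPreemption(request)
        && (request.nodeToPreempt == null
            || node.isBetterCandidateThan(request.nodeToPreempt, request))) {
      // Mark (or re-mark) the candidate, but do not preempt yet.
      request.nodeToPreempt = node;
    }
    return false;
  }
}
{code}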

> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --
>
> Key: YARN-4108
> URL: https://issues.apache.org/jira/browse/YARN-4108
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> This is sibling JIRA for YARN-2154. We should make sure container preemption 
> is more effective.
> *Requirements:*:
> 1) Can handle case of user-limit preemption
> 2) Can handle case of resource placement requirements, such as: hard-locality 
> (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I 
> don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross user preemption (YARN-2113), 
> cross applicaiton preemption (such as priority-based (YARN-1963) / 
> fairness-based (YARN-3319)).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4390) Consider container request size during CS preemption

2015-11-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025542#comment-15025542
 ] 

Bikas Saha commented on YARN-4390:
--

I am not sure if this is a bug as described. If preemption does free 8x1GB 
containers then it will create 8GB free space on the node. The scheduler (which 
is container request size aware) should then allocate 1x8GB container to the 
under-allocated AM. [~curino] Is that correct? Of course there could be a bug 
in the implementation but, by design, this should not be happening.

However, if YARN ends up preempting 8x1GB containers on different nodes then 
the under-allocated AM will not get its resources and may result in further 
avoidable preemptions. This is [~sunilg] case.

> Consider container request size during CS preemption
> 
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2047) RM should honor NM heartbeat expiry after RM restart

2015-11-09 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997181#comment-14997181
 ] 

Bikas Saha commented on YARN-2047:
--

I think the general idea is that the AM cannot be trusted about allocated 
resources or running containers.

> RM should honor NM heartbeat expiry after RM restart
> 
>
> Key: YARN-2047
> URL: https://issues.apache.org/jira/browse/YARN-2047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>
> After the RM restarts, it forgets about existing NM's (and their potentially 
> decommissioned status too). After restart, the RM cannot maintain the 
> contract to the AM's that a lost NM's containers will be marked finished 
> within the expiry time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2047) RM should honor NM heartbeat expiry after RM restart

2015-11-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991069#comment-14991069
 ] 

Bikas Saha commented on YARN-2047:
--

From the description it seems like the original scope was making sure that a 
lost NM's containers are marked expired by the RM even across RM restart. For 
that, won't it be enough to save dead/decommissioned NM info in the state 
store? Upon restart, repopulate the decommissioned/dead status from the state 
store. The RM can take appropriate action at that time - e.g. cancelling AM 
containers for those NMs when the AM re-registers, or asking those NMs to 
restart and re-register if they heartbeat again.


If this is a required action then it would also imply that saving such nodes 
is a critical state change operation. So, e.g., a decommission command from 
the admin should not complete until the store has been updated. Is that the 
case?
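To make the ordering concrete, a rough sketch (the store API here is a 
hypothetical stand-in, not the actual RMStateStore interface):
{code}
// Sketch only: the decommission command completes only after the node's
// status is durably recorded, making it a critical state change.
class DecommissionSketch {

  interface StateStore {
    void storeInactiveNode(String nodeId, String status) throws Exception;
  }

  private final StateStore store;

  DecommissionSketch(StateStore store) { this.store = store; }

  /** Admin-facing decommission: fail the command if the write fails. */
  void decommissionNode(String nodeId) throws Exception {
    // Durably record the node as decommissioned *before* acknowledging, so an
    // RM restart can repopulate the dead/decommissioned set from the store.
    store.storeInactiveNode(nodeId, "DECOMMISSIONED");
    // ... then proceed to remove the node from the active set, etc.
  }
}
{code}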

> RM should honor NM heartbeat expiry after RM restart
> 
>
> Key: YARN-2047
> URL: https://issues.apache.org/jira/browse/YARN-2047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>
> After the RM restarts, it forgets about existing NM's (and their potentially 
> decommissioned status too). After restart, the RM cannot maintain the 
> contract to the AM's that a lost NM's containers will be marked finished 
> within the expiry time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3911) Add tail of stderr to diagnostics if container fails to launch or if container logs are empty

2015-10-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983845#comment-14983845
 ] 

Bikas Saha commented on YARN-3911:
--

Sure

> Add tail of stderr to diagnostics if container fails to launch or if 
> container logs are empty
> -
>
> Key: YARN-3911
> URL: https://issues.apache.org/jira/browse/YARN-3911
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>
> The stderr may have useful info in those cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4278) On AM registration, response should include cluster Nodes report on demand by registration request.

2015-10-23 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970840#comment-14970840
 ] 

Bikas Saha commented on YARN-4278:
--

Clients can ask for it but IMO, no clients actually do. Hence we probably don't 
have enough evidence to suggest that it would work in practice, especially at 
large deployments like Yahoo.

> On AM registration, response should include cluster Nodes report on demand 
> by registration request.
> --
>
> Key: YARN-4278
> URL: https://issues.apache.org/jira/browse/YARN-4278
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> From the yarn-dev mailing list discussion thread 
> [Thread-1|http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201510.mbox/%3c0ee80f6f7a98a64ebd18f2be839c91156798a...@szxeml512-mbs.china.huawei.com%3E]
>  
> [Thread-2|http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201510.mbox/%3c4f7812fc-ab5d-465d-ac89-824735698...@hortonworks.com%3E]
>  
> Slider requires cluster node details in order to provide support for 
> affinity/anti-affinity on containers.
> Current behavior: during the life span of an application, updatedNodes are sent in 
> the allocate request only if there is any change (added/removed/'state 
> change') in the nodes. Otherwise cluster node info is not sent to the AM.
> One approach suggested by [~ste...@apache.org] is to let the response hold the 
> cluster nodes report during AM registration



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4278) On AM registration, response should include cluster Nodes report on demand by registration request.

2015-10-22 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970408#comment-14970408
 ] 

Bikas Saha commented on YARN-4278:
--

Not sure if the alternative of allowing an AM to use the ClientRMProtocol was 
considered in the discussions. Node reports, as well as other data like queue 
info (which AMs might also be interested in), would then become available to the 
AM.

As far as the approach in this jira is concerned, it would be good to check if 
the AM-RM resync (after an RM restart) would automatically benefit from this 
change or whether it would need to be handled separately.

Any concerns about the size of this data for large clusters? Would it fit 
easily on the RPC? Would it cause excessive network usage on the RM NIC when 
many AMs register/resync with it?

> On AM registration, response should include cluster Nodes report on demand 
> by registration request.
> --
>
> Key: YARN-4278
> URL: https://issues.apache.org/jira/browse/YARN-4278
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> From the yarn-dev mailing list discussion thread 
> [Thread-1|http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201510.mbox/%3c0ee80f6f7a98a64ebd18f2be839c91156798a...@szxeml512-mbs.china.huawei.com%3E]
>  
> [Thread-2|http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201510.mbox/%3c4f7812fc-ab5d-465d-ac89-824735698...@hortonworks.com%3E]
>  
> Slider requires cluster node details in order to provide support for 
> affinity/anti-affinity on containers.
> Current behavior: during the life span of an application, updatedNodes are sent in 
> the allocate request only if there is any change (added/removed/'state 
> change') in the nodes. Otherwise cluster node info is not sent to the AM.
> One approach suggested by [~ste...@apache.org] is to let the response hold the 
> cluster nodes report during AM registration



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-10-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961154#comment-14961154
 ] 

Bikas Saha commented on YARN-1509:
--

Sounds good

> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, 
> YARN-1509.4.patch, YARN-1509.5.patch
>
>
> As described in YARN-1197, we need to add APIs in AMRMClient to support
> 1) Add increase request
> 2) Can get successfully increased/decreased containers from RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-10-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961035#comment-14961035
 ] 

Bikas Saha commented on YARN-1509:
--

Does issuing an increase followed by a decrease actually remove the pending 
change request on the RM, or will it cause the RM to try to change a container 
resource to the same size as the existing resource and then go down the code 
path of increase (new token) or decrease (update NM)? This is a corner 
case that would be good to double check. If that works and the RM actually 
removes the pending container change request then we could use this mechanism 
to implement a cancel method wrapper in the AMRMClient. Otherwise, if fixes are 
needed on the RM side, we could do it separately when we fix the RM.


> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, 
> YARN-1509.4.patch, YARN-1509.5.patch
>
>
> As described in YARN-1197, we need to add APIs in AMRMClient to support
> 1) Add increase request
> 2) Can get successfully increased/decreased containers from RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-10-15 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959877#comment-14959877
 ] 

Bikas Saha commented on YARN-1509:
--

bq. Increase/Decrease/Change
Not sure why the implementation went down the route of creating a separation 
of increase vs decrease throughout the flow. In any case, that is the back-end 
implementation, which can change and evolve without affecting user code. This 
is the user-facing API. So once code is written against this API, making 
backwards-incompatible changes or providing new functionality in the future 
needs to be considered, as does plain API simplicity.
1) Having a changeRequest API allows us to support other kinds of changes (a 
mix of increase and decrease) at a later point. For now, the API could check 
for increase/decrease (which it has to do anyway for sanity checking) and 
reject unsupported scenarios.
2) Even for just increase or decrease, given that the user provides the old 
Container + the new requested size, it should be easy for the library to figure 
out whether an increase or a decrease is needed. Why burden the user by having 
them call 2 different APIs that are essentially making the same request?
Having to handle a container token for increase but do nothing for decrease is 
something that already has to be explained to the user. Same for the fact that 
decrease is quick but increase is not. So those aspects of user education are 
probably orthogonal.

We have iterated on this API question a few times now. In the above 2 points I 
have tried to summarize the reasons for requesting change instead of 
increase/decrease. Even if we don't support a mixed increase and decrease 
(point 1), I think the simplicity of having a single API (point 2) would be an 
advantage of change vs increase/decrease. This also simplifies things to a 
single callback, onContainerResourceChanged(), instead of 2 callbacks.

At this point, I will request you and [~leftnoteasy] to consider both future 
extensions and API simplicity to make a final call on having 2 APIs and 2 
callbacks vs 1.

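To make the single-API shape concrete, a rough sketch (the types are simplified 
stand-ins for the YARN records, and this is a proposal illustration, not the 
API that was committed):
{code}
import java.util.List;

// Sketch only: one request API and one callback. The library infers increase
// vs decrease by comparing the container's current resource with the
// requested one, and rejects mixed changes it cannot support yet.
interface ContainerChangeApiSketch {

  class Resource { int memoryMb; int vcores; }
  class Container { String id; Resource resource; }

  /** One request API: the old container plus the desired new capability. */
  void requestContainerResourceChange(Container container, Resource newCapability);

  /** One callback instead of separate increased/decreased callbacks. */
  void onContainersResourceChanged(List<Container> changedContainers);
}
{code}
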
bq. My thought is, since we already ask the user to provide the old container 
when he sends out the change request, he should have the old container already, 
so we don't necessarily have to provide the old container info in the callback 
method
I am fine either way. I discussed the same thing offline with [~leftnoteasy] 
and he thought that having the old container info would make life easier for 
the user. A user who is using this API is very likely going to have state about 
the container for which a change has been requested and can match them up using 
the containerId.

bq. The AbstractCallbackHandler.onError will be called when the change 
container request throws exception on the RM side.
The change request is sent inside the allocate heartbeat request, right? So I 
am not sure how we get an exception back for the specific case of a failed 
container change request. Or are you saying that invalid container resource 
change requests are immediately rejected by the RM synchronously in the 
allocate RPC?

bq. I am not against providing a separate cancel API. But I think the API needs 
to be clear that the cancel is only for increase request, NOT decrease request 
(just like we don't have something like cancel release container).
Having a simple cancel request regardless of increase or decrease is preferable, 
since then we are not leaking the current state of the implementation to the 
user. It is also future-safe: e.g. if we later find an issue with decreasing, 
and fixing it makes decrease non-instantaneous, then we don't want to have to 
change the API to support that. But today, given that it is instantaneous, we 
can simply ignore the cancellation of a decrease in the cancel method of 
AMRMClient. I think the RM does not support a cancel container resource change 
request. Does it? If it does not, then perhaps this API can be discussed/added 
in a separate jira after there is back-end support for it.


> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, 
> YARN-1509.4.patch, YARN-1509.5.patch
>
>
> As described in YARN-1197, we need to add APIs in AMRMClient to support
> 1) Add increase request
> 2) Can get successfully increased/decreased containers from RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3911) Add tail of stderr to diagnostics if container fails to launch or if container logs are empty

2015-10-15 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959579#comment-14959579
 ] 

Bikas Saha commented on YARN-3911:
--

I am sorry for the super delayed response. For some reason I was not a watcher 
on this. By tail I did not mean the Linux tail command but just the concept of 
sending the last set of entries (say, the last 4K of data) from the stderr 
and/or syslog if the container launch fails.

> Add tail of stderr to diagnostics if container fails to launch or if 
> container logs are empty
> -
>
> Key: YARN-3911
> URL: https://issues.apache.org/jira/browse/YARN-3911
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>
> The stderr may have useful info in those cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-10-09 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950835#comment-14950835
 ] 

Bikas Saha commented on YARN-1509:
--

A change container request (maybe not supported now) can be an increase of cpu + 
a decrease of memory. Hence a built-in concept of increase and decrease in the 
API is something I am wary of.
So how about
{code}
public abstract void onContainersResourceChanged(
    Map<Container, Container> oldToNewContainers);
// OR
public abstract void onContainersResourceChanged(
    List<Container> updatedContainerInfo);
{code}

Would there be a case (maybe not currently) when a change container request can 
fail on the RM? Should the callback allow notifying about a failure to change 
the container?
What if the RM notifies AMRMClient about a completed container that happens to 
have a pending change request? What should happen in this case? Should the 
AMRMClient clear that pending request? Should it also notify the user that the 
pending container change request has failed, or just rely on 
onContainerCompleted() to let the AM get that information?

I would be wary of overloading cancel with a second container change request. 
To be clear, here we are discussing user-facing semantics and API. Having clear 
semantics is important vs implicit or overloaded behavior. E.g. are there cases 
where an increase followed by a decrease request is a valid scenario, and how 
would that be different compared to an increase followed by a cancel? Should the 
RM do different things for increase followed by cancel vs increase followed by 
decrease?

AM restart does not need any handling since the AM is going to start from a 
clean slate. Sorry, my bad.

I missed the handling of the RM restart case. Is there an existing test for 
that code path that could be augmented to make sure that the new changes are 
tested?


> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, 
> YARN-1509.4.patch, YARN-1509.5.patch
>
>
> As described in YARN-1197, we need to add APIs in AMRMClient to support
> 1) Add increase request
> 2) Can get successfully increased/decreased containers from RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-10-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947315#comment-14947315
 ] 

Bikas Saha commented on YARN-1509:
--

Sorry for coming in late on this. I have some questions on the API.

Why are there separate methods for increase and decrease instead of a single 
method to change the container resource size? By comparing the existing 
resource allocation to a container and the new requested resource allocation, 
it should be clear whether an increase or decrease is being requested.

Also, for completeness, is there a need for a cancelContainerResourceChange()? 
After a container resource change request has been submitted, what are my 
options as a user other than to wait for the request to be satisfied by the RM? 

If I release the container, then does it mean all pending change requests for 
that container should be removed? From a quick look at the patch, it does not 
look like that is being covered, unless I am missing something.

What will happen if the AM restarts after submitting a change request? Does the 
AM-RM re-register protocol need an update to handle the case of 
re-synchronizing on the change requests? What happens if the RM restarts? If 
these are explained in a document, then please point me to the document. The 
patch did not seem to have anything around this area. So I thought I would ask.

Also, why have the callback interface methods been made non-public? Would that 
be an incompatible change?

> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, 
> YARN-1509.4.patch, YARN-1509.5.patch
>
>
> As described in YARN-1197, we need to add APIs in AMRMClient to support
> 1) Add increase request
> 2) Can get successfully increased/decreased containers from RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4087) Followup fixes after YARN-2019 regarding RM behavior when state-store error occurs

2015-09-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734102#comment-14734102
 ] 

Bikas Saha commented on YARN-4087:
--

Repeating my comments from YARN-2019 here: 
There would be 2 kinds of state store operations - reads and writes. Writes may 
be of 2 kinds - critical and non-critical. E.g. saving an application 
submission is critical. Saving node information is perhaps not critical. It 
would affect system correctness if critical write operation errors are allowed 
to be ignored. We end up with YARN-4118 and other such potential issues. 
Essentially we are turning a write-ahead log into something that is optional. I 
don't see how the system can make stable reliability guarantees by making the 
write-ahead log non-fatal.
On the other hand, read errors and non-critical write errors should not block RM 
progress but do need to be potentially retried. That also does not seem to be 
addressed in the patch.
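To make the classification concrete, a rough sketch (hypothetical names, not 
the actual state store code):
{code}
// Sketch only: critical write failures are fatal after retries, while
// non-critical write failures are retried and then ignored so they do not
// block RM progress.
class StateStoreWriteSketch {

  enum WriteKind { CRITICAL, NON_CRITICAL }

  interface Store { void write(String key, byte[] value) throws Exception; }

  private final Store store;
  private static final int MAX_RETRIES = 3;

  StateStoreWriteSketch(Store store) { this.store = store; }

  void write(WriteKind kind, String key, byte[] value) throws Exception {
    for (int attempt = 1; ; attempt++) {
      try {
        store.write(key, value); // the write-ahead-log style operation
        return;
      } catch (Exception e) {
        if (attempt >= MAX_RETRIES) {
          if (kind == WriteKind.CRITICAL) {
            // E.g. an app submission: ignoring this breaks correctness, so
            // fail fast (or transition to standby) instead of continuing.
            throw e;
          }
          // Non-critical (e.g. node info): log and move on.
          return;
        }
      }
    }
  }
}
{code}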


> Followup fixes after YARN-2019 regarding RM behavior when state-store error 
> occurs
> --
>
> Key: YARN-4087
> URL: https://issues.apache.org/jira/browse/YARN-4087
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.7.2, 2.6.2
>
> Attachments: YARN-4087-branch-2.6.patch, YARN-4087.1.patch, 
> YARN-4087.2.patch, YARN-4087.3.patch, YARN-4087.5.patch, YARN-4087.6.patch, 
> YARN-4087.7.patch
>
>
> Several fixes:
> 1. Set YARN_FAIL_FAST to be false by default, since this makes more sense in 
> a production environment.
> 2. If HA is enabled and if there's any state-store error, after the retry 
> operation failed, we always transition RM to standby state.  Otherwise, we 
> may see two active RMs running. YARN-4107 is one example.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crash if any exception is thrown in ZKRMStateStore

2015-09-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734100#comment-14734100
 ] 

Bikas Saha commented on YARN-2019:
--

Sorry for coming in late on this. There would be 2 kinds of state store 
operations - reads and writes. Writes may be of 2 kinds - critical and 
non-critical. E.g. saving an application submission is critical. Saving node 
information is perhaps not critical. It would affect system correctness if 
critical write operation errors are allowed to be ignored. We end up with 
YARN-4118 and other such potential issues. Essentially we are turning a 
write-ahead log into something that is optional. I don't see how the system can 
make stable reliability guarantees by making the write-ahead log non-fatal.
On the other hand, read errors and non-critical write errors should not block RM 
progress but do need to be potentially retried. That also does not seem to be 
addressed in the patch.

> Retrospect on decision of making RM crash if any exception is thrown in 
> ZKRMStateStore
> 
>
> Key: YARN-2019
> URL: https://issues.apache.org/jira/browse/YARN-2019
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Jian He
>Priority: Critical
>  Labels: ha
> Fix For: 2.8.0, 2.7.2, 2.6.2
>
> Attachments: YARN-2019.1-wip.patch, YARN-2019.patch, YARN-2019.patch
>
>
> Currently, if anything abnormal happens in ZKRMStateStore, it will throw a 
> fatal exception that crashes the RM. As shown in YARN-1924, it could be due to 
> an RM HA internal bug itself, not a fatal condition. We should revisit some of 
> the decisions here, as the HA feature is designed to protect the key component, 
> not disturb it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-03 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729917#comment-14729917
 ] 

Bikas Saha commented on YARN-3942:
--

In Option 1 the latency would be the time to read the entire session file for a 
session that has run many DAGs, right?
Since we know the file size to be read, could we return a message saying 
something like "scanning file size FOO. Expect BAR latency"?
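A rough sketch of such a message, reading the size via the HDFS file status 
(the throughput constant is an assumed placeholder; real numbers would have to 
be measured):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: report the session file size and a rough scan-latency estimate
// before reading the whole file.
class ScanEstimateSketch {
  private static final double ASSUMED_BYTES_PER_SEC = 100.0 * 1024 * 1024; // assumption

  static String estimateMessage(FileSystem fs, Path sessionFile) throws IOException {
    FileStatus status = fs.getFileStatus(sessionFile);
    long bytes = status.getLen();
    double seconds = bytes / ASSUMED_BYTES_PER_SEC;
    return String.format("Scanning file of size %d bytes. Expect ~%.1f s latency.",
        bytes, seconds);
  }
}
{code}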

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717907#comment-14717907
 ] 

Bikas Saha commented on YARN-4088:
--

Right. So the combined objective is to continue to have small heartbeat 
intervals with larger clusters while still using the central scheduler for all 
allocations. Clearly, in theory, that is a bottleneck by design and our attempt 
is to engineer our way out of it for medium-sized clusters. Right? :)

> RM should be able to process heartbeats from NM asynchronously
> --
>
> Key: YARN-4088
> URL: https://issues.apache.org/jira/browse/YARN-4088
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Reporter: Srikanth Kandula
>
> Today, the RM sequentially processes one heartbeat after another. 
> Imagine a 3000 server cluster with each server heart-beating every 3s. This 
> gives the RM 1ms on average to process each NM heartbeat. That is tough.
> It is true that there are several underlying datastructures that will be 
> touched during heartbeat processing. So, it is non-trivial to parallelize the 
> NM heartbeat. Yet, it is quite doable...
> Parallelizing the NM heartbeat would substantially improve the scalability of 
> the RM, allowing it to either 
> a) run larger clusters or 
> b) support faster heartbeats or dynamic scaling of heartbeats
> c) take more asks from each application or 
> d) use cleverer/more expensive algorithms such as node labels or better 
> packing or ...
> Indeed the RM's scalability limit has been cited as the motivating reason for 
> a variety of efforts which will become less needed if this can be solved. 
> Ditto for slow heartbeats.  See Sparrow and Mercury papers for example.
> Can we take a shot at this?
> If not, could we discuss why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717831#comment-14717831
 ] 

Bikas Saha commented on YARN-4088:
--

Why not on a 3K cluster? We could slow down heartbeats to (say) 10s on a 3K node 
cluster. That should work, though I agree that NM info would be stale for 
longer, if that's your point.

> RM should be able to process heartbeats from NM asynchronously
> --
>
> Key: YARN-4088
> URL: https://issues.apache.org/jira/browse/YARN-4088
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Reporter: Srikanth Kandula
>
> Today, the RM sequentially processes one heartbeat after another. 
> Imagine a 3000 server cluster with each server heart-beating every 3s. This 
> gives the RM 1ms on average to process each NM heartbeat. That is tough.
> It is true that there are several underlying datastructures that will be 
> touched during heartbeat processing. So, it is non-trivial to parallelize the 
> NM heartbeat. Yet, it is quite doable...
> Parallelizing the NM heartbeat would substantially improve the scalability of 
> the RM, allowing it to either 
> a) run larger clusters or 
> b) support faster heartbeats or dynamic scaling of heartbeats
> c) take more asks from each application or 
> d) use cleverer/more expensive algorithms such as node labels or better 
> packing or ...
> Indeed the RM's scalability limit has been cited as the motivating reason for 
> a variety of efforts which will become less needed if this can be solved. 
> Ditto for slow heartbeats.  See Sparrow and Mercury papers for example.
> Can we take a shot at this?
> If not, could we discuss why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717746#comment-14717746
 ] 

Bikas Saha commented on YARN-4088:
--

Is the suggestion to process them concurrently? Not quite sure what async 
means here - is it async wrt the RPC thread?
Another alternative would be to dynamically adjust the NM heartbeat interval. 
IIRC, the NM's next heartbeat interval is sent by the RM in the response to the 
heartbeat. If not, then this could be added. The RM could potentially increase 
this interval till it reaches a steady/stable state of heartbeat processing. 
This would help in self-adjusting to cluster sizes: small for a small cluster 
and high for a large cluster. It could tune up under high load and then tune 
down once load diminishes.
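A rough sketch of such a self-adjusting policy (hypothetical code; the only 
assumption is that the heartbeat response can carry the next interval):
{code}
// Sketch only: the RM widens the heartbeat interval as its processing load
// grows and narrows it again when the load drops.
class HeartbeatIntervalPolicy {
  private final long minMs;
  private final long maxMs;

  HeartbeatIntervalPolicy(long minMs, long maxMs) {
    this.minMs = minMs;
    this.maxMs = maxMs;
  }

  /**
   * @param load 0.0 (idle) to 1.0 (heartbeat processing saturated)
   * @return the interval to send back in the heartbeat response
   */
  long nextIntervalMs(double load) {
    double clamped = Math.max(0.0, Math.min(1.0, load));
    // Linear interpolation: small interval under light load (small clusters),
    // large interval when heartbeat processing is saturated.
    return minMs + Math.round(clamped * (maxMs - minMs));
  }
}
{code}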

> RM should be able to process heartbeats from NM asynchronously
> --
>
> Key: YARN-4088
> URL: https://issues.apache.org/jira/browse/YARN-4088
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Reporter: Srikanth Kandula
>
> Today, the RM sequentially processes one heartbeat after another. 
> Imagine a 3000 server cluster with each server heart-beating every 3s. This 
> gives the RM 1ms on average to process each NM heartbeat. That is tough.
> It is true that there are several underlying datastructures that will be 
> touched during heartbeat processing. So, it is non-trivial to parallelize the 
> NM heartbeat. Yet, it is quite doable...
> Parallelizing the NM heartbeat would substantially improve the scalability of 
> the RM, allowing it to either 
> a) run larger clusters or 
> b) support faster heartbeats or dynamic scaling of heartbeats
> c) take more asks from each application or 
> d) use cleverer/more expensive algorithms such as node labels or better 
> packing or ...
> Indeed the RM's scalability limit has been cited as the motivating reason for 
> a variety of efforts which will become less needed if this can be solved. 
> Ditto for slow heartbeats.  See Sparrow and Mercury papers for example.
> Can we take a shot at this?
> If not, could we discuss why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3736) Add RMStateStore apis to store and load accepted reservations for failover

2015-08-06 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660415#comment-14660415
 ] 

Bikas Saha commented on YARN-3736:
--

Sounds good!

> Add RMStateStore apis to store and load accepted reservations for failover
> --
>
> Key: YARN-3736
> URL: https://issues.apache.org/jira/browse/YARN-3736
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-3736.001.patch, YARN-3736.001.patch, 
> YARN-3736.002.patch, YARN-3736.003.patch, YARN-3736.004.patch, 
> YARN-3736.005.patch
>
>
> We need to persist the current state of the plan, i.e. the accepted 
> ReservationAllocations & corresponding RLESparseResourceAllocations to the 
> RMStateStore so that we can recover them on RM failover. This involves making 
> all the reservation system data structures protobuf friendly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3736) Add RMStateStore apis to store and load accepted reservations for failover

2015-08-05 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658798#comment-14658798
 ] 

Bikas Saha commented on YARN-3736:
--

Folks, how much load is this going to add to the state store (in terms of 
actual data volume and rate of stores/updates)? The original design of the RM 
state store assumes low volume of data and updates (proportional to number of 
new applications entering the system). So asking. Please let me know if this 
has been discussed and I missed that part.

> Add RMStateStore apis to store and load accepted reservations for failover
> --
>
> Key: YARN-3736
> URL: https://issues.apache.org/jira/browse/YARN-3736
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-3736.001.patch, YARN-3736.001.patch, 
> YARN-3736.002.patch, YARN-3736.003.patch, YARN-3736.004.patch, 
> YARN-3736.005.patch
>
>
> We need to persist the current state of the plan, i.e. the accepted 
> ReservationAllocations & corresponding RLESparseResourceAllocations to the 
> RMStateStore so that we can recover them on RM failover. This involves making 
> all the reservation system data structures protobuf friendly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-07-29 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646700#comment-14646700
 ] 

Bikas Saha commented on YARN-2005:
--

I am fine with opening a separate jira for the specific case I mentioned. Opened 
YARN-3994 for that. If you want, you can extend its scope to blacklisting.

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3994) RM should respect AM resource/placement constraints

2015-07-29 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-3994:


 Summary: RM should respect AM resource/placement constraints
 Key: YARN-3994
 URL: https://issues.apache.org/jira/browse/YARN-3994
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha


Today, locality and cpu for the AM can be specified in the AM container launch 
request but are ignored at the RM. Locality is assumed to be ANY and cpu is 
dropped. There may be other things too that are ignored. This should be fixed 
so that the user gets what is specified in their code to launch the AM. cc 
[~leftnoteasy] [~vvasudev] [~adhoot]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-07-28 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644966#comment-14644966
 ] 

Bikas Saha commented on YARN-2005:
--

I believe the reverse case is also valid. A user may want to specify a locality 
constraint for the AM but today that is ignored because it's dropped from the AM 
resource request by the RM. Similarly, only the memory resource constraint is 
used and others (cpu etc.) are dropped. Perhaps this jira should tackle this 
problem holistically by fully respecting the resource request as specified in 
the AM container launch context.

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3911) Add tail of stderr to diagnostics if container fails to launch or if container logs are empty

2015-07-10 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-3911:


 Summary: Add tail of stderr to diagnostics if container fails to 
launch or if container logs are empty
 Key: YARN-3911
 URL: https://issues.apache.org/jira/browse/YARN-3911
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha


The stderr may have useful info in those cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-29 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606218#comment-14606218
 ] 

Bikas Saha commented on YARN-1197:
--

There has been a lot of discussion that looks like it's converging. It would be 
helpful for the other interested (but not deeply involved) people if there was 
an updated design document with details about the agreed-upon design. Also, if 
this document could outline some of the intuition/logic behind the design 
choices (like going through the AM for low latency) then it would be super 
useful. Thanks!

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.pdf
>
>
> The current YARN resource management logic assumes the resource allocated to a 
> container is fixed during its lifetime. When users want to change the resource 
> of an allocated container, the only way is to release it and allocate a new 
> container with the expected size.
> Allowing run-time changes to the resources of an allocated container will give us 
> better control of resource usage on the application side



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2015-05-18 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548492#comment-14548492
 ] 

Bikas Saha commented on YARN-1902:
--

An alternate approach that we tried in Apache Tez is to wrap a TaskScheduler 
around the AMRMClient that takes requests from the application and does the 
matching internally. Since it knows the matching, it can automatically remove 
the matched requests as well. (This still does not remove the race condition, 
but it is cleaner wrt the user as an API.) The TaskScheduler was written to be 
independent of Tez code so that we could contribute it to YARN as a library; 
however, we did not find time to do so. That code has evolved quite a bit by 
now, but the original, well-tested code could still be extracted from the Tez 
0.1 branch and contributed to YARN if someone is interested in doing that work.

> Allocation of too many containers when a second request is done with the same 
> resource capability
> -
>
> Key: YARN-1902
> URL: https://issues.apache.org/jira/browse/YARN-1902
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
>Reporter: Sietse T. Au
>Assignee: Sietse T. Au
>  Labels: client
> Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch
>
>
> Regarding AMRMClientImpl
> Scenario 1:
> Given a ContainerRequest x with Resource y, when addContainerRequest is 
> called z times with x, allocate is called and at least one of the z allocated 
> containers is started, then if another addContainerRequest call is done and 
> subsequently an allocate call to the RM, (z+1) containers will be allocated, 
> where 1 container is expected.
> Scenario 2:
> No containers are started between the allocate calls. 
> Analyzing debug logs of the AMRMClientImpl, I have found that indeed (z+1) 
> containers are requested in both scenarios, but that only in the second 
> scenario is the correct behavior observed.
> Looking at the implementation I have found that this (z+1) request is caused 
> by the structure of the remoteRequestsTable. The consequence of Map<Resource, 
> ResourceRequestInfo> is that ResourceRequestInfo does not hold any 
> information about whether a request has been sent to the RM yet or not.
> There are workarounds for this, such as releasing the excess containers 
> received.
> The solution implemented is to initialize a new ResourceRequest in 
> ResourceRequestInfo when a request has been successfully sent to the RM.
> The patch includes a test in which scenario one is tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2015-05-15 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546421#comment-14546421
 ] 

Bikas Saha commented on YARN-1902:
--

Yes. And then the RM may give a container on H1 which is not useful for the 
app. If we again auto-decrement and release the container then we end up with 2 
outstanding requests and the job will hang because it needs 3 containers.

> Allocation of too many containers when a second request is done with the same 
> resource capability
> -
>
> Key: YARN-1902
> URL: https://issues.apache.org/jira/browse/YARN-1902
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
>Reporter: Sietse T. Au
>Assignee: Sietse T. Au
>  Labels: client
> Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch
>
>
> Regarding AMRMClientImpl
> Scenario 1:
> Given a ContainerRequest x with Resource y, when addContainerRequest is 
> called z times with x, allocate is called and at least one of the z allocated 
> containers is started, then if another addContainerRequest call is done and 
> subsequently an allocate call to the RM, (z+1) containers will be allocated, 
> where 1 container is expected.
> Scenario 2:
> No containers are started between the allocate calls. 
> Analyzing debug logs of the AMRMClientImpl, I have found that indeed (z+1) 
> containers are requested in both scenarios, but that only in the second 
> scenario is the correct behavior observed.
> Looking at the implementation I have found that this (z+1) request is caused 
> by the structure of the remoteRequestsTable. The consequence of Map<Resource, 
> ResourceRequestInfo> is that ResourceRequestInfo does not hold any 
> information about whether a request has been sent to the RM yet or not.
> There are workarounds for this, such as releasing the excess containers 
> received.
> The solution implemented is to initialize a new ResourceRequest in 
> ResourceRequestInfo when a request has been successfully sent to the RM.
> The patch includes a test in which scenario one is tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2015-05-15 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546377#comment-14546377
 ] 

Bikas Saha commented on YARN-1902:
--

The AMRMClient was not written to automatically remove requests because it does 
not know which requests will be matched to allocated containers. The explicit 
contract is for users of AMRMClient to remove requests that have been matched 
to containers.

If we change that behavior to automatically remove requests then it may lead to 
issues where 2 entities are removing requests: 1) the user and 2) the 
AMRMClient. So that 
change should only be made in a different version of AMRMClient or else 
existing users will break.

In the worst case, if the AMRMClient (automatically) removes the wrong request 
then the application will hang because the RM will not provide it the container 
that is needed. Not automatically removing the request has the downside of 
getting additional containers that need to be released by the application. We 
chose excess containers over hanging for the original implementation. 

Excess containers should happen rarely because the user controls when 
AMRMClient heartbeats to the RM and can do that after having removed all 
matched requests, so that the remote request table reflects the current state 
of outstanding requests. There may still be a race condition on the RM side 
that gives more containers. Excess containers can happen more often with 
AMRMClientAsync, because it heartbeats on a regular schedule and can send more 
requests than are really outstanding if the heartbeat goes out before the user 
has removed the matched requests.
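To make the contract concrete, a rough sketch of the user side (the matching 
here is deliberately naive, first pending request wins; real applications match 
on priority/location/capability):
{code}
import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

// Sketch only: remove a request as soon as it is matched to an allocated
// container, so the remote request table stays accurate before the next
// heartbeat.
class RemoveMatchedRequestsSketch {

  void runOnce(AMRMClient<ContainerRequest> client,
               List<ContainerRequest> pending) throws Exception {
    AllocateResponse response = client.allocate(0.1f); // heartbeat to the RM
    for (Container allocated : response.getAllocatedContainers()) {
      if (!pending.isEmpty()) {
        // Match the container to one of our requests (app-specific logic),
        // then remove that request so it is not re-sent to the RM.
        ContainerRequest matched = pending.remove(0);
        client.removeContainerRequest(matched);
        // ... hand the container to the application to launch work on.
      }
    }
  }
}
{code}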


> Allocation of too many containers when a second request is done with the same 
> resource capability
> -
>
> Key: YARN-1902
> URL: https://issues.apache.org/jira/browse/YARN-1902
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
>Reporter: Sietse T. Au
>Assignee: Sietse T. Au
>  Labels: client
> Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch
>
>
> Regarding AMRMClientImpl
> Scenario 1:
> Given a ContainerRequest x with Resource y, when addContainerRequest is 
> called z times with x, allocate is called and at least one of the z allocated 
> containers is started, then if another addContainerRequest call is done and 
> subsequently an allocate call to the RM, (z+1) containers will be allocated, 
> where 1 container is expected.
> Scenario 2:
> No containers are started between the allocate calls. 
> Analyzing debug logs of the AMRMClientImpl, I have found that indeed (z+1) 
> containers are requested in both scenarios, but that only in the second 
> scenario is the correct behavior observed.
> Looking at the implementation I have found that this (z+1) request is caused 
> by the structure of the remoteRequestsTable. The consequence of Map<Resource, 
> ResourceRequestInfo> is that ResourceRequestInfo does not hold any 
> information about whether a request has been sent to the RM yet or not.
> There are workarounds for this, such as releasing the excess containers 
> received.
> The solution implemented is to initialize a new ResourceRequest in 
> ResourceRequestInfo when a request has been successfully sent to the RM.
> The patch includes a test in which scenario one is tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-240) Rename ProcessTree.isSetsidAvailable

2015-05-01 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha resolved YARN-240.
-
Resolution: Won't Fix

> Rename ProcessTree.isSetsidAvailable
> 
>
> Key: YARN-240
> URL: https://issues.apache.org/jira/browse/YARN-240
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: trunk-win
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>
> The logical use of this member is to find out if processes can be grouped 
> into a unit for process manipulation, e.g. killing process groups etc.
> setsid is the Linux implementation and it leaks into the name.
> I suggest renaming it to isProcessGroupAvailable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-255) Support secure AM launch for unmanaged AM's

2015-05-01 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-255:

Assignee: (was: Bikas Saha)

> Support secure AM launch for unmanaged AM's
> ---
>
> Key: YARN-255
> URL: https://issues.apache.org/jira/browse/YARN-255
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Bikas Saha
>
> Currently unmanaged AM launch does not get security tokens because tokens are 
> passed by the RM to the AM via the NM during AM container launch. For 
> unmanaged AM's the RM can send tokens in the SubmitApplicationResponse to the 
> secure client. The client can then pass these onto the AM in a manner similar 
> to the NM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-394) RM should be able to return requests that it cannot fulfill

2015-05-01 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-394:

Assignee: (was: Bikas Saha)

> RM should be able to return requests that it cannot fulfill
> ---
>
> Key: YARN-394
> URL: https://issues.apache.org/jira/browse/YARN-394
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>
> Currently, the RM has no way of returning requests that cannot be met. e.g. 
> if the app wants a specific node and that node dies, then the RM should 
> return that request instead of holding on to it indefinitely. 
> Some situations in which this would be useful are:
> * After YARN-392, requests are location specific, and the locations that were 
> requested are no longer in the cluster.
> * A high memory machine is lost, and resource requests above certain sizes 
> are no longer able to be satisfied anywhere.
> * All nodes in the cluster become unavailable.
> At these points, there is no way the RM can inform the apps about its 
> inability to allocate requests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-456) allow OS scheduling priority of NM to be different than the containers it launches for Windows

2015-05-01 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-456:

Assignee: (was: Bikas Saha)

> allow OS scheduling priority of NM to be different than the containers it 
> launches for Windows
> --
>
> Key: YARN-456
> URL: https://issues.apache.org/jira/browse/YARN-456
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Bikas Saha
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart

2015-05-01 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523973#comment-14523973
 ] 

Bikas Saha commented on YARN-556:
-

[~jianhe] [~adhoot] [~kasha] [~vinodkv] Should we resolve this jira as complete?

> RM Restart phase 2 - Work preserving restart
> 
>
> Key: YARN-556
> URL: https://issues.apache.org/jira/browse/YARN-556
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: Work Preserving RM Restart.pdf, 
> WorkPreservingRestartPrototype.001.patch, YARN-1372.prelim.patch
>
>
> YARN-128 covered storing the state needed for the RM to recover critical 
> information. This umbrella jira will track changes needed to recover the 
> running state of the cluster so that work can be preserved across RM restarts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-745) Move UnmanagedAMLauncher to yarn client package

2015-05-01 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-745:

Assignee: (was: Bikas Saha)

> Move UnmanagedAMLauncher to yarn client package
> ---
>
> Key: YARN-745
> URL: https://issues.apache.org/jira/browse/YARN-745
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bikas Saha
>
> It's currently sitting in the yarn applications project, which sounds wrong. 
> The client project sounds better since it contains the utilities/libraries 
> that clients use to write and debug yarn applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-908) URL is a YARN API record instead of being java.net.url

2015-05-01 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523396#comment-14523396
 ] 

Bikas Saha commented on YARN-908:
-

The issue is why we don't use the standard Java URL instead of a YARN-specific 
URL. One of the things the YARN URL does is serialize to PB, but it could be 
serialized as a string instead. So what is the use of a YARN URL class? It just 
confuses users.

> URL is a YARN API record instead of being java.net.url
> --
>
> Key: YARN-908
> URL: https://issues.apache.org/jira/browse/YARN-908
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>
> Why is it not a java.net.url?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers

2015-03-12 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355395#comment-14355395
 ] 

Bikas Saha commented on YARN-2140:
--

This paper may have useful insights into the network sharing issues.
http://research.microsoft.com/en-us/um/people/srikanth/data/nsdi11_seawall.pdf


> Add support for network IO isolation/scheduling for containers
> --
>
> Key: YARN-2140
> URL: https://issues.apache.org/jira/browse/YARN-2140
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: NetworkAsAResourceDesign.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2015-02-17 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325548#comment-14325548
 ] 

Bikas Saha commented on YARN-2261:
--

Looks like AM preemption will not fail the AM and so the comments about AM 
preemption are probably not valid.

> YARN should have a way to run post-application cleanup
> --
>
> Key: YARN-2261
> URL: https://issues.apache.org/jira/browse/YARN-2261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> See MAPREDUCE-5956 for context. Specific options are at 
> https://issues.apache.org/jira/browse/MAPREDUCE-5956?focusedCommentId=14054562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14054562.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2015-02-17 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325547#comment-14325547
 ] 

Bikas Saha commented on YARN-2261:
--

Sounds reasonable. Though things like how YARN exposes this information to the 
user would eventually need to be thought about. Currently the RM page shows 
running applications. How will it show cleanup in progress? Can the original AM 
be preempted due to lack of resources? In that case, how will we launch the 
cleanup container? Probably the same problem exists now, because if an AM is 
preempted it would not be able to clean up. However, moving this responsibility 
from the user to YARN (as proposed in this jira) makes that a YARN problem to 
solve.

> YARN should have a way to run post-application cleanup
> --
>
> Key: YARN-2261
> URL: https://issues.apache.org/jira/browse/YARN-2261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> See MAPREDUCE-5956 for context. Specific options are at 
> https://issues.apache.org/jira/browse/MAPREDUCE-5956?focusedCommentId=14054562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14054562.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-01-26 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292848#comment-14292848
 ] 

Bikas Saha commented on YARN-3025:
--

Yes. But that would mean that the RM cannot provide the latest updates.

> Provide API for retrieving blacklisted nodes
> 
>
> Key: YARN-3025
> URL: https://issues.apache.org/jira/browse/YARN-3025
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ted Yu
>
> We have the following method which updates blacklist:
> {code}
>   public synchronized void updateBlacklist(List<String> blacklistAdditions,
>   List<String> blacklistRemovals) {
> {code}
> Upon AM failover, there should be an API which returns the blacklisted nodes 
> so that the new AM can make consistent decisions.
> The new API can be:
> {code}
>   public synchronized List<String> getBlacklistedNodes()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-01-25 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291522#comment-14291522
 ] 

Bikas Saha commented on YARN-3025:
--

In the worst case, the frequency of this update can be once per heartbeat per 
AM. That can be high. Let's say 1000 AMs pinging every 1 sec.

> Provide API for retrieving blacklisted nodes
> 
>
> Key: YARN-3025
> URL: https://issues.apache.org/jira/browse/YARN-3025
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ted Yu
>
> We have the following method which updates blacklist:
> {code}
>   public synchronized void updateBlacklist(List<String> blacklistAdditions,
>   List<String> blacklistRemovals) {
> {code}
> Upon AM failover, there should be an API which returns the blacklisted nodes 
> so that the new AM can make consistent decisions.
> The new API can be:
> {code}
>   public synchronized List<String> getBlacklistedNodes()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-01-09 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271658#comment-14271658
 ] 

Bikas Saha commented on YARN-3025:
--

The AM is expected to maintain its own state. However, this could be part of 
the initial sync with the RM when the AM connects: the RM could provide a list 
of known running containers (which the AM could connect to), etc. However, the 
RM probably does not persist this information across RM restart. So it may not 
be able to provide this information if both the RM and AM restart, leading to 
inconsistent responses based on whether one or both have restarted before 
reconnecting.
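
To illustrate the AM-owned alternative, here is a minimal sketch of an AM 
re-applying its own blacklist after a restart, using the existing AMRMClient 
call quoted below; loadPersistedBlacklist() is a hypothetical helper standing 
in for whatever store the AM keeps its own state in:
{code}
import java.util.List;
import org.apache.hadoop.yarn.client.api.AMRMClient;

// Sketch: the AM treats itself as the source of truth for the blacklist.
// loadPersistedBlacklist() is a hypothetical helper that reads whatever
// the previous attempt wrote to the AM's own store.
void restoreBlacklist(AMRMClient<?> amRmClient) {
  List<String> knownBadNodes = loadPersistedBlacklist();
  // Re-send the full list after re-registering; a restarted RM may not
  // have it, for the reasons discussed above.
  amRmClient.updateBlacklist(knownBadNodes, null);
}
{code}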

> Provide API for retrieving blacklisted nodes
> 
>
> Key: YARN-3025
> URL: https://issues.apache.org/jira/browse/YARN-3025
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ted Yu
>
> We have the following method which updates blacklist:
> {code}
>   public synchronized void updateBlacklist(List<String> blacklistAdditions,
>   List<String> blacklistRemovals) {
> {code}
> Upon AM failover, there should be an API which returns the blacklisted nodes 
> so that the new AM can make consistent decisions.
> The new API can be:
> {code}
>   public synchronized List<String> getBlacklistedNodes()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN

2014-12-03 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233341#comment-14233341
 ] 

Bikas Saha commented on YARN-2139:
--

So to be clear, currently vdisks is counting the number of physical drives 
present on the box.

Something to keep in mind would be whether this also entails a change in the NM 
policy of providing a directory on every local dir (which typically maps to 
every disk) to every task. Tasks are free to choose one or more of those dirs 
(disks) to write to. This puts the spinning disk head under contention and 
affects the performance of all writers on that disk because seeks are 
expensive. The rule of thumb tends to be to allocate about as many tasks to a 
machine as the number of disks (maybe 2x) so as to keep this seek cost low. 
Should we consider evaluating a change in this policy that gives 1 local dir to 
a container with 1 vdisk? This way a machine with 6 disks (and 6 vdisks) would 
have 6 tasks running, each with its own "dedicated" disk. Offhand it's hard to 
say how this would compare with all 6 disks allocated to all 6 tasks and 
letting cgroups enforce sharing. If multiple tasks end up choosing the same 
disk for their writes, then they may not end up getting the "allocation" that 
they thought they would get.

> [Umbrella] Support for Disk as a Resource in YARN 
> --
>
> Key: YARN-2139
> URL: https://issues.apache.org/jira/browse/YARN-2139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wei Yan
> Attachments: Disk_IO_Isolation_Scheduling_3.pdf, 
> Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, 
> YARN-2139-prototype-2.patch, YARN-2139-prototype.patch
>
>
> YARN should consider disk as another resource for (1) scheduling tasks on 
> nodes, (2) isolation at runtime, (3) spindle locality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN

2014-12-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232448#comment-14232448
 ] 

Bikas Saha commented on YARN-2139:
--

Does the concept of a vdisk represent a spinning disk, or is it going to be 
some pluggable API?

> [Umbrella] Support for Disk as a Resource in YARN 
> --
>
> Key: YARN-2139
> URL: https://issues.apache.org/jira/browse/YARN-2139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wei Yan
> Attachments: Disk_IO_Isolation_Scheduling_3.pdf, 
> Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, 
> YARN-2139-prototype-2.patch, YARN-2139-prototype.patch
>
>
> YARN should consider disk as another resource for (1) scheduling tasks on 
> nodes, (2) isolation at runtime, (3) spindle locality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN

2014-12-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232133#comment-14232133
 ] 

Bikas Saha commented on YARN-2139:
--

Thanks for the update.
It's not clear to me how we are going to cleanly de-couple 1) and 2) from 3). 
At first thought, scheduling is what prevents over-allocation, and the NM 
enforces the scheduling decision.
Could you please shed some light on that?


> [Umbrella] Support for Disk as a Resource in YARN 
> --
>
> Key: YARN-2139
> URL: https://issues.apache.org/jira/browse/YARN-2139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wei Yan
> Attachments: Disk_IO_Isolation_Scheduling_3.pdf, 
> Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, 
> YARN-2139-prototype-2.patch, YARN-2139-prototype.patch
>
>
> YARN should consider disk as another resource for (1) scheduling tasks on 
> nodes, (2) isolation at runtime, (3) spindle locality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2014-11-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228030#comment-14228030
 ] 

Bikas Saha commented on YARN-1902:
--

If all the requests are for the same location (say star) then the client needs 
to send all outstanding requests to the RM. The AM-RM protocol is not a delta 
protocol. Sending deltas will not work because the RM expects all requests at a 
given location to be presented on every update. It simply copies that value 
from the request into its internal book-keeping objects (vs. performing an 
increment for the "new" request).
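
To make the absolute (non-delta) semantics concrete, a small sketch using the 
public ResourceRequest record (the helper method is illustrative, not part of 
the protocol):
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Sketch: numContainers carries the TOTAL outstanding count for this
// (priority, location, capability) key. On every allocate() the RM
// overwrites its book-keeping with this value; it never increments.
ResourceRequest buildAsk(Priority pri, Resource cap, int totalOutstanding) {
  return ResourceRequest.newInstance(
      pri, ResourceRequest.ANY, cap, totalOutstanding);
}
{code}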

> Allocation of too many containers when a second request is done with the same 
> resource capability
> -
>
> Key: YARN-1902
> URL: https://issues.apache.org/jira/browse/YARN-1902
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
>Reporter: Sietse T. Au
>  Labels: client
> Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch
>
>
> Regarding AMRMClientImpl
> Scenario 1:
> Given a ContainerRequest x with Resource y, when addContainerRequest is 
> called z times with x, allocate is called and at least one of the z allocated 
> containers is started, then if another addContainerRequest call is done and 
> subsequently an allocate call to the RM, (z+1) containers will be allocated, 
> where 1 container is expected.
> Scenario 2:
> No containers are started between the allocate calls. 
> Analyzing debug logs of the AMRMClientImpl, I have found that indeed a (z+1) 
> are requested in both scenarios, but that only in the second scenario, the 
> correct behavior is observed.
> Looking at the implementation I have found that this (z+1) request is caused 
> by the structure of the remoteRequestsTable. The consequence of 
> Map<Resource, ResourceRequestInfo> is that ResourceRequestInfo does not hold any 
> information about whether a request has been sent to the RM yet or not.
> There are workarounds for this, such as releasing the excess containers 
> received.
> The solution implemented is to initialize a new ResourceRequest in 
> ResourceRequestInfo when a request has been successfully sent to the RM.
> The patch includes a test in which scenario one is tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN

2014-11-21 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221591#comment-14221591
 ] 

Bikas Saha commented on YARN-2139:
--

Given that this design and possible implementation might go through unstable 
rounds and are currently not abstracted enough in the core code, doing this on 
a branch seems prudent. 
Given that SSDs are becoming common, thinking of storage as only spinning disks 
may be too limiting. Multiple writers may affect each other more negatively on 
spinning disks vs SSDs. It may be useful to see if the consideration of storage 
could be abstracted into a plugin so that storage could have a different 
resource allocation policy by storage type (e.g. allocate/share by spindle for 
spinning disk storage vs allocate/share by iops on ssd storage vs 
allocate/share by network bandwidth for non-DAS storage). If we can abstract 
the policy into a plugin on trunk itself then perhaps we would not need a 
branch. Secondly, it will probably take a long time to agree on what a common 
policy should be and the consensus decision will probably not be a good fit for 
a large percentage of real clusters because of hardware variety. So making this 
a plugin would enable quicker development, trial and usage of disk based 
allocation compared to arriving at a grand unified allocation model for storage.
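
As a rough illustration of the plugin idea (everything named here is 
hypothetical; no such interface exists in YARN today):
{code}
// Hypothetical sketch only; none of these types exist in YARN.
// DiskRequest and NodeDiskState are placeholders for whatever records
// the scheduler would carry for disk asks and per-node disk state.
class DiskRequest { }
class NodeDiskState { }

interface DiskAllocationPolicy {
  // A spinning-disk policy might share by spindle, an SSD policy by
  // iops, and a non-DAS policy by network bandwidth.
  boolean canAllocate(DiskRequest ask, NodeDiskState node);
  void allocate(DiskRequest ask, NodeDiskState node);
  void release(DiskRequest ask, NodeDiskState node);
}
{code}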

> [Umbrella] Support for Disk as a Resource in YARN 
> --
>
> Key: YARN-2139
> URL: https://issues.apache.org/jira/browse/YARN-2139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wei Yan
> Attachments: Disk_IO_Scheduling_Design_1.pdf, 
> Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, 
> YARN-2139-prototype.patch
>
>
> YARN should consider disk as another resource for (1) scheduling tasks on 
> nodes, (2) isolation at runtime, (3) spindle locality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2014-10-29 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188516#comment-14188516
 ] 

Bikas Saha commented on YARN-1902:
--

bq. Given a ContainerRequest x with Resource y, when addContainerRequest is 
called z times with x, allocate is called and at least one of the z allocated 
containers is started, then if another addContainerRequest call is done and 
subsequently an allocate call to the RM, (z+1) containers will be allocated, 
where 1 container is expected.

Firstly, I am not sure if the same ContainerRequest object can be passed 
multiple times to addContainerRequest. It should be a different object each 
time (even if they point to the same resource). This might have something to do 
with the internal book-keeping done for matching requests.

Secondly, after z requests are made and 1 allocation is received, z-1 requests 
remain. If you are using AMRMClientImpl then it is your (the user's) 
responsibility to call removeContainerRequest() for the request that was 
matched to this container. The AMRMClient does not know which of your z 
requests could be assigned to this container. So in the general case, it cannot 
automatically remove a request from the internal table because it does not know 
which request to remove. If the javadocs don't clarify these semantics then we 
can improve the javadocs.

Thirdly, the protocol between the AMRMClient and the RM has an inherent race. 
So if the client had earlier asked for z containers and in the next heartbeat 
reduces that to z-1, the RM may actually return z containers to it because it 
had already allocated them to this client before the client updated the RM with 
the new value.
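
A minimal sketch of the user-side book-keeping described in the second point; 
which of the z requests "matched" a given container is the application's 
choice (here, simply the oldest), and the handler name is invented for 
illustration:
{code}
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

// Sketch: the AM tracks its own outstanding requests and removes one
// from the AMRMClient table for every container it decides to keep.
Queue<ContainerRequest> outstanding = new LinkedList<ContainerRequest>();

void onContainersAllocated(AMRMClient<ContainerRequest> client,
    List<Container> allocated) {
  for (Container c : allocated) {
    ContainerRequest matched = outstanding.poll();
    if (matched != null) {
      client.removeContainerRequest(matched); // stop re-asking for it
    }
    // else: the race described in the third point; the excess
    // container should be released back to the RM.
  }
}
{code}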

> Allocation of too many containers when a second request is done with the same 
> resource capability
> -
>
> Key: YARN-1902
> URL: https://issues.apache.org/jira/browse/YARN-1902
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
>Reporter: Sietse T. Au
>  Labels: client
> Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch
>
>
> Regarding AMRMClientImpl
> Scenario 1:
> Given a ContainerRequest x with Resource y, when addContainerRequest is 
> called z times with x, allocate is called and at least one of the z allocated 
> containers is started, then if another addContainerRequest call is done and 
> subsequently an allocate call to the RM, (z+1) containers will be allocated, 
> where 1 container is expected.
> Scenario 2:
> No containers are started between the allocate calls. 
> Analyzing debug logs of the AMRMClientImpl, I have found that indeed a (z+1) 
> are requested in both scenarios, but that only in the second scenario, the 
> correct behavior is observed.
> Looking at the implementation I have found that this (z+1) request is caused 
> by the structure of the remoteRequestsTable. The consequence of 
> Map<Resource, ResourceRequestInfo> is that ResourceRequestInfo does not hold any 
> information about whether a request has been sent to the RM yet or not.
> There are workarounds for this, such as releasing the excess containers 
> received.
> The solution implemented is to initialize a new ResourceRequest in 
> ResourceRequestInfo when a request has been successfully sent to the RM.
> The patch includes a test in which scenario one is tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster

2014-10-14 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171459#comment-14171459
 ] 

Bikas Saha commented on YARN-2314:
--

My understanding from the comments was that in most cases this cache was adding 
overhead without benefit since the RPC layer was not controlled by the cache.

We have no empirical evidence either way about the performance. If you know of 
cases where this change of default might cause issues, it would be helpful if 
they were enumerated in a comment. Then Tez/other users could test for those 
cases when they upgrade to 2.6 and make their own choices.

> ContainerManagementProtocolProxy can create thousands of threads for a large 
> cluster
> 
>
> Key: YARN-2314
> URL: https://issues.apache.org/jira/browse/YARN-2314
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-2314.patch, disable-cm-proxy-cache.patch, 
> nmproxycachefix.prototype.patch
>
>
> ContainerManagementProtocolProxy has a cache of NM proxies, and the size of 
> this cache is configurable.  However the cache can grow far beyond the 
> configured size when running on a large cluster and blow AM address/container 
> limits.  More details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster

2014-10-14 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171429#comment-14171429
 ] 

Bikas Saha commented on YARN-2314:
--

To be clear, my question was only to clarify if Tez would get the benefits 
without doing anything because the defaults are correct. Looks like that is the 
case. Thanks!

> ContainerManagementProtocolProxy can create thousands of threads for a large 
> cluster
> 
>
> Key: YARN-2314
> URL: https://issues.apache.org/jira/browse/YARN-2314
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-2314.patch, disable-cm-proxy-cache.patch, 
> nmproxycachefix.prototype.patch
>
>
> ContainerManagementProtocolProxy has a cache of NM proxies, and the size of 
> this cache is configurable.  However the cache can grow far beyond the 
> configured size when running on a large cluster and blow AM address/container 
> limits.  More details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster

2014-10-14 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171408#comment-14171408
 ] 

Bikas Saha commented on YARN-2314:
--

Folks, this is something that would be of interest in Tez since it uses the 
ContainerManagementProtocolProxy. My summary understanding is that the default 
is to turn this proxy cache off, and that this improves things for large scale 
clusters. So when Tez moves to 2.6 it will automatically pick up the defaults 
(which turn caching off) and benefit on large clusters. Is that correct?
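
For reference, a sketch of pinning the cache size explicitly rather than 
relying on the default (property name as I understand it from this jira; 
please double-check it against yarn-default.xml):
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch: 0 disables caching of ContainerManagementProtocol proxies.
Configuration conf = new Configuration();
conf.setInt("yarn.client.max-cached-nodemanagers-proxies", 0);
{code}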

> ContainerManagementProtocolProxy can create thousands of threads for a large 
> cluster
> 
>
> Key: YARN-2314
> URL: https://issues.apache.org/jira/browse/YARN-2314
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-2314.patch, disable-cm-proxy-cache.patch, 
> nmproxycachefix.prototype.patch
>
>
> ContainerManagementProtocolProxy has a cache of NM proxies, and the size of 
> this cache is configurable.  However the cache can grow far beyond the 
> configured size when running on a large cluster and blow AM address/container 
> limits.  More details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2014-07-11 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059094#comment-14059094
 ] 

Bikas Saha commented on YARN-2261:
--

Would that be an existing issue that needs to be tracked separately?

> YARN should have a way to run post-application cleanup
> --
>
> Key: YARN-2261
> URL: https://issues.apache.org/jira/browse/YARN-2261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> See MAPREDUCE-5956 for context. Specific options are at 
> https://issues.apache.org/jira/browse/MAPREDUCE-5956?focusedCommentId=14054562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14054562.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2014-07-11 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058979#comment-14058979
 ] 

Bikas Saha commented on YARN-2261:
--

The cleanup would be indistinguishable from an AM that is cleaning up post job 
completion (as it happens today). Especially if we use the second approach (via 
AM cleanup mode), this would be virtually indistinguishable from what happens 
today.

> YARN should have a way to run post-application cleanup
> --
>
> Key: YARN-2261
> URL: https://issues.apache.org/jira/browse/YARN-2261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> See MAPREDUCE-5956 for context. Specific options are at 
> https://issues.apache.org/jira/browse/MAPREDUCE-5956?focusedCommentId=14054562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14054562.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2014-07-08 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055231#comment-14055231
 ] 

Bikas Saha commented on YARN-2261:
--

+1 for having the control/responsibility in YARN.
An alternative that may fit better with the RM model of launching AMs is to 
optionally have the RM run the AM in cleanup mode. This way the cleanup logic 
can reside in the AM as it does today, and the RM does not need to learn any 
new tricks about launching anything other than the AM. The existing launch 
context is used to launch the AM, and the AM is told (via env or via register) 
that it is in cleanup mode. The AM can use its own logic to do cleanup and then 
successfully unregister with the RM. Until the unregister happens, the RM can 
keep restarting the AM in cleanup mode for a max of N times (to handle 
unexpected failures). When an AM is running in cleanup mode it will not be 
allowed to make any allocation requests. This can be handled via AMRMClient so 
that AMs that use AMRMClient don't need to do anything. The 
ApplicationMasterService, of course, will need to handle this for 
non-AMRMClient AMs.
By this method, minimal changes will be needed in the API and RM internals to 
enable this feature in a compatible manner.
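
A rough sketch of what the AM side of this alternative could look like. The 
environment variable name and the helper methods are invented for 
illustration; no such contract exists yet:
{code}
// Hypothetical sketch of the proposed contract. YARN_AM_CLEANUP_MODE,
// runCleanupLogic() and runNormalJob() are invented names.
String flag = System.getenv("YARN_AM_CLEANUP_MODE");
boolean cleanupMode = Boolean.parseBoolean(flag); // null parses to false
registerApplicationMaster();         // register as usual in both modes
if (cleanupMode) {
  runCleanupLogic();                 // AM-owned cleanup, as it works today
} else {
  runNormalJob();
}
unregisterApplicationMaster();       // RM stops restarting us after this
{code}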

> YARN should have a way to run post-application cleanup
> --
>
> Key: YARN-2261
> URL: https://issues.apache.org/jira/browse/YARN-2261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> See MAPREDUCE-5956 for context. Specific options are at 
> https://issues.apache.org/jira/browse/MAPREDUCE-5956?focusedCommentId=14054562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14054562.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-06 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053201#comment-14053201
 ] 

Bikas Saha commented on YARN-1366:
--

bq. I meant an empty response.
After this, does AMRMClientAsync need to see AMCommand.AM_RESYNC? That should 
be dead code, right?

> AM should implement Resync with the ApplicationMasterService instead of 
> shutting down
> -
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.10.patch, 
> YARN-1366.11.patch, YARN-1366.12.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
> YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, 
> YARN-1366.8.patch, YARN-1366.9.patch, YARN-1366.patch, 
> YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> calling resyncing with the RM. Resync means resetting the allocate RPC 
> sequence number to 0 and the AM should send its entire outstanding request to 
> the RM. Note that if the AM is making its first allocate call to the RM then 
> things should proceed like normal without needing a resync. The RM will 
> return all containers that have completed since the RM last synced with the 
> AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050978#comment-14050978
 ] 

Bikas Saha commented on YARN-1366:
--

Does a null response make sense for the user?

> AM should implement Resync with the ApplicationMasterService instead of 
> shutting down
> -
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.10.patch, 
> YARN-1366.11.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, 
> YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, 
> YARN-1366.9.patch, YARN-1366.patch, YARN-1366.prototype.patch, 
> YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> calling resyncing with the RM. Resync means resetting the allocate RPC 
> sequence number to 0 and the AM should send its entire outstanding request to 
> the RM. Note that if the AM is making its first allocate call to the RM then 
> things should proceed like normal without needing a resync. The RM will 
> return all containers that have completed since the RM last synced with the 
> AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050965#comment-14050965
 ] 

Bikas Saha commented on YARN-1366:
--

Why are we returning the old allocateResponse to the user? What is the user 
expected to do with this allocateResponse that has a RESYNC command in it? 
Should we make a second call to allocate (after re-registering) and then send 
that response back up to the user?
{code}+// re register with RM
+registerApplicationMaster();
+return allocateResponse;
+  }{code}

There needs to be some clear documentation that if the user has not removed 
container requests that have already been satisfied, then the re-register may 
end up sending the entire ask list to the RM (including matched requests), 
which would mean the RM could end up giving it a lot of newly allocated 
containers.
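
In other words, a sketch of the allocate flow being suggested here; 
doAllocate() stands in for the actual RPC call, and this is a paraphrase, not 
the committed patch:
{code}
// Sketch: do not hand the RESYNC response to the user. Re-register,
// let the re-register/next allocate replay the outstanding asks, and
// return the follow-up response instead. doAllocate() is a stand-in.
AllocateResponse response = doAllocate(progress);
if (response.getAMCommand() == AMCommand.AM_RESYNC) {
  registerApplicationMaster();       // re-register with the restarted RM
  response = doAllocate(progress);   // fresh response for the user
}
return response;
{code}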

> AM should implement Resync with the ApplicationMasterService instead of 
> shutting down
> -
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.10.patch, 
> YARN-1366.11.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, 
> YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, 
> YARN-1366.9.patch, YARN-1366.patch, YARN-1366.prototype.patch, 
> YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> calling resyncing with the RM. Resync means resetting the allocate RPC 
> sequence number to 0 and the AM should send its entire outstanding request to 
> the RM. Note that if the AM is making its first allocate call to the RM then 
> things should proceed like normal without needing a resync. The RM will 
> return all containers that have completed since the RM last synced with the 
> AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2014-06-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043029#comment-14043029
 ] 

Bikas Saha commented on YARN-614:
-

Steve, what you want should already happen. The AM will suicide and the RM will 
restart it and count it as an AM failure, just like it does today. This jira is 
just making other sources of failure, like node loss, not count as AM failures.

> Retry attempts automatically for hardware failures or YARN issues and set 
> default app retries to 1
> --
>
> Key: YARN-614
> URL: https://issues.apache.org/jira/browse/YARN-614
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Xuan Gong
> Fix For: 2.5.0
>
> Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
> YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
> YARN-614.7.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be 
> retried unnecessarily. The only reason YARN should retry an attempt is when 
> the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
> errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-17 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034876#comment-14034876
 ] 

Bikas Saha commented on YARN-2052:
--

With 32 bits for the epoch number we have about 4 billion restarts before it 
overflows. We are probably safe without any special handling.

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-17 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034732#comment-14034732
 ] 

Bikas Saha commented on YARN-2052:
--

Ah. I did not see the rest of the comment. Yes, integer overflow is a problem. 
We should make it a long in the same release as the epoch number addition so 
that we don't have to worry about that.

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-17 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034731#comment-14034731
 ] 

Bikas Saha commented on YARN-2052:
--

Why would ContainerId#compareTo fail? Existing containerIds should remain 
unchanged after RM restart. Only new container ids should have a different 
epoch number.

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1373) Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps

2014-06-17 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034700#comment-14034700
 ] 

Bikas Saha commented on YARN-1373:
--

Sorry, I am not clear on how this is a dup. This jira is tracking new behavior 
in the RM that will transition a recovered (and still actually running) 
RMAppImpl/RMAppAttemptImpl to the RUNNING state instead of a terminal recovered 
state. This is to ensure that the state machines are in the correct state for 
the running AM to resync and continue as running. This is not related to 
killing the app master process on the NM.

> Transition RMApp and RMAppAttempt state to RUNNING after restart for 
> recovered running apps
> ---
>
> Key: YARN-1373
> URL: https://issues.apache.org/jira/browse/YARN-1373
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Omkar Vinit Joshi
>
> Currently the RM moves recovered app attempts to the a terminal recovered 
> state and starts a new attempt. Instead, it will have to transition the last 
> attempt to a running state such that it can proceed as normal once the 
> running attempt has resynced with the ApplicationMasterService (YARN-1365 and 
> YARN-1366). If the RM had started the application container before dying then 
> the AM would be up and trying to contact the RM. The RM may have had died 
> before launching the container. For this case, the RM should wait for AM 
> liveliness period and issue a kill container for the stored master container. 
> It should transition this attempt to some RECOVER_ERROR state and proceed to 
> start a new attempt.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-17 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034691#comment-14034691
 ] 

Bikas Saha commented on YARN-2052:
--

bq. Had an offline discussion with Vinod. Maybe it's still better to persist 
some sequence number to indicate the number of RM restarts when RM starts up.
Is this the same as the epoch number that was mentioned earlier in this jira? 
https://issues.apache.org/jira/browse/YARN-2052?focusedCommentId=13996675&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13996675.
 Seems to me that it's the same, with the epoch number renamed to 
num-rm-restarts.
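
For context, a minimal sketch of such a counter being read and bumped 
synchronously at startup; the store calls here are placeholders, not the 
actual RMStateStore API:
{code}
// Hypothetical sketch: read-increment-write of the epoch (a.k.a.
// num-rm-restarts) during RM startup, before any new containers are
// allocated. loadEpoch()/storeEpoch() are placeholder store calls.
long epoch = stateStore.loadEpoch();  // 0 on the first-ever start
epoch++;
stateStore.storeEpoch(epoch);
// New container ids then embed this epoch so they cannot collide with
// ids handed out by a previous RM incarnation.
{code}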



> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch
>
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-11 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028734#comment-14028734
 ] 

Bikas Saha commented on YARN-2052:
--

Don't we already read and write synchronously from the store during RM startup? 
If we have an epoch number then it must be persisted.

> ContainerId creation after work preserving restart is broken
> 
>
> Key: YARN-2052
> URL: https://issues.apache.org/jira/browse/YARN-2052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>
> Container ids are made unique by using the app identifier and appending a 
> monotonically increasing sequence number to it. Since container creation is a 
> high churn activity the RM does not store the sequence number per app. So 
> after restart it does not know what the new sequence number should be for new 
> allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2148) TestNMClient failed due more exit code values added and passed to AM

2014-06-11 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028213#comment-14028213
 ] 

Bikas Saha commented on YARN-2148:
--

lgtm. +1.

> TestNMClient failed due more exit code values added and passed to AM
> 
>
> Key: YARN-2148
> URL: https://issues.apache.org/jira/browse/YARN-2148
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2148.patch
>
>
> Currently, TestNMClient will be failed in trunk, see 
> https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/
> {code}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
> {code}
> Test cases in TestNMClient uses following code to verify exit code of 
> COMPLETED containers
> {code}
>   testGetContainerStatus(container, i, ContainerState.COMPLETE,
>   "Container killed by the ApplicationMaster.", Arrays.asList(
>   new Integer[] {137, 143, 0}));
> {code}
> But YARN-2091 added logic to make exit code reflecting the actual status, so 
> exit code of the "killed by ApplicationMaster" will be -105,
> {code}
>   if (container.hasDefaultExitCode()) {
> container.exitCode = exitEvent.getExitCode();
>   }
> {code}
> We should update test case as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2148) TestNMClient failed due more exit code values added and passed to AM

2014-06-11 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028216#comment-14028216
 ] 

Bikas Saha commented on YARN-2148:
--

[~ozawa] Can you please apply this patch and run the full YARN test suite? I 
would like to make sure that we catch all tests that may be failing due to the 
container exit status change. I can commit the patch if there are no new test 
failures. Else we should fix them all in this jira. Thanks!

> TestNMClient failed due more exit code values added and passed to AM
> 
>
> Key: YARN-2148
> URL: https://issues.apache.org/jira/browse/YARN-2148
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2148.patch
>
>
> Currently, TestNMClient will be failed in trunk, see 
> https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/
> {code}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
> {code}
> Test cases in TestNMClient uses following code to verify exit code of 
> COMPLETED containers
> {code}
>   testGetContainerStatus(container, i, ContainerState.COMPLETE,
>   "Container killed by the ApplicationMaster.", Arrays.asList(
>   new Integer[] {137, 143, 0}));
> {code}
> But YARN-2091 added logic to make exit code reflecting the actual status, so 
> exit code of the "killed by ApplicationMaster" will be -105,
> {code}
>   if (container.hasDefaultExitCode()) {
> container.exitCode = exitEvent.getExitCode();
>   }
> {code}
> We should update test case as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-06-10 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027187#comment-14027187
 ] 

Bikas Saha commented on YARN-1365:
--

Sounds like the right approach. Keeps things consistent. Allowing unregister 
without register (while sounding harmless by itself) would need state machine 
changes to support, and also breaks the existing contract that even an "empty" 
application needs to at least call register and unregister or it's considered 
failed.

> ApplicationMasterService to allow Register and Unregister of an app that was 
> running before restart
> ---
>
> Key: YARN-1365
> URL: https://issues.apache.org/jira/browse/YARN-1365
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
> YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.initial.patch
>
>
> For an application that was running before restart, the 
> ApplicationMasterService currently throws an exception when the app tries to 
> make the initial register or final unregister call. These should succeed and 
> the RMApp state machine should transition to completed like normal. 
> Unregistration should succeed for an app that the RM considers complete since 
> the RM may have died after saving completion in the store but before 
> notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2091) Add more values to ContainerExitStatus and pass it from NM to RM and then to app masters

2014-06-10 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-2091:
-

Summary: Add more values to ContainerExitStatus and pass it from NM to RM 
and then to app masters  (was: Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and 
pass it to app masters)

> Add more values to ContainerExitStatus and pass it from NM to RM and then to 
> app masters
> 
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
> YARN-2091.4.patch, YARN-2091.5.patch, YARN-2091.6.patch, YARN-2091.7.patch, 
> YARN-2091.8.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers

2014-06-10 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026672#comment-14026672
 ] 

Bikas Saha commented on YARN-2140:
--

[~ywskycn] For this and YARN-2139 my suggestion would be to first post a design 
sketch and discuss some alternatives. You may prototype some approach to get 
supporting data for that design doc. This will help get community interaction 
and understanding for your proposal and enable quicker progress.

> Add support for network IO isolation/scheduling for containers
> --
>
> Key: YARN-2140
> URL: https://issues.apache.org/jira/browse/YARN-2140
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wei Yan
>Assignee: Wei Yan
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-06-06 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020350#comment-14020350
 ] 

Bikas Saha commented on YARN-2091:
--

How about naming it hasDefaultExitCode() and directly using 
ContainerExitStatus.INVALID instead of creating a new member variable?
{code}+
+  private boolean isDefaultExitCode() {
+return (this.exitCode == DEFAULT_EXIT_CODE);
+  }{code}

Can we rename this to getContainerExitStatus() so that it's clear that we are 
getting a ContainerExitStatus value?
{code}+  public int getReason() {
+return this.reason;
+  }{code}

I would suggest the following names:
KILL_AM_STOP_CONTAINER -> KILLED_BY_APPMASTER - Container was terminated by a 
stop request from the app master
KILL_BY_RESOURCEMANAGER -> KILLED_BY_RESOURCEMANAGER - Container was terminated 
by the resource manager
KILL_FINISHED_APPMASTER -> KILLED_AFTER_APP_COMPLETION - Container was 
terminated after the application finished
KILL_EXCEEDED_PMEM -> KILLED_EXCEEDED_PMEM - Container terminated because it 
exceeded its allocated physical memory
KILL_EXCEEDED_VMEM -> KILLED_EXCEEDED_VMEM - Container terminated because it 
exceeded its allocated virtual memory

Spurious edit?
{code}-  //if the current state is NEW it means the CONTAINER_INIT was 
never 
+  //if the current state is NEW it means the CONTAINER_INIT was never{code}

I am +1 after these comments. [~vinodkv] Do you have any further comments?
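
Put together, the suggestion above would read roughly like this in 
ContainerExitStatus. This is a sketch of the proposal, not the committed code; 
the values are illustrative, though -105 for KILLED_BY_APPMASTER matches the 
value cited elsewhere in this archive:
{code}
// Sketch of the suggested constants with the suggested doc text.
/** Container was terminated by a stop request from the app master. */
public static final int KILLED_BY_APPMASTER = -105;
/** Container was terminated by the resource manager. */
public static final int KILLED_BY_RESOURCEMANAGER = -106;
/** Container was terminated after the application finished. */
public static final int KILLED_AFTER_APP_COMPLETION = -107;
/** Container exceeded its allocated physical memory. */
public static final int KILLED_EXCEEDED_PMEM = -104;
/** Container exceeded its allocated virtual memory. */
public static final int KILLED_EXCEEDED_VMEM = -103;
{code}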


> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
> YARN-2091.4.patch, YARN-2091.5.patch, YARN-2091.6.patch, YARN-2091.7.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-06-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016081#comment-14016081
 ] 

Bikas Saha commented on YARN-2091:
--

If we are sure that the default value is set in the code to 
ContainerExitStatus.INVALID then that sounds good. Given that 
ContainerExitStatus.INVALID is a non-zero value, we have to explicitly 
initialize the field with it, since Java will default an int to 0.
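
Concretely, the concern is just that the field cannot rely on Java's default 
value; a minimal sketch:
{code}
// Sketch: without the explicit initializer the field starts at 0,
// which would be indistinguishable from a real exit code of 0.
private int exitCode = ContainerExitStatus.INVALID;
{code}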

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
> YARN-2091.4.patch, YARN-2091.5.patch, YARN-2091.6.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-06-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015611#comment-14015611
 ] 

Bikas Saha commented on YARN-2091:
--

Can this miss a case where the exitCode has not been set (e.g. when the 
container crashes on its own)? Should we check whether the exitCode has 
already been set (e.g. via a kill event) and, if not, set it from the exit 
event? How can we check that the exitCode has not been set? Maybe have some 
uninitialized/invalid default value.
{code}
@@ -829,7 +829,6 @@ public void transition(ContainerImpl container, ContainerEvent event) {
     @Override
     public void transition(ContainerImpl container, ContainerEvent event) {
       ContainerExitEvent exitEvent = (ContainerExitEvent) event;
-      container.exitCode = exitEvent.getExitCode();
{code}
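
One possible shape for that guard, assuming the field is initialized to 
ContainerExitStatus.INVALID as the "not yet set" sentinel (a sketch, not the 
patch itself):
{code}
  // Only take the exit code from the exit event if no earlier event
  // (e.g. a kill) has already recorded one.
  if (container.exitCode == ContainerExitStatus.INVALID) {
    container.exitCode = exitEvent.getExitCode();
  }
{code}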

The new exit status codes need better comments/docs, e.g. what is the 
difference between the two new appmaster-related exit statuses? Is 
kill_by_resourcemanager a generic value that can be replaced later on by a 
more specific reason like preempted?

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
> YARN-2091.4.patch, YARN-2091.5.patch, YARN-2091.6.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014284#comment-14014284
 ] 

Bikas Saha commented on YARN-2091:
--

We can check all cases of ContainerKillEvent and add new ExitStatus values 
where it makes sense, or use some good default value. If a test needs to 
change to account for a new value then we should change the test. There may be 
other cases of the exit status being set or tested that are unrelated to the 
container kill event. Those can stay out of the scope of this jira.

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
> YARN-2091.4.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014079#comment-14014079
 ] 

Bikas Saha commented on YARN-2091:
--

Why is "isAMAware" needed. All values in ContainerExitStatus are public and 
hence user code should already be aware of them.

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
> YARN-2091.4.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013071#comment-14013071
 ] 

Bikas Saha commented on YARN-2091:
--

That would make sense if YARN allowed specifying pmem and vmem separately in 
the resource request. Without that, the information is not actionable.

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012980#comment-14012980
 ] 

Bikas Saha commented on YARN-2091:
--

Instead of having the following if-else code everywhere, can we simply always 
use the "reason" from the kill event? That way, if we add a different reason 
tomorrow (exceeded disk quota) we don't have to find all these special cases.
{code}
+  if (killEvent.getReason() == ContainerExitStatus.KILL_EXCEEDED_MEMORY) {
+    container.exitCode = killEvent.getReason();
+  } else {
+    container.exitCode = ExitCode.TERMINATED.getExitCode();
+  }
{code}
As an exercise, we can find the other cases of ContainerKillEvent and see 
whether new enums (like exceeded_memory) can be added, or whether a suitable 
default value can be found. Clearly, if we are killing then there should be a 
reason.
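
The simplification being suggested would reduce each of those sites to a 
single unconditional assignment (a sketch, assuming every kill event carries a 
valid reason):
{code}
  // No special-casing per reason: whatever ContainerExitStatus value the
  // kill event carries becomes the container's exit code.
  container.exitCode = killEvent.getReason();
{code}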

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010630#comment-14010630
 ] 

Bikas Saha commented on YARN-2091:
--

We are on the same page. The kill reason is directly a ContainerExitStatus.

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010585#comment-14010585
 ] 

Bikas Saha commented on YARN-2091:
--

That's the missing piece AFAIK. That exit reason needs to be passed along 
internally through the NM and then on to the RM and AM. Maybe simply use 
ContainerExitStatus directly instead of a new reason object inside 
ContainerKillEvent.
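
A minimal sketch of that suggestion (the field, constructor, and getter names 
here are illustrative, not the actual NM code):
{code}
public class ContainerKillEvent extends ContainerEvent {

  // Carries a ContainerExitStatus value directly instead of a new reason type.
  private final int containerExitStatus;
  private final String diagnostic;

  public ContainerKillEvent(ContainerId containerId,
      int containerExitStatus, String diagnostic) {
    super(containerId, ContainerEventType.KILL_CONTAINER);
    this.containerExitStatus = containerExitStatus;
    this.diagnostic = diagnostic;
  }

  public int getContainerExitStatus() {
    return containerExitStatus;
  }

  public String getDiagnostic() {
    return diagnostic;
  }
}
{code}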

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-05-23 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007866#comment-14007866
 ] 

Bikas Saha commented on YARN-796:
-

Thanks [~john.jian.fang]. An interesting use case from your comment is that 
labels can be used to specify not only affinity but also anti-affinity, i.e. 
don't place a task on a node with a certain label.
Do I understand correctly that this is your use case?
OR
Is your ask that node managers should specify their own labels when they 
register with the RM, instead of the node-manager-to-label mapping being a 
central RM configuration?

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: YARN-796.patch
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-21 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-2091:


 Summary: Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it 
to app masters
 Key: YARN-2091
 URL: https://issues.apache.org/jira/browse/YARN-2091
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Bikas Saha


Currently, the AM cannot programmatically determine if the task was killed due 
to using excessive memory. The NM kills it without passing this information in 
the container status back to the RM. So the AM cannot take any action here. The 
jira tracks adding this exit status and passing it from the NM to the RM and 
then the AM. In general, there may be other such actions taken by YARN that are 
currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

