[jira] [Commented] (YARN-441) Clean up unused collection methods in various APIs
[ https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629839#comment-13629839 ] Hadoop QA commented on YARN-441: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578368/YARN-441.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warning. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/725//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/725//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/725//console This message is automatically generated. > Clean up unused collection methods in various APIs > -- > > Key: YARN-441 > URL: https://issues.apache.org/jira/browse/YARN-441 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Xuan Gong > Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch, > YARN-441.4.patch > > > There's a bunch of unused methods like getAskCount() and getAsk(index) in > AllocateRequest, and other interfaces. These should be removed. > In YARN, found them in the following; MR will have its own set: > AllocateRequest > StartContainerResponse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.
[ https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629837#comment-13629837 ] Xuan Gong commented on YARN-561: org.apache.hadoop.yarn.api.records.Container has a ContainerId and a NodeId (from which the address and port can be obtained), which are enough for a container to talk to its local NM. And by YARN-486, we have already added org.apache.hadoop.yarn.api.records.Container to ContainerImpl, so it will get that information now. > Nodemanager should set some key information into the environment of every > container that it launches. > - > > Key: YARN-561 > URL: https://issues.apache.org/jira/browse/YARN-561 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Xuan Gong > Labels: usability > > Information such as containerId, nodemanager hostname, nodemanager port is > not set in the environment when any container is launched. > For an AM, the RM does all of this for it but for a container launched by an > application, all of the above need to be set by the ApplicationMaster. > At the minimum, container id would be a useful piece of information. If the > container wishes to talk to its local NM, the nodemanager related information > would also come in handy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
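As a sketch of what the comment above implies (not the committed change): the fields carried by the Container record are sufficient for the NM to expose the key information in a launched container's environment. The environment variable names below are hypothetical, made up for illustration only.

{code}
import java.util.Map;
import org.apache.hadoop.yarn.api.records.Container;

// Hypothetical helper, not NM code: pushes the fields available on the
// Container record (held by ContainerImpl after YARN-486) into a container's
// launch environment. The environment keys are invented for this sketch.
public final class ContainerEnvSketch {
  static void addContainerInfo(Container container, Map<String, String> env) {
    env.put("CONTAINER_ID", container.getId().toString());
    env.put("NM_HOST", container.getNodeId().getHost());
    env.put("NM_PORT", String.valueOf(container.getNodeId().getPort()));
  }
}
{code}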
[jira] [Commented] (YARN-457) Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl
[ https://issues.apache.org/jira/browse/YARN-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629827#comment-13629827 ] Xuan Gong commented on YARN-457: We also need to add this.updatedNodes.clear() before we actually add all the updatedNodes. > Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl > > > Key: YARN-457 > URL: https://issues.apache.org/jira/browse/YARN-457 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Kenji Kikushima >Priority: Minor > Labels: Newbie > Attachments: YARN-457-2.patch, YARN-457-3.patch, YARN-457.patch > > > {code} > if (updatedNodes == null) { > this.updatedNodes.clear(); > return; > } > {code} > If this.updatedNodes is already null, a NullPointerException is thrown. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
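A minimal sketch of the combined fix being discussed, using a stand-in class rather than the actual AllocateResponsePBImpl patch: guard the null case before touching the backing list, and clear the list before addAll as the comment above suggests.

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.records.NodeReport;

// Stand-in for the PBImpl setter, assuming a lazily initialized backing list.
class UpdatedNodesHolderSketch {
  private List<NodeReport> updatedNodes;

  public void setUpdatedNodes(final List<NodeReport> nodes) {
    if (nodes == null) {
      if (this.updatedNodes != null) {
        this.updatedNodes.clear(); // no NPE when the backing list was never built
      }
      return;
    }
    if (this.updatedNodes == null) {
      this.updatedNodes = new ArrayList<NodeReport>();
    }
    this.updatedNodes.clear();     // the clear() Xuan Gong suggests adding
    this.updatedNodes.addAll(nodes);
  }
}
{code}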
[jira] [Assigned] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.
[ https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-561: -- Assignee: Xuan Gong (was: Omkar Vinit Joshi) > Nodemanager should set some key information into the environment of every > container that it launches. > - > > Key: YARN-561 > URL: https://issues.apache.org/jira/browse/YARN-561 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Xuan Gong > Labels: usability > > Information such as containerId, nodemanager hostname, nodemanager port is > not set in the environment when any container is launched. > For an AM, the RM does all of this for it but for a container launched by an > application, all of the above need to be set by the ApplicationMaster. > At the minimum, container id would be a useful piece of information. If the > container wishes to talk to its local NM, the nodemanager related information > would also come in handy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-441) Clean up unused collection methods in various APIs
[ https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-441: --- Attachment: YARN-441.4.patch Created a new patch based on the self-review comments on patch 3. > Clean up unused collection methods in various APIs > -- > > Key: YARN-441 > URL: https://issues.apache.org/jira/browse/YARN-441 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Xuan Gong > Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch, > YARN-441.4.patch > > > There's a bunch of unused methods like getAskCount() and getAsk(index) in > AllocateRequest, and other interfaces. These should be removed. > In YARN, found them in the following; MR will have its own set: > AllocateRequest > StartContainerResponse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-441) Clean up unused collection methods in various APIs
[ https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629819#comment-13629819 ] Xuan Gong commented on YARN-441: Patch 3 self-review: 1. For each record API, we should only have a getter and a setter; we can keep the getter and setter that return or take the whole list. 2. For the methods that get, set, or remove one item from the list, or that addAll, removeAll, or clear the whole list, callers can simply get the whole list first and then perform those get, set, remove, or clear actions on it. So those methods can be removed. > Clean up unused collection methods in various APIs > -- > > Key: YARN-441 > URL: https://issues.apache.org/jira/browse/YARN-441 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Xuan Gong > Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch > > > There's a bunch of unused methods like getAskCount() and getAsk(index) in > AllocateRequest, and other interfaces. These should be removed. > In YARN, found them in the following; MR will have its own set: > AllocateRequest > StartContainerResponse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
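To make the convention concrete, here is a hedged sketch of how callers migrate once the per-item helpers are gone; it assumes the whole-list accessor getAskList() is among those retained on AllocateRequest, per the review comment above.

{code}
import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Illustrative only: the same operations, expressed against the retained
// whole-list getter instead of the removed per-item helpers.
class AskListMigrationSketch {
  static void inspectAsks(AllocateRequest request) {
    List<ResourceRequest> asks = request.getAskList();
    int askCount = asks.size();              // replaces getAskCount()
    if (askCount > 0) {
      ResourceRequest first = asks.get(0);   // replaces getAsk(0)
      System.out.println(first);
    }
  }
}
{code}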
[jira] [Commented] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
[ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629818#comment-13629818 ] Hadoop QA commented on YARN-514: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578361/YARN-514.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/724//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/724//console This message is automatically generated. > Delayed store operations should not result in RM unavailability for app > submission > -- > > Key: YARN-514 > URL: https://issues.apache.org/jira/browse/YARN-514 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Zhijie Shen > Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch, > YARN-514.4.patch > > > Currently, app submission is the only store operation performed synchronously > because the app must be stored before the request returns with success. This > makes the RM susceptible to blocking all client threads on slow store > operations, resulting in RM being perceived as unavailable by clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629806#comment-13629806 ] Carlo Curino commented on YARN-45: -- Note: we don't have tests as there are no tests for the rest of the protocolbuffer messages either (this would mostly consist of validating auto-generated code). > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
[ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-514: - Attachment: YARN-514.4.patch Fix the incorrect indents. > Delayed store operations should not result in RM unavailability for app > submission > -- > > Key: YARN-514 > URL: https://issues.apache.org/jira/browse/YARN-514 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Zhijie Shen > Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch, > YARN-514.4.patch > > > Currently, app submission is the only store operation performed synchronously > because the app must be stored before the request returns with success. This > makes the RM susceptible to blocking all client threads on slow store > operations, resulting in RM being perceived as unavailable by clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
[ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629782#comment-13629782 ] Zhijie Shen commented on YARN-514: -- @Bikas, the enum values in the proto need to be changed because YarnApplicationStateProto will be used by the application report. MR may also need it when converting from Yarn state to MR state. > Delayed store operations should not result in RM unavailability for app > submission > -- > > Key: YARN-514 > URL: https://issues.apache.org/jira/browse/YARN-514 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Zhijie Shen > Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch > > > Currently, app submission is the only store operation performed synchronously > because the app must be stored before the request returns with success. This > makes the RM susceptible to blocking all client threads on slow store > operations, resulting in RM being perceived as unavailable by clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
[ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629745#comment-13629745 ] Bikas Saha commented on YARN-514: - For MAPREDUCE-5140 please check for uses of both NEW and SUBMITTED in order to find out places where NEW_SAVING would need to be handled. > Delayed store operations should not result in RM unavailability for app > submission > -- > > Key: YARN-514 > URL: https://issues.apache.org/jira/browse/YARN-514 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Zhijie Shen > Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch > > > Currently, app submission is the only store operation performed synchronously > because the app must be stored before the request returns with success. This > makes the RM susceptible to blocking all client threads on slow store > operations, resulting in RM being perceived as unavailable by clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
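A hedged sketch of what handling NEW_SAVING alongside NEW and SUBMITTED could look like on the MR side. The enums here are local stand-ins, not the committed TypeConverter code; the actual change belongs to MAPREDUCE-5140.

{code}
// Local stand-ins only: the point is that the new NEW_SAVING state must be
// handled wherever NEW and SUBMITTED already are.
class StateConversionSketch {
  enum YarnAppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }
  enum MRJobState { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

  static MRJobState fromYarn(YarnAppState state) {
    switch (state) {
      case NEW:
      case NEW_SAVING:   // the new case, added alongside NEW and SUBMITTED
      case SUBMITTED:
      case ACCEPTED:
        return MRJobState.PREP;
      case RUNNING:
        return MRJobState.RUNNING;
      case FINISHED:
        return MRJobState.SUCCEEDED;
      case FAILED:
        return MRJobState.FAILED;
      default:
        return MRJobState.KILLED;
    }
  }
}
{code}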
[jira] [Commented] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
[ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629742#comment-13629742 ] Bikas Saha commented on YARN-514: - Looks good overall. Minor tab issues in the patch. I don't think we want to change the enum values in the proto. Please prepare a MAPREDUCE-side patch for MAPREDUCE-5140. These need to go in together. > Delayed store operations should not result in RM unavailability for app > submission > -- > > Key: YARN-514 > URL: https://issues.apache.org/jira/browse/YARN-514 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Zhijie Shen > Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch > > > Currently, app submission is the only store operation performed synchronously > because the app must be stored before the request returns with success. This > makes the RM susceptible to blocking all client threads on slow store > operations, resulting in RM being perceived as unavailable by clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-482) FS: Extend SchedulingMode to intermediate queues
[ https://issues.apache.org/jira/browse/YARN-482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-482: -- Attachment: yarn-482.patch Here is a preliminary patch that # Renames SchedulingMode to SchedulingPolicy, as policy seems to be a more apt name # Extends setting SchedulingPolicy to intermediate queues # Fixes the previously broken assignContainer() hierarchy to include intermediate queues > FS: Extend SchedulingMode to intermediate queues > > > Key: YARN-482 > URL: https://issues.apache.org/jira/browse/YARN-482 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-482.patch > > > FS allows setting {{SchedulingMode}} for leaf queues. Extending this to > non-leaf queues allows using different kinds of fairness: e.g., root can have > three child queues - fair-mem, drf-cpu-mem, drf-cpu-disk-mem taking different > numbers of resources into account. In turn, this allows users to trade off > scheduling latency against the sophistication of the scheduling mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
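A hedged sketch, with stand-in types rather than the FairScheduler sources, of what extending SchedulingPolicy to intermediate queues enables: each parent queue orders its children by its own policy's comparator before delegating assignment, so different subtrees can mix, for example, fair-share and DRF orderings.

{code}
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Stand-in parent queue: the policy is now settable per queue, including
// non-leaf queues, and drives the order in which children are tried.
class ParentQueueSketch {
  interface Queue { boolean assignContainer(); }
  interface SchedulingPolicy { Comparator<Queue> getComparator(); }

  private final SchedulingPolicy policy;
  private final List<Queue> children;

  ParentQueueSketch(SchedulingPolicy policy, List<Queue> children) {
    this.policy = policy;
    this.children = children;
  }

  boolean assignContainer() {
    // Most deserving child (per this queue's own policy) gets the first try.
    Collections.sort(children, policy.getComparator());
    for (Queue child : children) {
      if (child.assignContainer()) {
        return true;
      }
    }
    return false;
  }
}
{code}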
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629707#comment-13629707 ] Hadoop QA commented on YARN-45: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578339/YARN-45.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/723//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/723//console This message is automatically generated. > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-45: - Attachment: YARN-45.patch > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-45: - Attachment: (was: YARN-45.patch) > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-45: - Attachment: YARN-45.patch > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629691#comment-13629691 ] Hadoop QA commented on YARN-45: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578337/YARN-45.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/722//console This message is automatically generated. > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-45: - Attachment: (was: YARN-45.patch) > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-45: - Attachment: YARN-45.patch > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629662#comment-13629662 ] Bikas Saha commented on YARN-45: Moved to sub-task of YARN-397 for scheduler API changes. > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-45: --- Issue Type: Sub-task (was: Improvement) Parent: YARN-397 > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-45: --- Issue Type: Improvement (was: Sub-task) Parent: (was: YARN-386) > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629660#comment-13629660 ] Carlo Curino commented on YARN-45: -- [~kkambatl], yes ResourceRequests can be used to capture locality preferences. In our first use we focus on capacity, so the RM policies are not very picky/aware of location, but we think it is good to build this into the protocol for later use (as commented above somewhere). (As for the last comment: we moved YARN-567, YARN-568, YARN-569 that will use this protocol into YARN-397, while this one is probably part of YARN-386). > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
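A hedged sketch of how an AM might consume such a message, using local stand-in types (the real record names in the patch may differ): a strict part lists containers the RM will reclaim regardless, while a negotiable, ResourceRequest-shaped part, which per the comment above can encode locality, lets the AM choose which equivalent containers to hand back after saving their work.

{code}
import java.util.List;

// Stand-in types only; this is not the proposed protocol's actual API.
class PreemptionHandlerSketch {
  interface ContainerRef { void saveStateAndRelease(); }
  interface ResourceAsk { List<ContainerRef> pickVictims(); } // AM's own choice
  interface PreemptionMessage {
    List<ContainerRef> strictContainers();  // will be reclaimed regardless
    List<ResourceAsk> negotiableAsks();     // AM decides what to give back
  }

  void onPreemption(PreemptionMessage msg) {
    for (ContainerRef c : msg.strictContainers()) {
      c.saveStateAndRelease();              // no choice: checkpoint and let go
    }
    for (ResourceAsk ask : msg.negotiableAsks()) {
      for (ContainerRef victim : ask.pickVictims()) {
        victim.saveStateAndRelease();       // work-preserving hand-back
      }
    }
  }
}
{code}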
[jira] [Updated] (YARN-568) FairScheduler: support for work-preserving preemption
[ https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-568: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-397 > FairScheduler: support for work-preserving preemption > -- > > Key: YARN-568 > URL: https://issues.apache.org/jira/browse/YARN-568 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: fair.patch > > > In the attached patch, we modified the FairScheduler to substitute its > preemption-by-killing with a work-preserving version of preemption (followed > by killing if the AMs do not respond quickly enough). This should allow us to > run preemption checking more often, but kill less often (proper tuning to be > investigated). Depends on YARN-567 and YARN-45, and is related to YARN-569. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-569: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-397 > CapacityScheduler: support for preemption (using a capacity monitor) > > > Key: YARN-569 > URL: https://issues.apache.org/jira/browse/YARN-569 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: 3queues.pdf, capacity.patch, > CapScheduler_with_preemption.pdf > > > There is a tension between the fast-paced reactive role of the > CapacityScheduler, which needs to respond quickly to applications' resource > requests and node updates, and the more introspective, time-based > considerations needed to observe and correct for capacity balance. To this > purpose, instead of hacking the delicate mechanisms of the CapacityScheduler > directly, we opted to add support for preemption by means of a "Capacity > Monitor", which can optionally be run as a separate service (much like the > NMLivelinessMonitor). > The capacity monitor (similar to equivalent functionality in the fairness > scheduler) runs on intervals (e.g., every 3 seconds), observes the state of > the assignment of resources to queues from the capacity scheduler, performs > off-line computation to determine whether preemption is needed and how best > to "edit" the current schedule to improve capacity, and generates events that > produce four possible actions: > # Container de-reservations > # Resource-based preemptions > # Container-based preemptions > # Container killing > The actions listed above are progressively more costly, and it is up to the > policy to use them as desired to achieve the rebalancing goals. > Note that due to the "lag" in the effect of these actions the policy should > operate at the macroscopic level (e.g., preempt tens of containers from a > queue) and not try to tightly and consistently micromanage container > allocations. > - Preemption policy (ProportionalCapacityPreemptionPolicy): > - > Preemption policies are by design pluggable; in the following we present an > initial policy (ProportionalCapacityPreemptionPolicy) we have been > experimenting with. The ProportionalCapacityPreemptionPolicy behaves as > follows: > # it gathers from the scheduler the state of the queues, in particular, their > current capacity, guaranteed capacity and pending requests (*) > # if there are pending requests from queues that are under capacity it > computes a new ideal balanced state (**) > # it computes the set of preemptions needed to repair the current schedule > and achieve capacity balance (accounting for natural completion rates, and > respecting bounds on the amount of preemption we allow for each round) > # it selects which applications to preempt from each over-capacity queue (the > last one in the FIFO order) > # it removes reservations from the most recently assigned app until the > amount of resources to reclaim is obtained, or until no more reservations > exist > # (if not enough) it issues preemptions for containers from the same > applications (reverse chronological order, last assigned container first), > again until the target is met or until no containers except the AM container > are left > # (if not enough) it moves on to unreserve and preempt from the next > application > # containers that have been asked to be preempted are tracked across > executions. If a container is among the ones to be preempted for more than a > certain time, it is moved into the list of containers to be forcibly killed. > Notes: > (*) at the moment, in order to avoid double-counting of the requests, we only > look at the "ANY" part of pending resource requests, which means we might not > preempt on behalf of AMs that ask only for specific locations but not any. > (**) The ideal balanced state is one in which each queue has at least its > guaranteed capacity, and the spare capacity is distributed among queues (that > want some) as a weighted fair share, where the weighting is based on the > guaranteed capacity of a queue, and the function runs to a fixed point. > Tunables of the ProportionalCapacityPreemptionPolicy: > # observe-only mode (i.e., log the actions it would take, but behave as > read-only) > # how frequently to run the policy > # how long to wait between preemption and kill of a container > # which fraction of the containers I would like to obtain should I preempt > (has to do with the natural rate at which containers are returned) > # deadzone size, i.e., what % of over-capacity should I ignore (if we are off > perfect balance by some small %, we ignore it)
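As a concrete illustration of step 3 and the deadzone and per-round tunables above, here is a hedged sketch with made-up names (not the capacity.patch code) that derives per-queue reclaim targets from current and ideal assignments:

{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: given current and ideal per-queue assignments (assumed
// to share the same key set), compute how much to reclaim from each
// over-capacity queue this round, bounded by a per-round cap and a deadzone
// that ignores small imbalances.
class PreemptionTargetsSketch {
  static Map<String, Long> toReclaim(Map<String, Long> current,
                                     Map<String, Long> ideal,
                                     double maxPerRound,   // e.g. 0.1 = reclaim 10% of the gap per round
                                     double deadzone) {    // e.g. 0.05 = ignore 5% overage
    Map<String, Long> targets = new HashMap<String, Long>();
    for (Map.Entry<String, Long> e : current.entrySet()) {
      long cur = e.getValue();
      long ide = ideal.get(e.getKey());
      if (cur > ide * (1 + deadzone)) {      // outside the deadzone
        long gap = cur - ide;
        targets.put(e.getKey(), (long) (gap * maxPerRound));
      }
    }
    return targets;
  }
}
{code}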
[jira] [Updated] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-567: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-397 > RM changes to support preemption for FairScheduler and CapacityScheduler > > > Key: YARN-567 > URL: https://issues.apache.org/jira/browse/YARN-567 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: common.patch > > > A common tradeoff in scheduling jobs is between keeping the cluster busy and > enforcing capacity/fairness properties. FairScheduler and CapacityScheduler > take opposite stances on how to achieve this. > The FairScheduler leverages task-killing to quickly reclaim resources from > currently running jobs and redistribute them among new jobs, thus keeping > the cluster busy but wasting useful work. The CapacityScheduler is typically > tuned to limit the portion of the cluster used by each queue so that the > likelihood of violating capacity is low, thus never wasting work, but risking > keeping the cluster underutilized or having jobs wait to obtain their > rightful capacity. > By introducing the notion of work-preserving preemption we can remove this > tradeoff. This requires a protocol for preemption (YARN-45), and > ApplicationMasters that can respond to preemption efficiently (e.g., by > saving their intermediate state; this will be posted for MapReduce in a > separate JIRA soon), together with a scheduler that can issue preemption > requests (discussed in the separate JIRAs YARN-568 and YARN-569). > The changes we track with this JIRA are common to FairScheduler and > CapacityScheduler, and are mostly the propagation of preemption decisions > through the ApplicationMasterService. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629638#comment-13629638 ] Karthik Kambatla commented on YARN-45: -- [~bikassaha], shouldn't this be under YARN-397? > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629635#comment-13629635 ] Karthik Kambatla commented on YARN-45: -- Great discussion, glad to see this coming along well. Carlo's latest comment makes sense to me. Let me know if I understand it right: the ResourceRequest part of the message can capture locality, and the AM will try to give back resources on each node as per this locality information? > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629620#comment-13629620 ] Bikas Saha commented on YARN-45: All API changes at this point are being tracked under YARN-386 > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-45: --- Issue Type: Sub-task (was: Improvement) Parent: YARN-386 > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629590#comment-13629590 ] Hadoop QA commented on YARN-547: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578317/yarn-547-20130411.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/721//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/721//console This message is automatically generated. > New resource localization is tried even when Localized Resource is in > DOWNLOADING state > --- > > Key: YARN-547 > URL: https://issues.apache.org/jira/browse/YARN-547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch > > > At present, when multiple containers request a localized resource: > 1) If the resource is not present, it is first created and resource > localization starts (the LocalizedResource is in the DOWNLOADING state). > 2) If multiple ResourceRequestEvents come in while in this state, > ResourceLocalizationEvents are fired for all of them. > Most of the time this does not result in a duplicate resource download, but > there is a race condition. > Location: ResourceLocalizationService.addResource, where the request is added > to "attempts" when an entry already exists. > The root cause is the presence of FetchResourceTransition on receiving a > ResourceRequestEvent in the DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
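A hedged sketch of the fix direction the issue points at, using a stand-in for the NM's LocalizedResource state machine rather than the actual patch: while the resource is DOWNLOADING, a new request should only register the waiting container, not fire another FetchResourceTransition.

{code}
import java.util.HashSet;
import java.util.Set;

// Stand-in state machine, not the NM sources: fetch exactly once, and in
// DOWNLOADING merely record the new requester (removing the transition that
// caused the duplicate-download race).
class LocalizedResourceSketch {
  enum State { INIT, DOWNLOADING, LOCALIZED }

  private State state = State.INIT;
  private final Set<String> waitingContainers = new HashSet<String>();

  synchronized void handleRequest(String containerId) {
    waitingContainers.add(containerId); // always remember who is waiting
    if (state == State.INIT) {
      state = State.DOWNLOADING;
      startDownload();                  // fired exactly once
    }
    // In DOWNLOADING: do nothing else; no second fetch is triggered.
  }

  private void startDownload() {
    // hand off to a localizer thread (omitted in this sketch)
  }
}
{code}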
[jira] [Updated] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-547: --- Attachment: (was: yarn-547-20130411.1.patch) > New resource localization is tried even when Localized Resource is in > DOWNLOADING state > --- > > Key: YARN-547 > URL: https://issues.apache.org/jira/browse/YARN-547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch > > > At present, when multiple containers request a localized resource: > 1) If the resource is not present, it is first created and resource > localization starts (the LocalizedResource is in the DOWNLOADING state). > 2) If multiple ResourceRequestEvents come in while in this state, > ResourceLocalizationEvents are fired for all of them. > Most of the time this does not result in a duplicate resource download, but > there is a race condition. > Location: ResourceLocalizationService.addResource, where the request is added > to "attempts" when an entry already exists. > The root cause is the presence of FetchResourceTransition on receiving a > ResourceRequestEvent in the DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-547: --- Attachment: yarn-547-20130411.1.patch > New resource localization is tried even when Localized Resource is in > DOWNLOADING state > --- > > Key: YARN-547 > URL: https://issues.apache.org/jira/browse/YARN-547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch > > > At present when multiple containers try to request a localized resource > 1) If the resource is not present then first it is created and Resource > Localization starts ( LocalizedResource is in DOWNLOADING state) > 2) Now if in this state multiple ResourceRequestEvents come in then > ResourceLocalizationEvents are fired for all of them. > Most of the times it is not resulting into a duplicate resource download but > there is a race condition present there. > Location : ResourceLocalizationService.addResource .. addition of the request > into "attempts" in case of an event already exists. > The root cause for this is the presence of FetchResourceTransition on > receiving ResourceRequestEvent in DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629569#comment-13629569 ] Omkar Vinit Joshi commented on YARN-547: The failed test is actually exercising transitions that are now invalid. Fixing it. > New resource localization is tried even when Localized Resource is in > DOWNLOADING state > --- > > Key: YARN-547 > URL: https://issues.apache.org/jira/browse/YARN-547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch > > > At present when multiple containers try to request a localized resource > 1) If the resource is not present then first it is created and Resource > Localization starts ( LocalizedResource is in DOWNLOADING state) > 2) Now if in this state multiple ResourceRequestEvents come in then > ResourceLocalizationEvents are fired for all of them. > Most of the times it is not resulting into a duplicate resource download but > there is a race condition present there. > Location : ResourceLocalizationService.addResource .. addition of the request > into "attempts" in case of an event already exists. > The root cause for this is the presence of FetchResourceTransition on > receiving ResourceRequestEvent in DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-319) Submit a job to a queue that not allowed in fairScheduler, client will hold forever.
[ https://issues.apache.org/jira/browse/YARN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629549#comment-13629549 ] Hudson commented on YARN-319: - Integrated in Hadoop-trunk-Commit #3603 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3603/]) Fixing CHANGES.txt entry for YARN-319. (Revision 1467133) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467133 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt > Submit a job to a queue that not allowed in fairScheduler, client will hold > forever. > > > Key: YARN-319 > URL: https://issues.apache.org/jira/browse/YARN-319 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.0.2-alpha >Reporter: shenhong >Assignee: shenhong > Fix For: 2.0.5-beta > > Attachments: YARN-319-1.patch, YARN-319-2.patch, YARN-319-3.patch, > YARN-319.patch > > > RM use fairScheduler, when client submit a job to a queue, but the queue do > not allow the user to submit job it, in this case, client will hold forever. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629514#comment-13629514 ] Hadoop QA commented on YARN-486: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578308/YARN-486.6.branch2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/720//console This message is automatically generated. > Change startContainer NM API to accept Container as a parameter and make > ContainerLaunchContext user land > - > > Key: YARN-486 > URL: https://issues.apache.org/jira/browse/YARN-486 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-486.1.patch, YARN-486-20130410.txt, > YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, > YARN-486.6.branch2.patch, YARN-486.6.patch > > > Currently, id, resource request etc need to be copied over from Container to > ContainerLaunchContext. This can be brittle. Also it leads to duplication of > information (such as Resource from CLC and Resource from Container and > Container.tokens). Sending Container directly to startContainer solves these > problems. It also makes CLC clean by only having stuff in it that it set by > the client/AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-486: --- Attachment: YARN-486.6.branch2.patch Patch for branch-2 > Change startContainer NM API to accept Container as a parameter and make > ContainerLaunchContext user land > - > > Key: YARN-486 > URL: https://issues.apache.org/jira/browse/YARN-486 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-486.1.patch, YARN-486-20130410.txt, > YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, > YARN-486.6.branch2.patch, YARN-486.6.patch > > > Currently, id, resource request etc need to be copied over from Container to > ContainerLaunchContext. This can be brittle. Also it leads to duplication of > information (such as Resource from CLC and Resource from Container and > Container.tokens). Sending Container directly to startContainer solves these > problems. It also makes CLC clean by only having stuff in it that it set by > the client/AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629510#comment-13629510 ] Hadoop QA commented on YARN-547: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578295/yarn-547-20130411.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalizedResource {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/719//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/719//console This message is automatically generated. > New resource localization is tried even when Localized Resource is in > DOWNLOADING state > --- > > Key: YARN-547 > URL: https://issues.apache.org/jira/browse/YARN-547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-547-20130411.patch > > > At present when multiple containers try to request a localized resource > 1) If the resource is not present then first it is created and Resource > Localization starts ( LocalizedResource is in DOWNLOADING state) > 2) Now if in this state multiple ResourceRequestEvents come in then > ResourceLocalizationEvents are fired for all of them. > Most of the times it is not resulting into a duplicate resource download but > there is a race condition present there. > Location : ResourceLocalizationService.addResource .. addition of the request > into "attempts" in case of an event already exists. > The root cause for this is the presence of FetchResourceTransition on > receiving ResourceRequestEvent in DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-544) Failed resource localization might introduce a race condition.
[ https://issues.apache.org/jira/browse/YARN-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-544. -- Resolution: Duplicate Thanks for the update, Omkar. Closing this as duplicate. > Failed resource localization might introduce a race condition. > -- > > Key: YARN-544 > URL: https://issues.apache.org/jira/browse/YARN-544 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > > When resource localization fails [Public localizer / > LocalizerRunner(Private)] it sends ContainerResourceFailedEvent to the > containers which then sends ResourceReleaseEvent to the failed resource. In > the end when LocalizedResource's ref count drops to 0 its state is changed > from DOWNLOADING to INIT. > Now if a Resource gets ResourceRequestEvent in between > ContainerResourceFailedEvent and last ResourceReleaseEvent then for that > resource ref count will not drop to 0 and the container which sent the > ResourceRequestEvent will keep waiting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-537) Waiting containers are not informed if private localization for a resource fails.
[ https://issues.apache.org/jira/browse/YARN-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-537. -- Resolution: Duplicate Fixed as part of YARN-539. Closing as duplicate. > Waiting containers are not informed if private localization for a resource > fails. > - > > Key: YARN-537 > URL: https://issues.apache.org/jira/browse/YARN-537 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > > In ResourceLocalizationService.LocalizerRunner.update() if localization fails > then all the other waiting containers are not informed only the initiator is > informed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-542) Change the default global AM max-attempts value to be not one
[ https://issues.apache.org/jira/browse/YARN-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629480#comment-13629480 ] Hadoop QA commented on YARN-542: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578289/YARN-542.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/718//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/718//console This message is automatically generated. > Change the default global AM max-attempts value to be not one > - > > Key: YARN-542 > URL: https://issues.apache.org/jira/browse/YARN-542 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > Attachments: YARN-542.1.patch > > > Today, the global AM max-attempts is set to 1 which is a bad choice. AM > max-attempts accounts for both AM level failures as well as container crashes > due to localization issue, lost nodes etc. To account for AM crashes due to > problems that are not caused by user code, mainly lost nodes, we want to give > AMs some retires. > I propose we change it to atleast two. Can change it to 4 to match other > retry-configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629476#comment-13629476 ] Vinod Kumar Vavilapalli commented on YARN-486: -- I merged YARN-319 into branch-2. But YARN-488 won't be merged yet because it is a WINDOWS only change, so can you upload a patch for branch-2? Tx. > Change startContainer NM API to accept Container as a parameter and make > ContainerLaunchContext user land > - > > Key: YARN-486 > URL: https://issues.apache.org/jira/browse/YARN-486 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-486.1.patch, YARN-486-20130410.txt, > YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, > YARN-486.6.patch > > > Currently, id, resource request etc need to be copied over from Container to > ContainerLaunchContext. This can be brittle. Also it leads to duplication of > information (such as Resource from CLC and Resource from Container and > Container.tokens). Sending Container directly to startContainer solves these > problems. It also makes CLC clean by only having stuff in it that it set by > the client/AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629475#comment-13629475 ] Xuan Gong commented on YARN-486: Another issue is YARN-488, which has not been committed to branch-2 either. In TestContainerManagerSecurity:submitAndRegisterApplication it makes the following change:
{code}
 ContainerLaunchContext amContainer = BuilderUtils
     .newContainerLaunchContext(null, "testUser", BuilderUtils
         .newResource(1024, 1), Collections.emptyMap(),
-        new HashMap(), Arrays.asList("sleep", "100"),
+        new HashMap(), cmd,
         new HashMap(), null, new HashMap());
{code}
> Change startContainer NM API to accept Container as a parameter and make > ContainerLaunchContext user land > - > > Key: YARN-486 > URL: https://issues.apache.org/jira/browse/YARN-486 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-486.1.patch, YARN-486-20130410.txt, > YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, > YARN-486.6.patch > > > Currently, id, resource request etc need to be copied over from Container to > ContainerLaunchContext. This can be brittle. Also it leads to duplication of > information (such as Resource from CLC and Resource from Container and > Container.tokens). Sending Container directly to startContainer solves these > problems. It also makes CLC clean by only having stuff in it that it set by > the client/AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-547: --- Attachment: yarn-547-20130411.patch > New resource localization is tried even when Localized Resource is in > DOWNLOADING state > --- > > Key: YARN-547 > URL: https://issues.apache.org/jira/browse/YARN-547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-547-20130411.patch > > > At present when multiple containers try to request a localized resource > 1) If the resource is not present then first it is created and Resource > Localization starts ( LocalizedResource is in DOWNLOADING state) > 2) Now if in this state multiple ResourceRequestEvents come in then > ResourceLocalizationEvents are fired for all of them. > Most of the times it is not resulting into a duplicate resource download but > there is a race condition present there. > Location : ResourceLocalizationService.addResource .. addition of the request > into "attempts" in case of an event already exists. > The root cause for this is the presence of FetchResourceTransition on > receiving ResourceRequestEvent in DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629467#comment-13629467 ] Omkar Vinit Joshi commented on YARN-547: Fix details:
* Underlying problem: the resource was getting requested again on a ResourceRequestEvent even when it was already in the DOWNLOADING state.
* Solution: fixed the unwanted transition; a ResourceRequestEvent in the DOWNLOADING state now just adds the container to the waiting queue.
* Tests: making sure that the resource never moves back to the INIT state, even when the requesting container releases it before localization completes. A ResourceReleaseEvent in the DOWNLOADING state now just updates the container list (refs).
> New resource localization is tried even when Localized Resource is in > DOWNLOADING state > --- > > Key: YARN-547 > URL: https://issues.apache.org/jira/browse/YARN-547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > > At present when multiple containers try to request a localized resource > 1) If the resource is not present then first it is created and Resource > Localization starts ( LocalizedResource is in DOWNLOADING state) > 2) Now if in this state multiple ResourceRequestEvents come in then > ResourceLocalizationEvents are fired for all of them. > Most of the times it is not resulting into a duplicate resource download but > there is a race condition present there. > Location : ResourceLocalizationService.addResource .. addition of the request > into "attempts" in case of an event already exists. > The root cause for this is the presence of FetchResourceTransition on > receiving ResourceRequestEvent in DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
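A minimal sketch of the revised DOWNLOADING-state handling described in the comment above. The class, method, and container-id names here are illustrative stand-ins, assuming a simplified model; the real LocalizedResource in the NM's localizer package is driven by Hadoop's state-machine framework rather than a hand-rolled enum.
{code}
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in for the NM's LocalizedResource state handling.
class LocalizedResourceSketch {
  enum ResourceState { INIT, DOWNLOADING, LOCALIZED, FAILED }

  private ResourceState state = ResourceState.INIT;
  private final Set<String> waitingContainers = new HashSet<>();

  // REQUEST event: only the first request kicks off a download; later
  // requests arriving in DOWNLOADING merely register the container as waiting.
  synchronized void onRequest(String containerId) {
    waitingContainers.add(containerId);
    if (state == ResourceState.INIT) {
      state = ResourceState.DOWNLOADING;
      startDownload(); // fired exactly once per resource
    }
    // state == DOWNLOADING: no second fetch is triggered (the YARN-547 fix)
  }

  // RELEASE event while DOWNLOADING: just update the container list;
  // the resource never falls back to INIT (per YARN-539).
  synchronized void onRelease(String containerId) {
    waitingContainers.remove(containerId);
  }

  private void startDownload() { /* hand off to the localizer thread */ }
}
{code}
The key point is that the download is started only on the INIT-to-DOWNLOADING edge, so concurrent requesters can never trigger a duplicate fetch.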
[jira] [Updated] (YARN-319) Submit a job to a queue that not allowed in fairScheduler, client will hold forever.
[ https://issues.apache.org/jira/browse/YARN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-319: - Fix Version/s: (was: 2.0.3-alpha) 2.0.5-beta Even though the fix version was set to 2.0.3, this was never merged into branch-2 at all. I have just merged it for 2.0.5-beta and am changing the fix version accordingly. > Submit a job to a queue that not allowed in fairScheduler, client will hold > forever. > > > Key: YARN-319 > URL: https://issues.apache.org/jira/browse/YARN-319 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.0.2-alpha >Reporter: shenhong >Assignee: shenhong > Fix For: 2.0.5-beta > > Attachments: YARN-319-1.patch, YARN-319-2.patch, YARN-319-3.patch, > YARN-319.patch > > > RM use fairScheduler, when client submit a job to a queue, but the queue do > not allow the user to submit job it, in this case, client will hold forever. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-542) Change the default global AM max-attempts value to be not one
[ https://issues.apache.org/jira/browse/YARN-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-542: - Attachment: YARN-542.1.patch I've drafted a patch, which includes the following modifications:
1. Change the default value of yarn.resourcemanager.am.max-attempts from 1 to 2.
2. In the test cases where more than one attempt is set, YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS is used instead of hard-coded values.
3. Assert that the configured maxAttempts is > 1 wherever the difference between one attempt and more than one matters.
> Change the default global AM max-attempts value to be not one > - > > Key: YARN-542 > URL: https://issues.apache.org/jira/browse/YARN-542 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > Attachments: YARN-542.1.patch > > > Today, the global AM max-attempts is set to 1 which is a bad choice. AM > max-attempts accounts for both AM level failures as well as container crashes > due to localization issue, lost nodes etc. To account for AM crashes due to > problems that are not caused by user code, mainly lost nodes, we want to give > AMs some retires. > I propose we change it to atleast two. Can change it to 4 to match other > retry-configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
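A hedged illustration of what the new default means for code reading the setting. The configuration key is quoted from the patch notes above; the constant names below are local stand-ins rather than the real YarnConfiguration fields.
{code}
import org.apache.hadoop.conf.Configuration;

public class AmMaxAttemptsExample {
  // Key taken from the patch notes above; constant names are local stand-ins.
  static final String RM_AM_MAX_ATTEMPTS = "yarn.resourcemanager.am.max-attempts";
  static final int DEFAULT_RM_AM_MAX_ATTEMPTS = 2; // was 1 before YARN-542

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    int maxAttempts = conf.getInt(RM_AM_MAX_ATTEMPTS, DEFAULT_RM_AM_MAX_ATTEMPTS);
    // Mirrors point 3 above: retry behaviour only differs when this exceeds 1.
    System.out.println("AM max attempts in effect: " + maxAttempts);
  }
}
{code}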
[jira] [Commented] (YARN-441) Clean up unused collection methods in various APIs
[ https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629414#comment-13629414 ] Hadoop QA commented on YARN-441: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578280/YARN-441.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/717//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/717//console This message is automatically generated. > Clean up unused collection methods in various APIs > -- > > Key: YARN-441 > URL: https://issues.apache.org/jira/browse/YARN-441 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Xuan Gong > Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch > > > There's a bunch of unused methods like getAskCount() and getAsk(index) in > AllocateRequest, and other interfaces. These should be removed. > In YARN, found them in. MR will have it's own set. > AllocateRequest > StartContaienrResponse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-441) Clean up unused collection methods in various APIs
[ https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629377#comment-13629377 ] Xuan Gong commented on YARN-441: Added void setServiceResponse(String key, ByteBuffer value) back to the StartContainerResponse interface. > Clean up unused collection methods in various APIs > -- > > Key: YARN-441 > URL: https://issues.apache.org/jira/browse/YARN-441 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Xuan Gong > Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch > > > There's a bunch of unused methods like getAskCount() and getAsk(index) in > AllocateRequest, and other interfaces. These should be removed. > In YARN, found them in. MR will have it's own set. > AllocateRequest > StartContaienrResponse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
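For context, a sketch of the relevant slice of the interface, reconstructed only from the signature quoted above; the real StartContainerResponse is a PB-backed record with more methods.
{code}
import java.nio.ByteBuffer;

// Sketch reconstructed from the quoted signature; the real interface is PB-backed.
interface StartContainerResponseSketch {
  // Restored by the .3 patch; stores a per-service response entry.
  void setServiceResponse(String key, ByteBuffer value);
}
{code}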
[jira] [Updated] (YARN-441) Clean up unused collection methods in various APIs
[ https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-441: --- Attachment: YARN-441.3.patch > Clean up unused collection methods in various APIs > -- > > Key: YARN-441 > URL: https://issues.apache.org/jira/browse/YARN-441 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Xuan Gong > Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch > > > There's a bunch of unused methods like getAskCount() and getAsk(index) in > AllocateRequest, and other interfaces. These should be removed. > In YARN, found them in. MR will have it's own set. > AllocateRequest > StartContaienrResponse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629335#comment-13629335 ] Xuan Gong commented on YARN-486: Cannot merge into branch-2: the test case TestFairScheduler:testNotAllowSubmitApplication, introduced by YARN-319, does not exist in that branch; it looks like that patch was never submitted to branch-2. > Change startContainer NM API to accept Container as a parameter and make > ContainerLaunchContext user land > - > > Key: YARN-486 > URL: https://issues.apache.org/jira/browse/YARN-486 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-486.1.patch, YARN-486-20130410.txt, > YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, > YARN-486.6.patch > > > Currently, id, resource request etc need to be copied over from Container to > ContainerLaunchContext. This can be brittle. Also it leads to duplication of > information (such as Resource from CLC and Resource from Container and > Container.tokens). Sending Container directly to startContainer solves these > problems. It also makes CLC clean by only having stuff in it that it set by > the client/AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-542) Change the default global AM max-attempts value to be not one
[ https://issues.apache.org/jira/browse/YARN-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-542: - Description: Today, the global AM max-attempts is set to 1 which is a bad choice. AM max-attempts accounts for both AM level failures as well as container crashes due to localization issue, lost nodes etc. To account for AM crashes due to problems that are not caused by user code, mainly lost nodes, we want to give AMs some retires. I propose we change it to atleast two. Can change it to 4 to match other retry-configs. was: Today, the AM max-retries is set to 1 which is a bad choice. AM max-retries accounts for both AM level failures as well as container crashes due to localization issue, lost nodes etc. To account for AM crashes due to problems that are not caused by user code, mainly lost nodes, we want to give AMs some retires. I propose we change it to atleast two. Can change it to 4 to match other retry-configs. > Change the default global AM max-attempts value to be not one > - > > Key: YARN-542 > URL: https://issues.apache.org/jira/browse/YARN-542 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > > Today, the global AM max-attempts is set to 1 which is a bad choice. AM > max-attempts accounts for both AM level failures as well as container crashes > due to localization issue, lost nodes etc. To account for AM crashes due to > problems that are not caused by user code, mainly lost nodes, we want to give > AMs some retires. > I propose we change it to atleast two. Can change it to 4 to match other > retry-configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-542) Change the default global AM max-attempts value to be not one
[ https://issues.apache.org/jira/browse/YARN-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-542: - Summary: Change the default global AM max-attempts value to be not one (was: Change the default AM retry value to be not one) > Change the default global AM max-attempts value to be not one > - > > Key: YARN-542 > URL: https://issues.apache.org/jira/browse/YARN-542 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > > Today, the AM max-retries is set to 1 which is a bad choice. AM max-retries > accounts for both AM level failures as well as container crashes due to > localization issue, lost nodes etc. To account for AM crashes due to problems > that are not caused by user code, mainly lost nodes, we want to give AMs some > retires. > I propose we change it to atleast two. Can change it to 4 to match other > retry-configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-563) Add application type to ApplicationReport
[ https://issues.apache.org/jira/browse/YARN-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629204#comment-13629204 ] Hitesh Shah commented on YARN-563: -- +1 on the suggestion. If you are working on this, a few comments:
- applicationType should also be part of ApplicationSubmissionContext
- the command-line tool to list applications (bin/yarn tool) should support filtering based on type
- type should be a string
> Add application type to ApplicationReport > -- > > Key: YARN-563 > URL: https://issues.apache.org/jira/browse/YARN-563 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Thomas Weise > > This field is needed to distinguish different types of applications (app > master implementations). For example, we may run applications of type XYZ in > a cluster alongside MR and would like to filter applications by type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
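A minimal sketch of the type-based filtering being asked for in the CLI point above. The AppReport record below is a stand-in for the real ApplicationReport, which carries many more fields.
{code}
import java.util.List;
import java.util.stream.Collectors;

class AppTypeFilterSketch {
  // Stand-in for the real ApplicationReport; only the fields needed here.
  record AppReport(String applicationId, String applicationType) {}

  // The kind of filtering the CLI bullet above asks for, e.g. "list only XYZ apps".
  static List<AppReport> filterByType(List<AppReport> reports, String type) {
    return reports.stream()
        .filter(r -> type.equalsIgnoreCase(r.applicationType()))
        .collect(Collectors.toList());
  }
}
{code}
Keeping the type a plain string, per the last point, means new application frameworks need no enum change on the YARN side.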
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629265#comment-13629265 ] Hudson commented on YARN-486: - Integrated in Hadoop-trunk-Commit #3596 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3596/]) YARN-486. Changed NM's startContainer API to accept Container record given by RM as a direct parameter instead of as part of the ContainerLaunchContext record. Contributed by Xuan Gong. MAPREDUCE-5139. Update MR AM to use the modified startContainer API after YARN-486. Contributed by Xuan Gong. (Revision 1467063) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467063 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerRemoteLaunchEvent.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttemptContainerRequest.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncher.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/StartContainerRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StartContainerRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerLaunchContextPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestContainerLaunchRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.j
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629256#comment-13629256 ] Vinod Kumar Vavilapalli commented on YARN-486: -- I committed this to trunk, it isn't merging into branch-2 though, can you please check? > Change startContainer NM API to accept Container as a parameter and make > ContainerLaunchContext user land > - > > Key: YARN-486 > URL: https://issues.apache.org/jira/browse/YARN-486 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-486.1.patch, YARN-486-20130410.txt, > YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, > YARN-486.6.patch > > > Currently, id, resource request etc need to be copied over from Container to > ContainerLaunchContext. This can be brittle. Also it leads to duplication of > information (such as Resource from CLC and Resource from Container and > Container.tokens). Sending Container directly to startContainer solves these > problems. It also makes CLC clean by only having stuff in it that it set by > the client/AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-457) Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl
[ https://issues.apache.org/jira/browse/YARN-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629231#comment-13629231 ] Xuan Gong commented on YARN-457: First of all, I think the changes will be in AllocateResponsePBImpl; there is no AMResponsePBImpl anymore. Could you update to the latest trunk version, please? I think we need to change the whole setUpdatedNodes function definition; only changing the if block is not enough. The whole change may look like this:
{code}
if (updatedNodes == null) {
  return;
}
initLocalNewNodeReportList();
this.updatedNodes.addAll(updatedNodes); // addAll, not add: the parameter is a list of reports
{code}
This implements setUpdatedNodes the same way we implement setAllocatedContainers() in AllocateResponsePBImpl. > Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl > > > Key: YARN-457 > URL: https://issues.apache.org/jira/browse/YARN-457 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Kenji Kikushima >Priority: Minor > Labels: Newbie > Attachments: YARN-457-2.patch, YARN-457-3.patch, YARN-457.patch > > > {code} > if (updatedNodes == null) { > this.updatedNodes.clear(); > return; > } > {code} > If updatedNodes is already null, a NullPointerException is thrown. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
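To make the failure mode concrete, a toy reproduction of the NPE from the issue description together with the guarded version suggested above. The class and field names are illustrative, not the real PB implementation.
{code}
import java.util.ArrayList;
import java.util.List;

class UpdatedNodesSketch {
  private List<String> updatedNodes; // starts null, like the PB impl's local field

  // Buggy shape from the issue description: clear() on a null field NPEs.
  void setUpdatedNodesBuggy(List<String> nodes) {
    if (nodes == null) {
      this.updatedNodes.clear(); // NPE when updatedNodes is still null
      return;
    }
  }

  // Guarded shape matching the suggestion above: bail out, lazily init, copy.
  void setUpdatedNodes(List<String> nodes) {
    if (nodes == null) {
      return;
    }
    if (this.updatedNodes == null) {
      this.updatedNodes = new ArrayList<>(); // initLocalNewNodeReportList() analogue
    }
    this.updatedNodes.addAll(nodes);
  }
}
{code}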
[jira] [Assigned] (YARN-559) Make all YARN API and libraries available through an api jar
[ https://issues.apache.org/jira/browse/YARN-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy reassigned YARN-559: -- Assignee: Vinod Kumar Vavilapalli > Make all YARN API and libraries available through an api jar > > > Key: YARN-559 > URL: https://issues.apache.org/jira/browse/YARN-559 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Vinod Kumar Vavilapalli > > This should be the dependency for interacting with YARN and would prevent > unnecessary leakage of other internal stuff. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629147#comment-13629147 ] Omkar Vinit Joshi commented on YARN-547: There are a couple of invalid transitions for LocalizedResource now. Updating them as part of this patch:
* From the INIT state
** INIT to INIT on a RELEASE event. This is not possible now, as a new resource is created in the INIT state on a REQUEST event and immediately moved to the DOWNLOADING state. With the [yarn-539|https://issues.apache.org/jira/browse/YARN-539] fix, the resource will never move back from the LOCALIZED or DOWNLOADING state to INIT.
** INIT to LOCALIZED on a LOCALIZED event. This too is impossible now.
* From the DOWNLOADING state
** DOWNLOADING to DOWNLOADING on a REQUEST event. Updating the transition: earlier it was starting one more localization; now it just adds the requesting container to the LocalizedResource container list.
* From the LOCALIZED state
** A resource will never get a LOCALIZED event in the LOCALIZED state, so removing that transition. Earlier this was possible because there were multiple downloads of the same resource; now it is not.
> New resource localization is tried even when Localized Resource is in > DOWNLOADING state > --- > > Key: YARN-547 > URL: https://issues.apache.org/jira/browse/YARN-547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > > At present when multiple containers try to request a localized resource > 1) If the resource is not present then first it is created and Resource > Localization starts ( LocalizedResource is in DOWNLOADING state) > 2) Now if in this state multiple ResourceRequestEvents come in then > ResourceLocalizationEvents are fired for all of them. > Most of the times it is not resulting into a duplicate resource download but > there is a race condition present there. > Location : ResourceLocalizationService.addResource .. addition of the request > into "attempts" in case of an event already exists. > The root cause for this is the presence of FetchResourceTransition on > receiving ResourceRequestEvent in DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
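The surviving transitions, summarized as a standalone sketch reconstructed only from the bullets above. The real table is richer and is built with a StateMachineFactory inside LocalizedResource; state and event names here are simplified stand-ins.
{code}
import java.util.Map;

class LocalizedResourceTransitions {
  enum State { INIT, DOWNLOADING, LOCALIZED }
  enum Event { REQUEST, RELEASE, LOCALIZED_DONE }

  // Valid (state, event) pairs after the patch; anything absent is invalid.
  // The bullets above are exactly the entries deleted from this table:
  // (INIT, RELEASE), (INIT, LOCALIZED_DONE) and (LOCALIZED, LOCALIZED_DONE).
  static final Map<State, Map<Event, State>> VALID = Map.of(
      State.INIT, Map.of(Event.REQUEST, State.DOWNLOADING),
      State.DOWNLOADING, Map.of(
          Event.REQUEST, State.DOWNLOADING,      // just enqueue the container
          Event.RELEASE, State.DOWNLOADING,      // just drop the container ref
          Event.LOCALIZED_DONE, State.LOCALIZED),
      State.LOCALIZED, Map.of(
          Event.REQUEST, State.LOCALIZED,
          Event.RELEASE, State.LOCALIZED));
}
{code}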
[jira] [Updated] (YARN-563) Add application type to ApplicationReport
[ https://issues.apache.org/jira/browse/YARN-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-563: Issue Type: Sub-task (was: Improvement) Parent: YARN-386 > Add application type to ApplicationReport > -- > > Key: YARN-563 > URL: https://issues.apache.org/jira/browse/YARN-563 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Thomas Weise > > This field is needed to distinguish different types of applications (app > master implementations). For example, we may run applications of type XYZ in > a cluster alongside MR and would like to filter applications by type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629070#comment-13629070 ] Alejandro Abdelnur commented on YARN-45: sounds good > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628950#comment-13628950 ] Carlo Curino commented on YARN-45: -- Agreed on a single message, where the semantics is:
1) if both the Set and the ResourceRequest are specified, then it is as stated (they overlap, and you have to give me back at least the resources I ask for, otherwise these containers are at risk of being killed)
2) if only the Set is specified, it is the stricter semantics of "I want these containers back and nothing else"
3) if only the ResourceRequest is specified, the semantics is "please give me back this many resources" without binding which containers are at risk (this might be good for policies that do not want to think about containers unless it is really time to kill them).
Does this work for you? It seems to capture the combination of what we have proposed so far. > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
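A sketch of how the three cases could live in one message, under the assumption that containers are identified by id strings and the resource ask is reduced to a single memory quantity; none of these names come from an actual patch, and the real protocol would carry a ResourceRequest rather than a megabyte count.
{code}
import java.util.Set;

// Sketch of the single preemption message discussed above; fields are illustrative.
record PreemptionMessageSketch(Set<String> containers, Integer requestedMemMb) {

  // Case 1: both set -> free at least this much; these containers are at risk.
  // Case 2: only containers -> strict "I want exactly these back".
  // Case 3: only the ask -> free this much from any containers you choose.
  String semantics() {
    boolean hasContainers = containers != null && !containers.isEmpty();
    boolean hasAsk = requestedMemMb != null;
    if (hasContainers && hasAsk) {
      return "release >= " + requestedMemMb + "MB, else " + containers + " may be killed";
    } else if (hasContainers) {
      return "release exactly " + containers;
    } else if (hasAsk) {
      return "release >= " + requestedMemMb + "MB from any containers";
    }
    return "no-op";
  }
}
{code}
One message with optional fields keeps the AM protocol simple while still letting container-agnostic policies use case 3 alone.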
[jira] [Commented] (YARN-539) LocalizedResources are leaked in memory in case resource localization fails
[ https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628945#comment-13628945 ] Hudson commented on YARN-539: - Integrated in Hadoop-Mapreduce-trunk #1396 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1396/]) YARN-539. Addressed memory leak of LocalResource objects NM when a resource localization fails. Contributed by Omkar Vinit Joshi. (Revision 1466756) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466756 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceFailedLocalizationEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java > LocalizedResources are leaked in memory in case resource localization fails > --- > > Key: YARN-539 > URL: https://issues.apache.org/jira/browse/YARN-539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Fix For: 2.0.5-beta > > Attachments: yarn-539-20130410.1.patch, yarn-539-20130410.2.patch, > yarn-539-20130410.patch > > > If resource localization fails then resource remains in memory and is > 1) Either cleaned up when next time cache cleanup runs and there is space > crunch. (If sufficient space in cache is available then it will remain in > memory). > 2) reused if LocalizationRequest comes again for the same resource. > I think when resource localization fails then that event should be sent to > LocalResourceTracker which will then remove it from its cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-487) TestDiskFailures fails on Windows due to path mishandling
[ https://issues.apache.org/jira/browse/YARN-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628949#comment-13628949 ] Hudson commented on YARN-487: - Integrated in Hadoop-Mapreduce-trunk #1396 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1396/]) YARN-487. Modify path manipulation in LocalDirsHandlerService to let TestDiskFailures pass on Windows. Contributed by Chris Nauroth. (Revision 1466746) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466746 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java > TestDiskFailures fails on Windows due to path mishandling > - > > Key: YARN-487 > URL: https://issues.apache.org/jira/browse/YARN-487 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 3.0.0 > > Attachments: YARN-487.1.patch > > > {{TestDiskFailures#testDirFailuresOnStartup}} fails due to insertion of an > extra leading '/' on the path within {{LocalDirsHandlerService}} when running > on Windows. The test assertions also fail to account for the fact that > {{Path}} normalizes '\' to '/'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
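The issue description pins the failure on string-based path construction. The snippet below illustrates that class of bug (it is not the actual patch): unconditionally prefixing '/' produces a malformed path on Windows, and test assertions must compare normalized forms because {{Path}} normalizes '\' to '/'. The directory name is made up for the example.
{code:java}
// Illustration of the Windows path pitfall described above; not the patch.
import org.apache.hadoop.fs.Path;

public class WindowsPathExample {
  public static void main(String[] args) {
    String localDir = "C:/hadoop/nm-local-dir"; // hypothetical NM local dir

    // Bug pattern: an unconditional '/' prefix yields "/C:/hadoop/nm-local-dir",
    // i.e., an extra leading slash on a Windows drive-qualified path.
    Path broken = new Path("/" + localDir);

    // Safer: hand the raw directory to Path and let it normalize separators
    // ('\' becomes '/'), which assertions must also account for.
    Path ok = new Path(localDir);

    System.out.println(broken + " vs " + ok);
  }
}
{code}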
[jira] [Commented] (YARN-495) Change NM behavior of reboot to resync
[ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628944#comment-13628944 ] Hudson commented on YARN-495: - Integrated in Hadoop-Mapreduce-trunk #1396 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1396/]) YARN-495. Changed NM reboot behaviour to be a simple resync - kill all containers and re-register with RM. Contributed by Jian He. (Revision 1466752) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466752 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeAction.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManagerEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerReboot.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java > Change NM behavior of reboot to resync > -- > > Key: YARN-495 > URL: https://issues.apache.org/jira/browse/YARN-495 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Fix For: 2.0.5-beta > > Attachments: YARN-495.1.patch, YARN-495.2.patch, YARN-495.3.patch, > YARN-495.4.patch, YARN-495.5.patch, YARN-495.6.patch > > > When a reboot command is sent from the RM, the node manager doesn't clean up the > containers while it's stopping. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
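Per the commit message above, YARN-495 replaces the NM "reboot" with a resync: kill all containers, then re-register with the RM. The following is a hypothetical condensation of that control flow; the real logic lives in NodeStatusUpdaterImpl and NodeManager with different signatures and event plumbing, and the enum here merely mirrors the NodeAction record touched by the patch.
{code:java}
// Hypothetical condensation of the resync flow; not the patch's actual code.
enum NodeAction { NORMAL, RESYNC, SHUTDOWN }

class NodeManagerSketch {
  void onHeartbeatResponse(NodeAction action) {
    switch (action) {
      case RESYNC:
        // Instead of a full reboot: kill all running containers, then
        // re-register with the ResourceManager and resume heartbeats.
        killAllContainers();
        registerWithRM();
        break;
      case SHUTDOWN:
        stop();
        break;
      default:
        break; // NORMAL: nothing to do
    }
  }

  void killAllContainers() { /* clean up ContainerManager state */ }
  void registerWithRM()    { /* ResourceTracker.registerNodeManager(...) */ }
  void stop()              { /* service shutdown */ }
}
{code}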
[jira] [Updated] (YARN-568) FairScheduler: support for work-preserving preemption
[ https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-568: -- Assignee: Carlo Curino > FairScheduler: support for work-preserving preemption > -- > > Key: YARN-568 > URL: https://issues.apache.org/jira/browse/YARN-568 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: fair.patch > > > In the attached patch, we modified the FairScheduler to substitute its > preemption-by-killing with a work-preserving version of preemption (followed > by killing if the AMs do not respond quickly enough). This should allow us to > run preemption checking more often but kill less often (proper tuning to be > investigated). Depends on YARN-567 and YARN-45; related to YARN-569. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-569: -- Assignee: Carlo Curino > CapacityScheduler: support for preemption (using a capacity monitor) > > > Key: YARN-569 > URL: https://issues.apache.org/jira/browse/YARN-569 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: 3queues.pdf, capacity.patch, > CapScheduler_with_preemption.pdf > > > There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. > To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a "Capacity Monitor", which can optionally be run as a separate service (much like the NMLivelinessMonitor). > The capacity monitor (similar to equivalent functionality in the fair scheduler) runs at intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine whether preemption is needed and how best to "edit" the current schedule to improve capacity, and generates events that produce four possible actions: > # Container de-reservations > # Resource-based preemptions > # Container-based preemptions > # Container killing > The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. > Note that, due to the "lag" in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations. > - Preemption policy (ProportionalCapacityPreemptionPolicy): > - > Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: > # it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*) > # if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**) > # it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round) > # it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order) > # it removes reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exist > # (if not enough) it issues preemptions for containers from the same application (reverse chronological order, last assigned container first), again until the target is met or until no containers except the AM container are left > # (if not enough) it moves on to unreserve and preempt from the next application > # containers that have been asked to preempt are tracked across executions. > If a container remains among those to be preempted for more than a certain time, it is moved into the list of containers to be forcibly killed. > Notes: > (*) at the moment, in order to avoid double-counting of requests, we only look at the "ANY" part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY. > (**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among the queues that want some as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue, and the function runs to a fixed point. > Tunables of the ProportionalCapacityPreemptionPolicy: > # observe-only mode (i.e., log the actions it would take, but behave as read-only) > # how frequently to run the policy > # how long to wait between preemption and kill of a container > # which fraction of the containers I would like to obtain should I preempt (this has to do with the natural rate at which containers are returned) > # deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small %, we ignore it) > # overall amount of preemption we can afford for each run of the policy (in terms of total cluster capacity)
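To make the numbered policy steps above concrete, here is a compressed, hypothetical rendering of one policy round in Java. Every name here (QueueSnapshot, PreemptionRound, the helper methods) is invented for illustration; the real ProportionalCapacityPreemptionPolicy in the attached patch operates on scheduler internals and events not shown, and the ideal-assignment computation is reduced to a placeholder.
{code:java}
// Compressed, hypothetical sketch of one preemption round; not the patch.
import java.util.*;

class QueueSnapshot {
  String name;
  double used, guaranteed, pending;            // step 1: state from the scheduler
  Deque<String> appsFifo = new ArrayDeque<>(); // applications in FIFO order
}

class PreemptionRound {
  void run(List<QueueSnapshot> queues, double maxPreemptPerRound) {
    Map<String, Double> ideal = computeIdealAssignment(queues);  // step 2
    for (QueueSnapshot q : queues) {
      double toReclaim = Math.min(q.used - ideal.get(q.name),
                                  maxPreemptPerRound);           // step 3: bounded
      while (toReclaim > 0 && !q.appsFifo.isEmpty()) {
        String app = q.appsFifo.peekLast();                      // step 4: last in FIFO
        toReclaim -= unreserve(app, toReclaim);                  // step 5
        toReclaim -= preemptYoungestContainers(app, toReclaim);  // step 6
        if (toReclaim > 0) q.appsFifo.pollLast();                // step 7: next app
      }
    }
    // step 8: containers asked to preempt are tracked across rounds and
    // force-killed if they persist beyond a configured timeout.
  }

  Map<String, Double> computeIdealAssignment(List<QueueSnapshot> qs) {
    // Weighted fair redistribution of spare capacity (see note ** above).
    Map<String, Double> m = new HashMap<>();
    for (QueueSnapshot q : qs) m.put(q.name, q.guaranteed);
    return m; // placeholder: the real version iterates to a fixed point
  }

  double unreserve(String app, double need) { return 0; }                 // stub
  double preemptYoungestContainers(String app, double need) { return 0; } // stub
}
{code}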
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628938#comment-13628938 ] Alejandro Abdelnur commented on YARN-45: I'm just trying to see if we can have (at least for now) a single message type, instead of two, that satisfies the use cases. Regarding keeping the tighter semantics: if it is not difficult/complex, I'm OK with it. Thanks. > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628937#comment-13628937 ] Carlo Curino commented on YARN-569: --- - Comments on the attached graphs -- The attached graph highlights the need for preemption by means of an example designed to highlight it. We run 2 sort jobs over 128GB of data on a 10-node cluster, starting the first job in queue B (20% guaranteed capacity) and the second job 400 sec later in queue A (80% guaranteed capacity). We compare three scenarios: # Default CapacityScheduler with A and B having maximum capacity set to 100%: cluster utilization is high, and B runs fast since it can use the entire cluster when A is not around, but A needs to wait very long (almost 20 min) before obtaining all of its guaranteed capacity (and over 250 secs to get any container besides the AM). # Default CapacityScheduler with A and B having maximum capacity set to 80% and 20% respectively: A obtains its guaranteed resources immediately, but cluster utilization is very low, and jobs in B take over 2X longer since they cannot use spare overcapacity. # CapacityScheduler + preemption: A and B are configured as in 1), but we preempt containers. We obtain both high utilization and short runtimes for B (comparable to scenario 1), and prompt resources to A (within 30 sec). (An illustrative configuration for this two-queue setup is sketched below.) The second attached graph shows a scenario with 3 queues A, B, C with 40%, 20%, 40% guaranteed capacity. We show more "internals" of the policy by plotting instantaneous resource utilization as above, total pending requests, guaranteed capacity, ideal assignment of memory, ideal preemption, and actual preemption. Things to note: # The idealized memory assignment and instantaneous resource utilization are very close to each other, i.e., the combination of CapacityScheduler+Preemption tightly follows the ideal distribution of resources # When only one job is running it gets 100% of the cluster; when B and A are running they get 33% and 66% respectively (which is a fair overcapacity assignment from their 20% and 40% guaranteed capacities); when all three jobs are running (and they want at least their capacity's worth of resources) they obtain their guaranteed capacity. # actual preemption is a fraction of ideal preemption; this is because we account for natural completion of tasks (with a configurable parameter) # in this experiment we do not bound the total amount of preemption per round (i.e., the parameter is set to 1.0) > CapacityScheduler: support for preemption (using a capacity monitor) > > > Key: YARN-569 > URL: https://issues.apache.org/jira/browse/YARN-569 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Carlo Curino > Attachments: 3queues.pdf, capacity.patch, > CapScheduler_with_preemption.pdf > > > There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a "Capacity Monitor", which can optionally be run as a separate service (much like the > NMLivelinessMonitor). 
> The capacity monitor (similar to equivalent functionality in the fair scheduler) runs at intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine whether preemption is needed and how best to "edit" the current schedule to improve capacity, and generates events that produce four possible actions: > # Container de-reservations > # Resource-based preemptions > # Container-based preemptions > # Container killing > The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. > Note that, due to the "lag" in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations. > - Preemption policy (ProportionalCapacityPreemptionPolicy): > - > Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: > # it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*)
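For reference, the two-queue setup used in the first experiment (80%/20% guaranteed capacity, with and without the maximum-capacity caps) can be expressed with the CapacityScheduler's usual configuration keys. The snippet below is illustrative only; the values mirror the scenarios described in the comment, not a configuration shipped with the patch.
{code:java}
// Illustrative queue setup for the two-queue experiment, expressed via
// Hadoop's Configuration API; the yarn.scheduler.capacity.* keys follow the
// CapacityScheduler's usual property naming.
import org.apache.hadoop.conf.Configuration;

public class TwoQueueSetup {
  public static Configuration build() {
    Configuration conf = new Configuration();
    conf.set("yarn.scheduler.capacity.root.queues", "A,B");
    conf.set("yarn.scheduler.capacity.root.A.capacity", "80"); // guaranteed 80%
    conf.set("yarn.scheduler.capacity.root.B.capacity", "20"); // guaranteed 20%
    // Scenarios 1 and 3: queues may grow into spare capacity up to 100%.
    // Scenario 2 would instead cap A at 80 and B at 20.
    conf.set("yarn.scheduler.capacity.root.A.maximum-capacity", "100");
    conf.set("yarn.scheduler.capacity.root.B.maximum-capacity", "100");
    return conf;
  }
}
{code}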
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-569: -- Attachment: capacity.patch > CapacityScheduler: support for preemption (using a capacity monitor) > > > Key: YARN-569 > URL: https://issues.apache.org/jira/browse/YARN-569 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Carlo Curino > Attachments: 3queues.pdf, capacity.patch, > CapScheduler_with_preemption.pdf > > > There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a "Capacity Monitor", which can optionally be run as a separate service (much like the NMLivelinessMonitor). > The capacity monitor (similar to equivalent functionality in the fair scheduler) runs at intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine whether preemption is needed and how best to "edit" the current schedule to improve capacity, and generates events that produce four possible actions: > # Container de-reservations > # Resource-based preemptions > # Container-based preemptions > # Container killing > The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. > Note that, due to the "lag" in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations. > - Preemption policy (ProportionalCapacityPreemptionPolicy): > - > Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: > # it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*) > # if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**) > # it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round) > # it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order) > # it removes reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exist > # (if not enough) it issues preemptions for containers from the same application (reverse chronological order, last assigned container first), again until the target is met or until no containers except the AM container are left > # (if not enough) it moves on to unreserve and preempt from the next application > # containers that have been asked to preempt are tracked across executions. 
> If a container remains among those to be preempted for more than a certain time, it is moved into the list of containers to be forcibly killed. > Notes: > (*) at the moment, in order to avoid double-counting of requests, we only look at the "ANY" part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY. > (**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among the queues that want some as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue, and the function runs to a fixed point. > Tunables of the ProportionalCapacityPreemptionPolicy: > # observe-only mode (i.e., log the actions it would take, but behave as read-only) > # how frequently to run the policy > # how long to wait between preemption and kill of a container > # which fraction of the containers I would like to obtain should I preempt (this has to do with the natural rate at which containers are returned) > # deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small %, we ignore it) > # overall amount of preemption we can afford for each run of the policy (in terms of total cluster capacity)
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-569: -- Attachment: 3queues.pdf CapScheduler_with_preemption.pdf > CapacityScheduler: support for preemption (using a capacity monitor) > > > Key: YARN-569 > URL: https://issues.apache.org/jira/browse/YARN-569 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Carlo Curino > Attachments: 3queues.pdf, capacity.patch, > CapScheduler_with_preemption.pdf > > > There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a "Capacity Monitor", which can optionally be run as a separate service (much like the NMLivelinessMonitor). > The capacity monitor (similar to equivalent functionality in the fair scheduler) runs at intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine whether preemption is needed and how best to "edit" the current schedule to improve capacity, and generates events that produce four possible actions: > # Container de-reservations > # Resource-based preemptions > # Container-based preemptions > # Container killing > The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. > Note that, due to the "lag" in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations. > - Preemption policy (ProportionalCapacityPreemptionPolicy): > - > Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: > # it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*) > # if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**) > # it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round) > # it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order) > # it removes reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exist > # (if not enough) it issues preemptions for containers from the same application (reverse chronological order, last assigned container first), again until the target is met or until no containers except the AM container are left > # (if not enough) it moves on to unreserve and preempt from the next application > # containers that have been asked to preempt are tracked across executions. 
> If a container remains among those to be preempted for more than a certain time, it is moved into the list of containers to be forcibly killed. > Notes: > (*) at the moment, in order to avoid double-counting of requests, we only look at the "ANY" part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY. > (**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among the queues that want some as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue, and the function runs to a fixed point. > Tunables of the ProportionalCapacityPreemptionPolicy: > # observe-only mode (i.e., log the actions it would take, but behave as read-only) > # how frequently to run the policy > # how long to wait between preemption and kill of a container > # which fraction of the containers I would like to obtain should I preempt (this has to do with the natural rate at which containers are returned) > # deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small %, we ignore it) > # overall amount of preemption we can afford for each run of the policy (in terms of total cluster capacity)
[jira] [Updated] (YARN-568) FairScheduler: support for work-preserving preemption
[ https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-568: -- Attachment: fair.patch > FairScheduler: support for work-preserving preemption > -- > > Key: YARN-568 > URL: https://issues.apache.org/jira/browse/YARN-568 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Carlo Curino > Attachments: fair.patch > > > In the attached patch, we modified the FairScheduler to substitute its > preemption-by-killing with a work-preserving version of preemption (followed > by killing if the AMs do not respond quickly enough). This should allow us to > run preemption checking more often but kill less often (proper tuning to be > investigated). Depends on YARN-567 and YARN-45; related to YARN-569. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
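The work-preserving behavior described above (ask first, kill only if the AM does not respond quickly enough) boils down to a two-phase scheme. The sketch below is a hypothetical illustration of that scheme; all class and method names are invented, and the actual patch wires this into the FairScheduler's preemption check rather than a standalone class.
{code:java}
// Hypothetical two-phase preemptor: request first, kill after a grace period.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WorkPreservingPreemptor {
  private final long killGraceMs;
  private final Map<String, Long> askedAt = new ConcurrentHashMap<>();

  WorkPreservingPreemptor(long killGraceMs) { this.killGraceMs = killGraceMs; }

  // Phase 1: record the request and notify the AM (e.g., via the allocate
  // response), giving it a chance to checkpoint and release the container.
  void requestPreemption(String containerId) {
    askedAt.putIfAbsent(containerId, System.currentTimeMillis());
    notifyAM(containerId);
  }

  // Phase 2: on each scheduling round, kill containers whose grace expired.
  void enforce(long now) {
    askedAt.forEach((cid, t) -> {
      if (now - t > killGraceMs) {
        kill(cid);
        askedAt.remove(cid);
      }
    });
  }

  void notifyAM(String containerId) { /* surface in the AM heartbeat */ }
  void kill(String containerId)     { /* fall back to killing */ }
}
{code}
Because preemption requests are now cheap for well-behaved AMs, the checker can run more aggressively while killing remains a last resort, which is exactly the trade-off the description calls out for tuning.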
[jira] [Updated] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-567: -- Attachment: common.patch > RM changes to support preemption for FairScheduler and CapacityScheduler > > > Key: YARN-567 > URL: https://issues.apache.org/jira/browse/YARN-567 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: common.patch > > > A common tradeoff in scheduling jobs is between keeping the cluster busy and > enforcing capacity/fairness properties; the FairScheduler and the CapacityScheduler > take opposite stances on how to achieve this. > The FairScheduler leverages task-killing to quickly reclaim resources from > currently running jobs and redistribute them among new jobs, thus keeping > the cluster busy but wasting useful work. The CapacityScheduler is typically > tuned > to limit the portion of the cluster used by each queue so that the likelihood > of violating capacity is low, thus never wasting work but risking keeping > the cluster underutilized or having jobs wait to obtain their rightful > capacity. > By introducing the notion of work-preserving preemption we can remove this > tradeoff. This requires a protocol for preemption (YARN-45), and > ApplicationMasters that can respond to preemption efficiently (e.g., by > saving their intermediate state; this will be posted for MapReduce in a > separate JIRA soon), together with a scheduler that can issue preemption > requests (discussed in the separate JIRAs YARN-568 and YARN-569). > The changes we track with this JIRA are common to the FairScheduler and > CapacityScheduler, and are mostly the propagation of preemption decisions through > the ApplicationMasterService. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
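As the description says, the common RM-side change is mostly propagating preemption decisions to AMs through the ApplicationMasterService. One plausible shape for such a payload, combining the strict and flexible semantics debated in YARN-45, is sketched below; these are not the patch's actual classes, just an illustration of what an allocate response could carry.
{code:java}
// Hypothetical payload shape for preemption feedback to an AM; not the patch.
import java.util.List;

class PreemptionMessageSketch {
  // Strict semantics: exactly these containers will be reclaimed, so the AM
  // should checkpoint and vacate them specifically.
  List<String> strictContainerIds;

  // Flexible semantics: the AM may choose which containers to give back, as
  // long as the described resources are freed before the deadline.
  List<ResourceAsk> negotiable;

  static class ResourceAsk {
    int memoryMb;   // amount of memory to release
    int containers; // number of containers of this size
  }
}
{code}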
[jira] [Updated] (YARN-568) FairScheduler: support for work-preserving preemption
[ https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-568: -- Description: In the attached patch, we modified the FairScheduler to substitute its preemption-by-killing with a work-preserving version of preemption (followed by killing if the AMs do not respond quickly enough). This should allow us to run preemption checking more often but kill less often (proper tuning to be investigated). Depends on YARN-567 and YARN-45; related to YARN-569. was: In the attached patch, we modified the FairScheduler to substitute its preemption-by-killing with a work-preserving version of preemption (followed by killing if the AMs do not respond quickly enough). This should allow us to run preemption checking more often but kill less often (proper tuning to be investigated). Depends on YARN-567 and YARN-45. > FairScheduler: support for work-preserving preemption > -- > > Key: YARN-568 > URL: https://issues.apache.org/jira/browse/YARN-568 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Carlo Curino > > In the attached patch, we modified the FairScheduler to substitute its > preemption-by-killing with a work-preserving version of preemption (followed > by killing if the AMs do not respond quickly enough). This should allow us to > run preemption checking more often but kill less often (proper tuning to be > investigated). Depends on YARN-567 and YARN-45; related to YARN-569. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-569: -- Description: There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a "Capacity Monitor", which can optionally be run as a separate service (much like the NMLivelinessMonitor). The capacity monitor (similar to equivalent functionality in the fair scheduler) runs at intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine whether preemption is needed and how best to "edit" the current schedule to improve capacity, and generates events that produce four possible actions: # Container de-reservations # Resource-based preemptions # Container-based preemptions # Container killing The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. Note that, due to the "lag" in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations. - Preemption policy (ProportionalCapacityPreemptionPolicy): - Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: # it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*) # if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**) # it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round) # it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order) # it removes reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exist # (if not enough) it issues preemptions for containers from the same application (reverse chronological order, last assigned container first), again until the target is met or until no containers except the AM container are left # (if not enough) it moves on to unreserve and preempt from the next application. # containers that have been asked to preempt are tracked across executions. If a container remains among those to be preempted for more than a certain time, it is moved into the list of containers to be forcibly killed. Notes: (*) at the moment, in order to avoid double-counting of requests, we only look at the "ANY" part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY. 
(**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among the queues that want some as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue, and the function runs to a fixed point. Tunables of the ProportionalCapacityPreemptionPolicy: # observe-only mode (i.e., log the actions it would take, but behave as read-only) # how frequently to run the policy # how long to wait between preemption and kill of a container # which fraction of the containers I would like to obtain should I preempt (this has to do with the natural rate at which containers are returned) # deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small %, we ignore it) # overall amount of preemption we can afford for each run of the policy (in terms of total cluster capacity) In our current experiments this set of tunables seems to be a good start for shaping the preemption action properly. More sophisticated preemption policies could take into account different types of applications running, job priorities, cost of preemption, integral of capacity imbalance. This is very much a control-theory kind of problem, and some of the lessons on designing and tuning controllers are likely to apply. Generality: The monitor-based scheduler edit and the preemption mechanisms we introduced here are designed to be more general than enforcing capacity/fairness, i
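The tunables enumerated above map naturally onto configuration knobs. The sketch below shows one possible encoding using Hadoop's Configuration API; the key names and default values are invented for illustration and are not taken from the patch.
{code:java}
// One possible encoding of the policy tunables; names/defaults are invented.
import org.apache.hadoop.conf.Configuration;

class PreemptionPolicyConfig {
  final boolean observeOnly;       // log actions but behave as read-only
  final long monitorIntervalMs;    // how frequently to run the policy
  final long waitBeforeKillMs;     // grace between preemption and kill
  final float naturalTermination;  // fraction expected to complete anyway
  final float deadzone;            // % of over-capacity to ignore
  final float maxPerRound;         // bound on preemption per policy run

  PreemptionPolicyConfig(Configuration conf) {
    observeOnly        = conf.getBoolean("preemption.observe-only", false);
    monitorIntervalMs  = conf.getLong("preemption.interval-ms", 3000L);
    waitBeforeKillMs   = conf.getLong("preemption.wait-before-kill-ms", 15000L);
    naturalTermination = conf.getFloat("preemption.natural-termination-factor", 0.2f);
    deadzone           = conf.getFloat("preemption.deadzone", 0.05f);
    maxPerRound        = conf.getFloat("preemption.max-per-round", 0.1f);
  }
}
{code}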
[jira] [Updated] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-567: -- Description: A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties; the FairScheduler and the CapacityScheduler take opposite stances on how to achieve this. The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work but risking keeping the cluster underutilized or having jobs wait to obtain their rightful capacity. By introducing the notion of work-preserving preemption we can remove this tradeoff. This requires a protocol for preemption (YARN-45), and ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), together with a scheduler that can issue preemption requests (discussed in the separate JIRAs YARN-568 and YARN-569). The changes we track with this JIRA are common to the FairScheduler and CapacityScheduler, and are mostly the propagation of preemption decisions through the ApplicationMasterService. was: A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties; the FairScheduler and the CapacityScheduler take opposite stances on how to achieve this. The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work but risking keeping the cluster underutilized or having jobs wait to obtain their rightful capacity. By introducing the notion of work-preserving preemption we can remove this tradeoff. This requires a protocol for preemption (YARN-45), and ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), together with a scheduler that can issue preemption requests (discussed in separate JIRAs). The changes we track with this JIRA are common to the FairScheduler and CapacityScheduler, and are mostly the propagation of preemption decisions through the ApplicationMasterService. > RM changes to support preemption for FairScheduler and CapacityScheduler > > > Key: YARN-567 > URL: https://issues.apache.org/jira/browse/YARN-567 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > > A common tradeoff in scheduling jobs is between keeping the cluster busy and > enforcing capacity/fairness properties; the FairScheduler and the CapacityScheduler > take opposite stances on how to achieve this. > The FairScheduler leverages task-killing to quickly reclaim resources from > currently running jobs and redistribute them among new jobs, thus keeping > the cluster busy but wasting useful work. 
The CapacityScheduler is typically > tuned > to limit the portion of the cluster used by each queue so that the likelihood > of violating capacity is low, thus never wasting work but risking keeping > the cluster underutilized or having jobs wait to obtain their rightful > capacity. > By introducing the notion of work-preserving preemption we can remove this > tradeoff. This requires a protocol for preemption (YARN-45), and > ApplicationMasters that can respond to preemption efficiently (e.g., by > saving their intermediate state; this will be posted for MapReduce in a > separate JIRA soon), together with a scheduler that can issue preemption > requests (discussed in the separate JIRAs YARN-568 and YARN-569). > The changes we track with this JIRA are common to the FairScheduler and > CapacityScheduler, and are mostly the propagation of preemption decisions through > the ApplicationMasterService. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
Carlo Curino created YARN-569: - Summary: CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Carlo Curino There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a "Capacity Monitor", which can optionally be run as a separate service (much like the NMLivelinessMonitor). The capacity monitor (similar to equivalent functionality in the fair scheduler) runs at intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine whether preemption is needed and how best to "edit" the current schedule to improve capacity, and generates events that produce four possible actions: # Container de-reservations # Resource-based preemptions # Container-based preemptions # Container killing The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. Note that, due to the "lag" in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations. - Preemption policy (ProportionalCapacityPreemptionPolicy): - Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: # it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*) # if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**) # it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round) # it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order) # it removes reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exist # (if not enough) it issues preemptions for containers from the same application (reverse chronological order, last assigned container first), again until the target is met or until no containers except the AM container are left # (if not enough) it moves on to unreserve and preempt from the next application. # containers that have been asked to preempt are tracked across executions. If a container remains among those to be preempted for more than a certain time, it is moved into the list of containers to be forcibly killed. Notes: (*) at the moment, in order to avoid double-counting of requests, we only look at the "ANY" part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY. 
(**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among the queues that want some as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue, and the function runs to a fixed point. Tunables of the ProportionalCapacityPreemptionPolicy: # observe-only mode (i.e., log the actions it would take, but behave as read-only) # how frequently to run the policy # how long to wait between preemption and kill of a container # which fraction of the containers I would like to obtain should I preempt (this has to do with the natural rate at which containers are returned) # deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small %, we ignore it) # overall amount of preemption we can afford for each run of the policy (in terms of total cluster capacity) In our current experiments this set of tunables seems to be a good start for shaping the preemption action properly. More sophisticated preemption policies could take into account different types of applications running, job priorities, cost of preemption, integral of capacity imbalance. This is very much a control-theory kind of problem, and some of the lessons on designing and tuning controllers are likely to apply.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628930#comment-13628930 ] Carlo Curino commented on YARN-45: -- Sorry, I read only your last comment and answered that... Regarding your previous "larger" comment: - what you propose is somewhat of a combination of 1 and 2 above, where we give the AM a hint about what would happen at the container level if the pressure remains. I don't have strong feelings about it; I agree it is easy to do, and maybe it is a good compromise. - however, I want to be able to maintain the tighter semantics of 1 (in case the ResourceRequest is not specified in the message), which forces the AM to preempt exactly the set of containers I am specifying (though with a very "targeted" ResourceRequest you can in practice do something similar). This covers use cases like the one I mentioned above. We are posting more code in YARN-567, YARN-568 and YARN-569; check it out, it might provide context for this conversation. > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-568) FairScheduler: support for work-preserving preemption
Carlo Curino created YARN-568: - Summary: FairScheduler: support for work-preserving preemption Key: YARN-568 URL: https://issues.apache.org/jira/browse/YARN-568 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Carlo Curino In the attached patch, we modified the FairScheduler to substitute its preemption-by-killing with a work-preserving version of preemption (followed by killing if the AMs do not respond quickly enough). This should allow us to run preemption checking more often but kill less often (proper tuning to be investigated). Depends on YARN-567 and YARN-45. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628922#comment-13628922 ] Carlo Curino commented on YARN-45: -- Our main focus for now is to rebalance capacity; in this sense, yes, location is not important. However, one can envision the use of preemption also for other things, e.g., to build a monitor that tries to improve data-locality by issuing (a moderate amount of) "relocations" of containers (probably riding the same checkpointing mechanics we are building for MR). This is another case where container-based preemption can turn out to be useful. (This is at the moment just speculation.) > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler
Carlo Curino created YARN-567: - Summary: RM changes to support preemption for FairScheduler and CapacityScheduler Key: YARN-567 URL: https://issues.apache.org/jira/browse/YARN-567 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties; the FairScheduler and the CapacityScheduler take opposite stances on how to achieve this. The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work but risking keeping the cluster underutilized or having jobs wait to obtain their rightful capacity. By introducing the notion of work-preserving preemption we can remove this tradeoff. This requires a protocol for preemption (YARN-45), and ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), together with a scheduler that can issue preemption requests (discussed in separate JIRAs). The changes we track with this JIRA are common to the FairScheduler and CapacityScheduler, and are mostly the propagation of preemption decisions through the ApplicationMasterService. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-487) TestDiskFailures fails on Windows due to path mishandling
[ https://issues.apache.org/jira/browse/YARN-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628902#comment-13628902 ] Hudson commented on YARN-487: - Integrated in Hadoop-Hdfs-trunk #1369 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1369/]) YARN-487. Modify path manipulation in LocalDirsHandlerService to let TestDiskFailures pass on Windows. Contributed by Chris Nauroth. (Revision 1466746) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466746 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java > TestDiskFailures fails on Windows due to path mishandling > - > > Key: YARN-487 > URL: https://issues.apache.org/jira/browse/YARN-487 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 3.0.0 > > Attachments: YARN-487.1.patch > > > {{TestDiskFailures#testDirFailuresOnStartup}} fails due to insertion of an > extra leading '/' on the path within {{LocalDirsHandlerService}} when running > on Windows. The test assertions also fail to account for the fact that > {{Path}} normalizes '\' to '/'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-539) LocalizedResources are leaked in memory in case resource localization fails
[ https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628898#comment-13628898 ] Hudson commented on YARN-539: - Integrated in Hadoop-Hdfs-trunk #1369 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1369/]) YARN-539. Addressed memory leak of LocalResource objects in the NM when a resource localization fails. Contributed by Omkar Vinit Joshi. (Revision 1466756) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466756 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceFailedLocalizationEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java > LocalizedResources are leaked in memory in case resource localization fails > --- > > Key: YARN-539 > URL: https://issues.apache.org/jira/browse/YARN-539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Fix For: 2.0.5-beta > > Attachments: yarn-539-20130410.1.patch, yarn-539-20130410.2.patch, > yarn-539-20130410.patch > > > If resource localization fails, the resource remains in memory and is > 1) either cleaned up the next time cache cleanup runs and there is a space crunch (if sufficient space is available in the cache, it will remain in memory), or > 2) reused if a LocalizationRequest comes again for the same resource. > I think that when resource localization fails, that event should be sent to the LocalResourcesTracker, which will then remove the resource from its cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-495) Change NM behavior of reboot to resync
[ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628897#comment-13628897 ] Hudson commented on YARN-495: - Integrated in Hadoop-Hdfs-trunk #1369 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1369/]) YARN-495. Changed NM reboot behaviour to be a simple resync - kill all containers and re-register with RM. Contributed by Jian He. (Revision 1466752) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466752 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeAction.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManagerEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerReboot.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java > Change NM behavior of reboot to resync > -- > > Key: YARN-495 > URL: https://issues.apache.org/jira/browse/YARN-495 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Fix For: 2.0.5-beta > > Attachments: YARN-495.1.patch, YARN-495.2.patch, YARN-495.3.patch, > YARN-495.4.patch, YARN-495.5.patch, YARN-495.6.patch > > > When a reboot command is sent from the RM, the node manager doesn't clean up the > containers while it's stopping. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
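A rough sketch of the resync flow this change describes (NodeAction comes from the commit's file list; the RESYNC value and all method names below are assumptions inferred from the commit message): instead of rebooting the whole process, the NM reacts to a resync action from the RM by killing its containers and re-registering.
{code:java}
// Illustrative only; not the real NM or NodeStatusUpdater signatures.
enum NodeAction { NORMAL, RESYNC, SHUTDOWN }

class NodeManagerResyncSketch {
  void onHeartbeatResponse(NodeAction action) {
    switch (action) {
      case RESYNC:
        cleanupContainers(); // kill all running containers first...
        registerWithRM();    // ...then re-register, instead of a full reboot
        break;
      case SHUTDOWN:
        stop();
        break;
      default:
        break; // NORMAL: keep heartbeating as usual
    }
  }

  void cleanupContainers() { /* signal containers and wait for teardown */ }
  void registerWithRM()    { /* re-run the NM registration handshake */ }
  void stop()              { /* orderly NM shutdown */ }
}
{code}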
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628893#comment-13628893 ] Alejandro Abdelnur commented on YARN-45: Forgot to add: unless I'm missing something, the location of the preemption is not important, just the capacity, right? > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed, or reserved, to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628889#comment-13628889 ] Alejandro Abdelnur commented on YARN-45: Carlo, what about a small twist? A preempt message (instead of request, as there is no preempt response) would contain: * Resources (# of CPUs & amount of memory): the total amount of resources that may be preempted if no action is taken by the AM. * Set: the list of containers that would be killed by the RM to claim those resources if no action is taken by the AM. Computing the resources is straightforward: just aggregate the resources of the Set. An AM can take action using either piece of information. If an AM releases the requested amount of resources, even if they don't match the received container IDs, then the AM will not be over threshold anymore, thus getting rid of the preemption pressure fully or partially. If the AM fulfills the preemption only partially, then the RM will still kill some containers from the set. As the set is not ordered, the AM still does not know exactly which containers will be killed; the set is just the list of containers in danger of being preempted. I may be backtracking a bit on my previous comments: 'trading these containers for equivalent ones' seems acceptable and gives the scheduler some freedom on how to best take care of things if an AM is over limit. If an AM releases the requested amount of resources, regardless of which containers it releases, the AM won't be preempted for this preemption message. We just need to clearly spell out the behavior. With this approach I think we don't need #1 and #2? Thoughts? > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed, or reserved, to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
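The proposed message shape can be sketched with hypothetical type and field names (the comment only fixes the two pieces of content, an aggregate resource amount and an unordered set of at-risk containers; YARN's real records are not spelled out here):
{code:java}
import java.util.Set;

// Hypothetical shapes for the proposal above, not YARN's actual API records.
final class ResourceAmount {
  final int virtualCores;
  final int memoryMb;
  ResourceAmount(int virtualCores, int memoryMb) {
    this.virtualCores = virtualCores;
    this.memoryMb = memoryMb;
  }
}

final class PreemptMessage {
  // Total amount the RM will reclaim if the AM takes no action.
  final ResourceAmount total;
  // Unordered set of containers the RM would kill; which ones actually die
  // is not promised, so this is only a "containers in danger" hint.
  final Set<Long> containersAtRisk;
  PreemptMessage(ResourceAmount total, Set<Long> containersAtRisk) {
    this.total = total;
    this.containersAtRisk = containersAtRisk;
  }
}
{code}
The unordered set is the key design choice: it marks containers in danger without promising which ones die, which is what lets the AM trade them for equivalent ones by releasing any containers adding up to the total.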
[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628868#comment-13628868 ] Hadoop QA commented on YARN-427: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578200/YARN-427-trunk-b.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/716//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/716//console This message is automatically generated. > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-2-a.patch, > YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, > YARN-427-trunk-b.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628860#comment-13628860 ] Aleksey Gorshkov commented on YARN-427: --- Patches updated: YARN-427-trunk-b.patch for trunk, YARN-427-branch-2-b.patch for branch-2, YARN-427-branch-0.23-b.patch for branch-0.23. > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-2-a.patch, > YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, > YARN-427-trunk-b.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-427: -- Attachment: YARN-427-trunk-b.patch > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-2-a.patch, > YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, > YARN-427-trunk-b.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-427: -- Attachment: YARN-427-branch-2-b.patch YARN-427-branch-0.23-b.patch > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-2-a.patch, > YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, > YARN-427-trunk-b.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628850#comment-13628850 ] Hadoop QA commented on YARN-427: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578196/YARN-427-trunk-b.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/715//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/715//console This message is automatically generated. > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, > YARN-427-trunk-a.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-427: -- Attachment: (was: YARN-427-branch-2-b.patch) > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, > YARN-427-trunk-a.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-427: -- Attachment: (was: YARN-427-trunk-b.patch) > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, > YARN-427-trunk-a.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-427: -- Attachment: YARN-427-trunk-b.patch > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2-b.patch, > YARN-427-branch-2.patch, YARN-427-trunk-a.patch, YARN-427-trunk-b.patch, > YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-427: -- Attachment: YARN-427-branch-2-b.patch > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2-b.patch, > YARN-427-branch-2.patch, YARN-427-trunk-a.patch, YARN-427-trunk-b.patch, > YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-487) TestDiskFailures fails on Windows due to path mishandling
[ https://issues.apache.org/jira/browse/YARN-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628830#comment-13628830 ] Hudson commented on YARN-487: - Integrated in Hadoop-Yarn-trunk #180 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/180/]) YARN-487. Modify path manipulation in LocalDirsHandlerService to let TestDiskFailures pass on Windows. Contributed by Chris Nauroth. (Revision 1466746) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466746 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java > TestDiskFailures fails on Windows due to path mishandling > - > > Key: YARN-487 > URL: https://issues.apache.org/jira/browse/YARN-487 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 3.0.0 > > Attachments: YARN-487.1.patch > > > {{TestDiskFailures#testDirFailuresOnStartup}} fails due to insertion of an > extra leading '/' on the path within {{LocalDirsHandlerService}} when running > on Windows. The test assertions also fail to account for the fact that > {{Path}} normalizes '\' to '/'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
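The two pitfalls the report describes can be demonstrated with plain strings (a hedged sketch, not the actual LocalDirsHandlerService or org.apache.hadoop.fs.Path code):
{code:java}
public class WindowsPathPitfalls {
  public static void main(String[] args) {
    String localDir = "C:\\yarn\\local";

    // Pitfall 1: blindly prefixing "/" yields "/C:\yarn\local" on Windows,
    // the kind of extra leading '/' the report attributes to the path
    // manipulation in LocalDirsHandlerService.
    String naive = "/" + localDir;

    // Pitfall 2: per the report, Path normalizes '\' to '/', so a test
    // asserting against the raw backslash string fails; assertions must
    // compare against the normalized form instead.
    String normalized = localDir.replace('\\', '/');

    System.out.println(naive);       // /C:\yarn\local
    System.out.println(normalized);  // C:/yarn/local
  }
}
{code}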
[jira] [Commented] (YARN-539) LocalizedResources are leaked in memory in case resource localization fails
[ https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628826#comment-13628826 ] Hudson commented on YARN-539: - Integrated in Hadoop-Yarn-trunk #180 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/180/]) YARN-539. Addressed memory leak of LocalResource objects in the NM when a resource localization fails. Contributed by Omkar Vinit Joshi. (Revision 1466756) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466756 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceFailedLocalizationEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java > LocalizedResources are leaked in memory in case resource localization fails > --- > > Key: YARN-539 > URL: https://issues.apache.org/jira/browse/YARN-539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Fix For: 2.0.5-beta > > Attachments: yarn-539-20130410.1.patch, yarn-539-20130410.2.patch, > yarn-539-20130410.patch > > > If resource localization fails, the resource remains in memory and is > 1) either cleaned up the next time cache cleanup runs and there is a space > crunch (if sufficient space is available in the cache, it will remain in > memory), or > 2) reused if a LocalizationRequest comes again for the same resource. > I think that when resource localization fails, that event should be sent to the > LocalResourcesTracker, which will then remove the resource from its cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-495) Change NM behavior of reboot to resync
[ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628825#comment-13628825 ] Hudson commented on YARN-495: - Integrated in Hadoop-Yarn-trunk #180 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/180/]) YARN-495. Changed NM reboot behaviour to be a simple resync - kill all containers and re-register with RM. Contributed by Jian He. (Revision 1466752) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466752 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeAction.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManagerEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerReboot.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java > Change NM behavior of reboot to resync > -- > > Key: YARN-495 > URL: https://issues.apache.org/jira/browse/YARN-495 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Fix For: 2.0.5-beta > > Attachments: YARN-495.1.patch, YARN-495.2.patch, YARN-495.3.patch, > YARN-495.4.patch, YARN-495.5.patch, YARN-495.6.patch > > > When a reboot command is sent from the RM, the node manager doesn't clean up the > containers while it's stopping. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira