[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated

2014-10-07 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163120#comment-14163120
 ] 

Tsuyoshi OZAWA commented on YARN-2312:
--

The test failures are not related - the failure of TestFairScheduler is reported 
on YARN-2252 and the failure of TestPipeApplication is reported on MAPREDUCE-6115. 

[~jianhe], [~jlowe], could you review the latest patch?

> Marking ContainerId#getId as deprecated
> ---
>
> Key: YARN-2312
> URL: https://issues.apache.org/jira/browse/YARN-2312
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, 
> YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch, 
> YARN-2312.4.patch, YARN-2312.5.patch, YARN-2312.6.patch
>
>
> After YARN-2229, {{ContainerId#getId}} will only return a partial value of the 
> containerId: the sequence number of the container id, without the epoch. We 
> should mark {{ContainerId#getId}} as deprecated and use 
> {{ContainerId#getContainerId}} instead.
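
For illustration, a minimal sketch of the difference (assuming the ContainerId API after YARN-2229; not part of the patch):
{code}
import org.apache.hadoop.yarn.api.records.ContainerId;

// getContainerId() returns the full 64-bit id, which also encodes the epoch;
// the deprecated getId() keeps only the sequence number.
static long fullContainerId(ContainerId cid) {
  return cid.getContainerId(); // epoch bits + sequence number
}

@SuppressWarnings("deprecation")
static int sequenceNumberOnly(ContainerId cid) {
  return cid.getId(); // loses the epoch after YARN-2229
}
{code}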



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2252) Intermittent failure of TestFairScheduler.testContinuousScheduling

2014-10-07 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163118#comment-14163118
 ] 

Tsuyoshi OZAWA commented on YARN-2252:
--

Hi, this test failure was found on trunk - you can find it 
[here|https://issues.apache.org/jira/browse/YARN-2312?focusedCommentId=14161902&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14161902].
 The log is as follows:

{code}
Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec <<< 
FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
  Time elapsed: 0.582 sec  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372)
{code}

Can I reopen this issue?

> Intermittent failure of TestFairScheduler.testContinuousScheduling
> --
>
> Key: YARN-2252
> URL: https://issues.apache.org/jira/browse/YARN-2252
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: trunk-win
>Reporter: Ratandeep Ratti
>  Labels: hadoop2, scheduler, yarn
> Fix For: 2.6.0
>
> Attachments: YARN-2252-1.patch, yarn-2252-2.patch
>
>
> This test-case is failing sporadically on my machine. I think I have a 
> plausible explanation for this.
> It seems that when the Scheduler is asked for resources, the resource 
> requests being constructed have no preference for hosts (nodes).
> The two mock hosts constructed both have a memory of 8192 MB.
> The containers (resources) being requested each require 1024 MB of memory, 
> hence a single node can satisfy both resource requests for the 
> application.
> At the end of the test-case it is asserted that the containers 
> (resource requests) run on different nodes, but since we haven't 
> specified any node preferences when requesting the resources, the 
> scheduler (at times) runs both containers (requests) on the same node.
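
To illustrate the explanation above with the YARN API (a sketch, not the committed fix; classes are from org.apache.hadoop.yarn.api.records):
{code}
// A request named ResourceRequest.ANY expresses no node preference, so the
// scheduler is free to place both 1024 MB containers on the same 8192 MB node:
ResourceRequest anyHost = ResourceRequest.newInstance(
    Priority.newInstance(1), ResourceRequest.ANY,
    Resource.newInstance(1024, 1), 2);
// Naming a specific host would remove the nondeterminism the assertion trips on:
ResourceRequest onHost1 = ResourceRequest.newInstance(
    Priority.newInstance(1), "host1",
    Resource.newInstance(1024, 1), 1);
{code}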



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2582) Log related CLI and Web UI changes for LRS

2014-10-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163116#comment-14163116
 ] 

Zhijie Shen commented on YARN-2582:
---

Thanks for the patch, Xuan! Here're some comments and thoughts.

1. This may not be sufficient. For example, "tmp_\[timestamp\].log" would become 
a false positive here. We need to check whether the file name ends with 
TMP_FILE_SUFFIX instead of whether it contains it.
{code}
  && !fileName.contains(LogAggregationUtils.TMP_FILE_SUFFIX)) {
{code}
{code}
  if (!thisNodeFile.getPath().getName()
.contains(LogAggregationUtils.TMP_FILE_SUFFIX)) {
{code}
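
A minimal sketch of the suggested check (assuming TMP_FILE_SUFFIX is the temporary-file suffix used during aggregation):
{code}
// endsWith() matches only real temporary files, so a name like
// "tmp_[timestamp].log" is no longer excluded by mistake:
if (!fileName.endsWith(LogAggregationUtils.TMP_FILE_SUFFIX)) {
  // process the fully aggregated log file
}
{code}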

2. I'm wondering what the best way would be to handle the case where fetching 
one log file fails but the others succeed. Thoughts?
{code}
AggregatedLogFormat.LogReader reader = null;
try {
  reader =
  new AggregatedLogFormat.LogReader(getConf(),
thisNodeFile.getPath());
  if (dumpAContainerLogs(containerId, reader, System.out) > -1) {
foundContainerLogs = true;
  }
} finally {
  if (reader != null) {
reader.close();
  }
}
{code}
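
One possible shape for this, assuming skipping the bad file is acceptable (a sketch, not the patch's actual code):
{code}
AggregatedLogFormat.LogReader reader = null;
try {
  reader =
      new AggregatedLogFormat.LogReader(getConf(), thisNodeFile.getPath());
  if (dumpAContainerLogs(containerId, reader, System.out) > -1) {
    foundContainerLogs = true;
  }
} catch (IOException e) {
  // report the failure but keep going, so the other log files still get dumped
  System.err.println("Failed to read " + thisNodeFile.getPath() + ": "
      + e.getMessage());
} finally {
  if (reader != null) {
    reader.close();
  }
}
{code}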

3. In fact, in the case of either a single container or all containers, when the 
container log is not found, it is possible that 1) the container log hasn't 
been aggregated yet or 2) log aggregation is not enabled. In the former 
case there is also a third possibility: the user is providing an 
invalid nodeId. It would be good to unify the warning messages.
{code}
if (!foundContainerLogs) {
  System.out.println("Logs for container " + containerId
  + " are not present in this log-file.");
  return -1;
}
{code}
{code}
if (! foundAnyLogs) {
  System.out.println("Logs not available at " + remoteAppLogDir.toString());
  System.out
  .println("Log aggregation has not completed or is not enabled.");
  return -1;
}
{code}
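
For instance, a single message covering all three causes might look like this (the wording is only a suggestion):
{code}
if (!foundContainerLogs) {
  System.out.println("Logs for container " + containerId
      + " are not found. Possible causes: log aggregation has not completed,"
      + " log aggregation is not enabled, or the provided nodeId is invalid.");
  return -1;
}
{code}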

4. It seems that we currently don't have unit tests covering the successful path 
of dumping the container log(s). I'm not sure whether that would be a big 
addition. If it is, let's handle it separately.

There're two general issues:

1. Currently we don't have a web page on the RM web app to show the aggregated 
logs. On the other hand, an LRS is supposed to always be in the running state, 
such that the timeline server will not present it. Therefore, we still don't 
have a web entry point to view the aggregated logs.

2. For an LRS, the latest logs are not yet aggregated to HDFS; they are still in 
the local dir of the NM. When a user wants to check the logs of an LRS, it 
doesn't make sense that he can only view yesterday's logs because today's are 
not yet uploaded to HDFS. It would be good to have a combined view of both the 
latest logs in the local dir of the NM and the aggregated logs on HDFS. Thoughts?

> Log related CLI and Web UI changes for LRS
> --
>
> Key: YARN-2582
> URL: https://issues.apache.org/jira/browse/YARN-2582
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2582.1.patch
>
>
> After YARN-2468, we have changed the log layout to support log aggregation for 
> Long Running Services. The log CLI and related Web UI should be modified 
> accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2180) In-memory backing store for cache manager

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163088#comment-14163088
 ] 

Hadoop QA commented on YARN-2180:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12673521/YARN-2180-trunk-v7.patch
  against trunk revision 1efd9c9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5321//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5321//console

This message is automatically generated.

> In-memory backing store for cache manager
> -
>
> Key: YARN-2180
> URL: https://issues.apache.org/jira/browse/YARN-2180
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, 
> YARN-2180-trunk-v3.patch, YARN-2180-trunk-v4.patch, YARN-2180-trunk-v5.patch, 
> YARN-2180-trunk-v6.patch, YARN-2180-trunk-v7.patch
>
>
> Implement an in-memory backing store for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values

2014-10-07 Thread Nishan Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishan Shetty updated YARN-2655:

Description: 
AllocatedGB/AvailableGB in nodemanager JMX showing only integer values

Screenshot attached

  was:AllocatedGB/AvailableGB in nodemanager JMX showing only integer values


> AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
> --
>
> Key: YARN-2655
> URL: https://issues.apache.org/jira/browse/YARN-2655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.4.1
>Reporter: Nishan Shetty
>Priority: Minor
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
> Screenshot attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values

2014-10-07 Thread Nishan Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishan Shetty updated YARN-2655:

Attachment: screenshot-2.png

> AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
> --
>
> Key: YARN-2655
> URL: https://issues.apache.org/jira/browse/YARN-2655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.4.1
>Reporter: Nishan Shetty
>Priority: Minor
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> AllocatedGB/AvailableGB in nodemanager JMX showing only integer values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values

2014-10-07 Thread Nishan Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishan Shetty updated YARN-2655:

Attachment: screenshot-1.png

> AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
> --
>
> Key: YARN-2655
> URL: https://issues.apache.org/jira/browse/YARN-2655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.4.1
>Reporter: Nishan Shetty
>Priority: Minor
> Attachments: screenshot-1.png
>
>
> AllocatedGB/AvailableGB in nodemanager JMX showing only integer values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values

2014-10-07 Thread Nishan Shetty (JIRA)
Nishan Shetty created YARN-2655:
---

 Summary: AllocatedGB/AvailableGB in nodemanager JMX showing only 
integer values
 Key: YARN-2655
 URL: https://issues.apache.org/jira/browse/YARN-2655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Nishan Shetty
Priority: Minor


AllocatedGB/AvailableGB in nodemanager JMX showing only integer values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163057#comment-14163057
 ] 

Hadoop QA commented on YARN-2579:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673338/YARN-2579.patch
  against trunk revision 1efd9c9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5320//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5320//console

This message is automatically generated.

> Both RM's state is Active , but 1 RM is not really active.
> --
>
> Key: YARN-2579
> URL: https://issues.apache.org/jira/browse/YARN-2579
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.1
>Reporter: Rohith
>Assignee: Rohith
> Attachments: YARN-2579.patch
>
>
> I encountered a situation where both RMs' web pages were accessible and 
> their state displayed as Active, but one of the RMs' ActiveServices was 
> stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2180) In-memory backing store for cache manager

2014-10-07 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2180:
---
Attachment: YARN-2180-trunk-v7.patch

[~kasha]

Attached v7. Changes:
1. Moved AppChecker parameter to SCM instead of store and changed variable 
names.
2. Moved createAppCheckerService method to a static method in 
SharedCacheManager. Originally I moved it to the shared cache util class, but 
that is in yarn-server-common which required moving AppChecker as well. I only 
see AppChecker being used in the context of the SCM so I kept it where it was. 
Let me know if you would still like me to move it to common and make the create 
method part of the util class.
3. Renamed SharedCacheStructureUtil to SharedCacheUtil.
4. Filed YARN-2654 subtask to revisit config names.

> In-memory backing store for cache manager
> -
>
> Key: YARN-2180
> URL: https://issues.apache.org/jira/browse/YARN-2180
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, 
> YARN-2180-trunk-v3.patch, YARN-2180-trunk-v4.patch, YARN-2180-trunk-v5.patch, 
> YARN-2180-trunk-v6.patch, YARN-2180-trunk-v7.patch
>
>
> Implement an in-memory backing store for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2654) Revisit all shared cache config parameters to ensure quality names

2014-10-07 Thread Chris Trezzo (JIRA)
Chris Trezzo created YARN-2654:
--

 Summary: Revisit all shared cache config parameters to ensure 
quality names
 Key: YARN-2654
 URL: https://issues.apache.org/jira/browse/YARN-2654
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Priority: Blocker


Revisit all the shared cache config parameters in YarnConfiguration and 
yarn-default.xml to ensure quality names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-10-07 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162917#comment-14162917
 ] 

Junping Du commented on YARN-2331:
--

Thanks [~jlowe] for the patch. One thing I want to confirm here is: after this 
patch, if we set "yarn.nodemanager.recovery.enabled" to true but set 
"yarn.nodemanager.recovery.supervised" to false, containers will keep 
running if we kill the NM daemon with "kill -9", but stopping it via 
"yarn-daemon.sh stop nodemanager" will kill the running containers. Is that right?

> Distinguish shutdown during supervision vs. shutdown for rolling upgrade
> 
>
> Key: YARN-2331
> URL: https://issues.apache.org/jira/browse/YARN-2331
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-2331.patch, YARN-2331v2.patch
>
>
> When the NM is shutting down with restart support enabled there are scenarios 
> we'd like to distinguish and behave accordingly:
> # The NM is running under supervision.  In that case containers should be 
> preserved so the automatic restart can recover them.
> # The NM is not running under supervision and a rolling upgrade is not being 
> performed.  In that case the shutdown should kill all containers since it is 
> unlikely the NM will be restarted in a timely manner to recover them.
> # The NM is not running under supervision and a rolling upgrade is being 
> performed.  In that case the shutdown should not kill all containers since a 
> restart is imminent due to the rolling upgrade and the containers will be 
> recovered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162908#comment-14162908
 ] 

Hadoop QA commented on YARN-913:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673478/YARN-913-021.patch
  against trunk revision 9b8a35a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 37 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1268 javac 
compiler warnings (more than the trunk's current 1267 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5319//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5319//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5319//console

This message is automatically generated.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
> YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
> YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
> YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, 
> YARN-913-019.patch, YARN-913-020.patch, YARN-913-021.patch, yarnregistry.pdf, 
> yarnregistry.pdf, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162875#comment-14162875
 ] 

Karthik Kambatla commented on YARN-1879:


Thanks for the updates, Tsuyoshi. Sorry to vacillate on this JIRA. 

Chatted with Jian offline about marking the methods in question Idempotent vs 
AtMostOnce. I believe we agreed on "Methods that identify and ignore duplicate 
requests as duplicate should be AtMostOnce, and those that repeat the method 
without any adverse side-effects should be Idempotent. Retries on RM 
restart/failover are the same for Idempotent and AtMostOnce methods."

Following that, registerApplicationMaster should be Idempotent, allocate and 
finishApplicationMaster should be AtMostOnce. Given all the methods handle 
duplicate requests, the retry-cache is not necessary but could be an 
optimization we can pursue/investigate on another JIRA.
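
In code, that agreement would amount to roughly the following on ApplicationMasterProtocol (a sketch with abbreviated signatures; the annotations are from org.apache.hadoop.io.retry):
{code}
@Idempotent
RegisterApplicationMasterResponse registerApplicationMaster(
    RegisterApplicationMasterRequest request) throws YarnException, IOException;

@AtMostOnce
AllocateResponse allocate(AllocateRequest request)
    throws YarnException, IOException;

@AtMostOnce
FinishApplicationMasterResponse finishApplicationMaster(
    FinishApplicationMasterRequest request) throws YarnException, IOException;
{code}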

Review comments on the patch:
# Nit: not introduced by this patch, but can we rename 
TestApplicationMasterServiceProtocolOnHA#initiate to initialize()? 
# IIUC, the tests (ProtocolHATestBase) induce a failover while processing one 
of these requests. We should probably also add a test that makes duplicate 
requests to the same/different RM and verifies the behavior is as expected. 
Correct me if existing tests already do this.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
> fail over
> 
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
> YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, 
> YARN-1879.23.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, 
> YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162865#comment-14162865
 ] 

Hadoop QA commented on YARN-2629:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673479/YARN-2629.4.patch
  against trunk revision 9b8a35a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5318//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5318//console

This message is automatically generated.

> Make distributed shell use the domain-based timeline ACLs
> -
>
> Key: YARN-2629
> URL: https://issues.apache.org/jira/browse/YARN-2629
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch, 
> YARN-2629.4.patch
>
>
> To demonstrate the usage of this feature (YARN-2102), it's good to make 
> the distributed shell create the domain and post its timeline entities into 
> this private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162862#comment-14162862
 ] 

Hadoop QA commented on YARN-913:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673460/YARN-913-020.patch
  against trunk revision 9b8a35a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 37 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1268 javac 
compiler warnings (more than the trunk's current 1267 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5317//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5317//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5317//console

This message is automatically generated.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
> YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
> YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
> YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, 
> YARN-913-019.patch, YARN-913-020.patch, YARN-913-021.patch, yarnregistry.pdf, 
> yarnregistry.pdf, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2653) Support an AM liveness mechanism that works for both sync/async AM-RM clients

2014-10-07 Thread sidharta seethana (JIRA)
sidharta seethana created YARN-2653:
---

 Summary: Support an AM liveness mechanism that works for both 
sync/async AM-RM clients
 Key: YARN-2653
 URL: https://issues.apache.org/jira/browse/YARN-2653
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client, resourcemanager
Reporter: sidharta seethana
Priority: Minor


Currently, the "heartbeat" mechanism is only supported in the async client 
(AMRMClientAsyncImpl). The reason seems to be that liveness 
monitoring is currently implemented via periodic (possibly empty) 
allocation requests, which may return container allocation responses for 
earlier allocation requests. This mechanism only works for the async client, 
where allocation responses can be queued, and is not applicable to the 
synchronous/blocking client.
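
For reference, the async client's heartbeat boils down to periodically invoking allocate(), roughly like this (a simplified sketch with placeholder names, not the AMRMClientAsyncImpl code):
{code}
// Each allocate() call doubles as a liveness ping to the RM even when it
// carries no new ContainerRequests, and its response may still deliver
// containers granted for earlier requests.
void heartbeatLoop() throws Exception {
  while (running) {
    AllocateResponse response = amRMClient.allocate(progress);
    handle(response); // the async client queues this for a callback handler
    Thread.sleep(heartbeatIntervalMs);
  }
}
{code}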



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.

2014-10-07 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162835#comment-14162835
 ] 

Varun Vasudev commented on YARN-2566:
-

Unfortunately the defaults still allow use of the full disk. We should probably 
just change the default to use a maximum of 90% of the disk space.

> IOException happen in startLocalizer of DefaultContainerExecutor due to not 
> enough disk space for the first localDir.
> -
>
> Key: YARN-2566
> URL: https://issues.apache.org/jira/browse/YARN-2566
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2566.000.patch, YARN-2566.001.patch, 
> YARN-2566.002.patch, YARN-2566.003.patch
>
>
> startLocalizer in DefaultContainerExecutor will only use the first localDir 
> to copy the token file; if the copy fails for the first localDir due to not 
> enough disk space, the whole localization fails even though 
> there is plenty of disk space in the other localDirs. We see the following 
> error for this case:
> {code}
> 2014-09-13 23:33:25,171 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to 
> create app directory 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
> java.io.IOException: mkdir of 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,185 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Localizer failed
> java.io.FileNotFoundException: File 
> file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 
> does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76)
>   at 
> org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.<init>(ChecksumFs.java:344)
>   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.create(FileContext.java:673)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,186 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1410663092546_0004_01_01 transitioned from 
> LOCALIZING to LOCALIZATION_FAILED
> 2014-09-13 23:33:25,187 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera   
> OPERATION=Container Finished - Failed   TARGET=Cont

[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162823#comment-14162823
 ] 

Jian He commented on YARN-1879:
---

Thanks Tsuyoshi!
One more thing: when failover happens on unregister, if RM1 has already 
saved the app's final state into the state-store and the AM then tries to 
unregister with RM2, the AM will get ApplicationDoesNotExistInCacheException. 
We should probably move the following check in the finishApplicationMaster 
call upfront so that ApplicationDoesNotExistInCacheException is not thrown for 
already-completed apps.
{code}
  if (rmApp.isAppFinalStateStored()) {
return FinishApplicationMasterResponse.newInstance(true);
  }
{code} 
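
For illustration, the reordering could look roughly like this (a sketch; only the stored-state check itself comes from the snippet above):
{code}
RMApp rmApp =
    rmContext.getRMApps().get(applicationAttemptId.getApplicationId());
// Answer retried unregisters first, so an AM that already unregistered with
// RM1 gets success from RM2 instead of a cache miss:
if (rmApp.isAppFinalStateStored()) {
  return FinishApplicationMasterResponse.newInstance(true);
}
// ... only then consult the response cache, which may throw
// ApplicationDoesNotExistInCacheException for unknown attempts ...
{code}
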
[~kasha], do you have more comments ?

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
> fail over
> 
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
> YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, 
> YARN-1879.23.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, 
> YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2517) Implement TimelineClientAsync

2014-10-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162820#comment-14162820
 ] 

Zhijie Shen commented on YARN-2517:
---

bq. Clearly, async calls need call-back handlers just for errors. As of today, 
there are no APIs that really need to send back results (not error) 
asynchronously. 

I realize that another requirement of TimelineClient is to wrap the read APIs 
to facilitate Java app developers. These APIs are going to return results.

> Implement TimelineClientAsync
> -
>
> Key: YARN-2517
> URL: https://issues.apache.org/jira/browse/YARN-2517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2517.1.patch
>
>
> In some scenarios, we'd like to put timeline entities in another thread so as 
> not to block the current one.
> It would be good to have a TimelineClientAsync like AMRMClientAsync and 
> NMClientAsync. It can buffer entities, put them in a separate thread, and 
> have callbacks to handle the responses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2014-10-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162816#comment-14162816
 ] 

Zhijie Shen commented on YARN-2423:
---

Updated the title and description, as it's good to do the work for the domain 
APIs too.

> TimelineClient should wrap all GET APIs to facilitate Java users
> 
>
> Key: YARN-2423
> URL: https://issues.apache.org/jira/browse/YARN-2423
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>
> TimelineClient provides the Java method to put timeline entities. It's also 
> good to wrap over all GET APIs (both entity and domain), and deserialize the 
> json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2014-10-07 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2423:
--
Description: TimelineClient provides the Java method to put timeline 
entities. It's also good to wrap over all GET APIs (both entity and domain), 
and deserialize the json response into Java POJO objects.  (was: TimelineClient 
provides the Java method to put timeline entities. It's also good to wrap over 
three GET APIs, and deserialize the json response into Java POJO objects.)
Summary: TimelineClient should wrap all GET APIs to facilitate Java 
users  (was: TimelineClient should wrap the three GET APIs to facilitate Java 
users)

> TimelineClient should wrap all GET APIs to facilitate Java users
> 
>
> Key: YARN-2423
> URL: https://issues.apache.org/jira/browse/YARN-2423
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>
> TimelineClient provides the Java method to put timeline entities. It's also 
> good to wrap over all GET APIs (both entity and domain), and deserialize the 
> json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-07 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162812#comment-14162812
 ] 

Li Lu commented on YARN-2629:
-

OK, new patch LGTM. Maybe some committers would like to check it again and 
commit it? 

> Make distributed shell use the domain-based timeline ACLs
> -
>
> Key: YARN-2629
> URL: https://issues.apache.org/jira/browse/YARN-2629
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch, 
> YARN-2629.4.patch
>
>
> To demonstrate the usage of this feature (YARN-2102), it's good to make 
> the distributed shell create the domain and post its timeline entities into 
> this private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:

Attachment: YARN-913-021.patch

The -021 patch is the -020 patch with all whitespace stripped, so that patch 
-p0 will also apply it cleanly.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
> YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
> YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
> YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, 
> YARN-913-019.patch, YARN-913-020.patch, YARN-913-021.patch, yarnregistry.pdf, 
> yarnregistry.pdf, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-07 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2629:
--
Attachment: YARN-2629.4.patch

> Make distributed shell use the domain-based timeline ACLs
> -
>
> Key: YARN-2629
> URL: https://issues.apache.org/jira/browse/YARN-2629
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch, 
> YARN-2629.4.patch
>
>
> To demonstrate the usage of this feature (YARN-2102), it's good to make 
> the distributed shell create the domain and post its timeline entities into 
> this private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162804#comment-14162804
 ] 

Zhijie Shen commented on YARN-2629:
---

bq. 1. I noticed that you've changed a few method signatures, and removed the 
"throws IOException, YarnException" clause from the signatures. Meanwhile, in 
the method I noticed that you're catching all types of exceptions in one catch 
statement. I agree that this change saves some repeated lines of try-catch 
statements in various call sites, but is it OK catch all exceptions internally?

Previously, the exception was handled at the caller of publish(), but I 
moved it into publish() to simplify the change, such that there's no need to 
throw the exception any more. Usually, it's good practice to handle each 
potential exception explicitly. Here, since I intentionally don't want to do 
such fine-grained handling, I chose to log whatever the exception is.
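
In other words, the pattern is roughly the following (a condensed sketch of publish(); LOG and timelineClient stand for the surrounding class's fields):
{code}
private void publish(TimelineEntity entity) {
  try {
    timelineClient.putEntities(entity);
  } catch (Exception e) {
    // deliberately coarse-grained: log any failure here instead of making
    // every call site repeat a try-catch for IOException/YarnException
    LOG.error("Failed to put the timeline entity", e);
  }
}
{code}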

bq. 2. Maybe we want to add one more test with no arguments on 
domain/view_acl/modify_acl? Timeline domain is a newly added feature, and maybe 
we'd like to make sure it would not break any existing distributed shell usages 
(esp. in some scripts)?

Added one.

bq. This variable name looks unclear. Maybe we want to rename it so that people 
can easily connect it with timeline domains?

It's explained in the usage of the option and has been commented:
{code}
  // Flag to indicate whether to create the domain of the given ID
  private boolean toCreate = false;
{code}

but let me elaborate on this variable in the code.

> Make distributed shell use the domain-based timeline ACLs
> -
>
> Key: YARN-2629
> URL: https://issues.apache.org/jira/browse/YARN-2629
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch
>
>
> To demonstrate the usage of this feature (YARN-2102), it's good to make 
> the distributed shell create the domain and post its timeline entities into 
> this private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Description: 
improve node decommission latency in RM. 
Currently the node decommission only happened after RM received nodeHeartbeat 
from the Node Manager. The node heartbeat interval is configurable. The default 
value is 1 second.
It will be better to do the decommission during RM Refresh(NodesListManager) 
instead of nodeHeartbeat(ResourceTrackerService).
This will be a much more serious issue:
After RM is refreshed (refreshNodes), If the NM to be decommissioned is killed 
before NM sent heartbeat to RM. The RMNode will never be decommissioned in RM. 
The RMNode will only expire in RM after  
"yarn.nm.liveness-monitor.expiry-interval-ms"(default value 10 minutes) time.

  was:
improve node decommission latency in RM. 
Currently the node decommission only happened after RM received nodeHeartbeat 
from the Node Manager. The node heartbeat interval is configurable. The default 
value is 1 second.
It will be better to do the decommission during RM Refresh(NodesListManager) 
instead of nodeHeartbeat(ResourceTrackerService).


> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2641.000.patch, YARN-2641.001.patch
>
>
> improve node decommission latency in RM. 
> Currently node decommission only happens after the RM receives a nodeHeartbeat 
> from the Node Manager. The node heartbeat interval is configurable; the 
> default value is 1 second.
> It would be better to do the decommission during RM refresh (NodesListManager) 
> instead of nodeHeartbeat (ResourceTrackerService).
> There is also a much more serious case:
> after the RM is refreshed (refreshNodes), if the NM to be decommissioned is 
> killed before it sends a heartbeat to the RM, the RMNode will never be 
> decommissioned in the RM. The RMNode will only expire after the 
> "yarn.nm.liveness-monitor.expiry-interval-ms" interval (default value 10 minutes).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162764#comment-14162764
 ] 

Hadoop QA commented on YARN-2641:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673446/YARN-2641.002.patch
  against trunk revision 9b8a35a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5316//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5316//console

This message is automatically generated.

> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2641.000.patch, YARN-2641.001.patch
>
>
> improve node decommission latency in RM. 
> Currently node decommission only happens after the RM receives a nodeHeartbeat 
> from the Node Manager. The node heartbeat interval is configurable; the 
> default value is 1 second.
> It would be better to do the decommission during RM refresh (NodesListManager) 
> instead of nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162740#comment-14162740
 ] 

zhihai xu commented on YARN-2641:
-

Hi [~djp], thanks for reviewing the patch. I removed the following RMNode 
decommission from nodeHeartbeat (ResourceTrackerService.java):

{code}
this.rmContext.getDispatcher().getEventHandler().handle(
    new RMNodeEvent(nodeId, RMNodeEventType.DECOMMISSION));
{code}

I added the RMNode decommission in refreshNodes (NodesListManager.java).

Do you still see the decommission happening after the heartbeat back to the NM 
with this patch?

I didn't have a unit test in my first patch (YARN-2641.000.patch).

In my second patch (YARN-2641.001.patch), I changed the unit test in 
TestResourceTrackerService to verify that RMNodeEventType.DECOMMISSION is sent 
in {code}rm.getNodesListManager().refreshNodes(conf);{code} instead of 
{code}nodeHeartbeat = nm1.nodeHeartbeat(true);{code}
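
A condensed sketch of that verification (helper names such as writeToHostsFile and checkDecommissionedNMCount are assumptions based on the existing tests, not the exact patch):
{code}
MockRM rm = new MockRM(conf);
rm.start();
MockNM nm1 = rm.registerNode("host1:1234", 5120);
writeToHostsFile("host1"); // the excludes file now lists host1
rm.getNodesListManager().refreshNodes(conf);
// With the patch, the DECOMMISSION event fires inside refreshNodes(), so the
// decommissioned-node count should rise without a further nm1.nodeHeartbeat(true):
checkDecommissionedNMCount(rm, 1);
{code}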


> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2641.000.patch, YARN-2641.001.patch
>
>
> improve node decommission latency in RM. 
> Currently node decommission only happens after the RM receives a nodeHeartbeat 
> from the Node Manager. The node heartbeat interval is configurable; the 
> default value is 1 second.
> It would be better to do the decommission during RM refresh (NodesListManager) 
> instead of nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:

Attachment: YARN-913-020.patch

patch -020; deleted unused operation/assignment in 
{{RegistryUtils.statChildren()}}

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
> YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
> YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
> YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, 
> YARN-913-019.patch, YARN-913-020.patch, yarnregistry.pdf, yarnregistry.pdf, 
> yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: (was: YARN-2641.002.patch)

> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2641.000.patch, YARN-2641.001.patch
>
>
> improve node decommission latency in RM. 
> Currently the node decommission only happens after the RM receives a 
> nodeHeartbeat from the Node Manager. The node heartbeat interval is 
> configurable; the default value is 1 second.
> It would be better to do the decommission during the RM refresh 
> (NodesListManager) instead of on nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162683#comment-14162683
 ] 

Hadoop QA commented on YARN-913:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673424/YARN-913-019.patch
  against trunk revision 30d56fd.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 37 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1268 javac 
compiler warnings (more than the trunk's current 1267 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5315//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5315//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-registry.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5315//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5315//console

This message is automatically generated.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
> YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
> YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
> YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, 
> YARN-913-019.patch, yarnregistry.pdf, yarnregistry.pdf, yarnregistry.pdf, 
> yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up, or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond 
> to, and not any others in the cluster.
> Some kind of service registry, in the RM or in ZK, could do this. If the RM 
> held write access to the ZK nodes, it would be more secure than having apps 
> register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-07 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162665#comment-14162665
 ] 

Li Lu commented on YARN-2629:
-

Hi [~zjshen], I reviewed this patch (YARN-2629.3.patch), and it generally LGTM. 
However, I do have a few quick comments:

1. I noticed that you've changed a few method signatures, removing the 
"throws IOException, YarnException" clause. Meanwhile, inside those methods 
you're catching all types of exceptions in one catch statement. I agree that 
this change saves some repeated try-catch statements at the various call 
sites, but is it OK to catch all exceptions internally? (I think so, but 
would like to verify that.)

2. Maybe we want to add one more test with no arguments on 
domain/view_acl/modify_acl? Timeline domain is a newly added feature, and maybe 
we'd like to make sure it would not break any existing distributed shell usages 
(esp. in some scripts)? 

3. In Client.java:
{code}
private boolean toCreate = false;
{code}
This variable name looks unclear. Maybe we want to rename it so that people can 
easily connect it with timeline domains?
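
For illustration, here is a rough, self-contained sketch of the two styles 
being weighed in point 1 (the class, method, and logger names are mine, not 
the patch's):

{code}
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class TimelinePublisher {
  private static final Log LOG = LogFactory.getLog(TimelinePublisher.class);
  // assumes the client has been inited/started with a Configuration elsewhere
  private final TimelineClient client = TimelineClient.createTimelineClient();

  // Style A: propagate, forcing every call site to repeat try-catch blocks.
  public void publishPropagating(TimelineEntity entity)
      throws IOException, YarnException {
    client.putEntities(entity);
  }

  // Style B (roughly what the patch does): catch everything internally so
  // call sites stay clean, at the cost of hiding failures from callers.
  public void publishSwallowing(TimelineEntity entity) {
    try {
      client.putEntities(entity);
    } catch (IOException | YarnException e) {
      LOG.error("Failed to publish timeline entity", e);
    }
  }
}
{code}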

> Make distributed shell use the domain-based timeline ACLs
> -
>
> Key: YARN-2629
> URL: https://issues.apache.org/jira/browse/YARN-2629
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch
>
>
> For demonstration the usage of this feature (YARN-2102), it's good to make 
> the distributed shell create the domain, and post its timeline entities into 
> this private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: YARN-2641.002.patch

> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2641.000.patch, YARN-2641.001.patch, 
> YARN-2641.002.patch
>
>
> improve node decommission latency in RM. 
> Currently the node decommission only happens after the RM receives a 
> nodeHeartbeat from the Node Manager. The node heartbeat interval is 
> configurable; the default value is 1 second.
> It would be better to do the decommission during the RM refresh 
> (NodesListManager) instead of on nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2320) Removing old application history store after we store the history data to timeline store

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162640#comment-14162640
 ] 

Hadoop QA commented on YARN-2320:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673422/YARN-2320.3.patch
  against trunk revision 30d56fd.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 24 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5314//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5314//console

This message is automatically generated.

> Removing old application history store after we store the history data to 
> timeline store
> 
>
> Key: YARN-2320
> URL: https://issues.apache.org/jira/browse/YARN-2320
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2320.1.patch, YARN-2320.2.patch, YARN-2320.3.patch
>
>
> After YARN-2033, we should deprecate the application history store set. 
> There's no need to maintain two sets of store interfaces. In addition, we 
> should conclude the outstanding JIRAs under YARN-321 about the application 
> history store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2483) TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails due to incorrect AppAttempt state

2014-10-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162530#comment-14162530
 ] 

Jian He commented on YARN-2483:
---

YARN-2649 may solve this

> TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails due to 
> incorrect AppAttempt state
> 
>
> Key: YARN-2483
> URL: https://issues.apache.org/jira/browse/YARN-2483
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Ted Yu
>
> From https://builds.apache.org/job/Hadoop-Yarn-trunk/665/console :
> {code}
> testShouldNotCountFailureToMaxAttemptRetry(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart)
>   Time elapsed: 49.686 sec  <<< FAILURE!
> java.lang.AssertionError: AppAttempt state is not correct (timedout) 
> expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:84)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:417)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:582)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:589)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForNewAMToLaunchAndRegister(MockRM.java:182)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:402)
> {code}
> TestApplicationMasterLauncher#testallocateBeforeAMRegistration fails with 
> similar cause.
> These tests failed in build #664 as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:

Attachment: YARN-913-019.patch

This is a patch which addresses Sanjay's review comments, with the key 
changes being:

* {{ServiceRecord}} supports arbitrary string attributes; the yarn ID and 
persistence are simply two of these (see the sketch after this list)
* moved zk-related classes under {{/impl/zk}} package
* TLA+ spec in sync with new design

I also 
* moved the packaging from {{org.apache.hadoop.yarn.registry}} to  
{{org.apache.hadoop.registry}}. That would reduce the impact of any promotion 
of the registry into hadoop-common if ever desired.
* moved the registry/site documentation under yarn-site and hooked it up 
with the existing site docs.
* added the TLA toolbox generated artifacts (pdf and a toolbox dir) to 
{{/.gitignore}}
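
To illustrate the attribute model, a short sketch against the registry API 
(the constant and class names here follow the final merged module and may 
differ slightly from this patch revision):

{code}
ServiceRecord record = new ServiceRecord();
// yarn:id and yarn:persistence are just two well-known attributes...
record.set(YarnRegistryAttributes.YARN_ID, applicationId.toString());
record.set(YarnRegistryAttributes.YARN_PERSISTENCE,
    PersistencePolicies.APPLICATION);
// ...and arbitrary string attributes ride alongside them
record.set("example.org/owner", "hbase");
{code}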


> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
> YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
> YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
> YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, 
> YARN-913-019.patch, yarnregistry.pdf, yarnregistry.pdf, yarnregistry.pdf, 
> yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up, or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond 
> to, and not any others in the cluster.
> Some kind of service registry, in the RM or in ZK, could do this. If the RM 
> held write access to the ZK nodes, it would be more secure than having apps 
> register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162510#comment-14162510
 ] 

Hudson commented on YARN-1857:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6206 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6206/])
YARN-1857. CapacityScheduler headroom doesn't account for other AM's running. 
Contributed by Chen He and Craig Welch (jianhe: rev 
30d56fdbb40d06c4e267d6c314c8c767a7adc6a3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java


> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
>Priority: Critical
> Fix For: 2.6.0
>
> Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, 
> YARN-1857.patch, YARN-1857.patch, YARN-1857.patch
>
>
> It's possible for an application to hang forever (or for a long time) in a 
> cluster with multiple users.  The reason is that the headroom sent to the 
> application is based on the user limit, but it doesn't account for other 
> application masters using space in that queue.  So the headroom (user limit 
> - user consumed) can be > 0 even though the cluster is 100% full, because 
> the remaining space is being used by application masters from other users.
> For instance, suppose you have a cluster with one queue, the user limit is 
> 100%, and multiple users are submitting applications.  One very large 
> application by user 1 starts up, runs most of its maps, and starts running 
> reducers.  Other users try to start applications and get their application 
> masters started, but no tasks.  The very large application then gets to the 
> point where it has consumed the rest of the cluster resources with reduces, 
> but it still needs to finish a few maps.  The headroom being sent to this 
> application is based only on the user limit (which is 100% of the cluster 
> capacity): it is using, let's say, 95% of the cluster for reduces, and the 
> other 5% is being used by other users running application masters.  The 
> MRAppMaster thinks it still has 5%, so it doesn't know that it should kill 
> a reduce in order to run a map.
> This can happen in other scenarios also.  Generally, in a large cluster 
> with multiple queues this shouldn't cause a hang forever, but it could make 
> the application take much longer.
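
To make the arithmetic concrete, a rough sketch of the shape of the fix 
(variable names are illustrative, not the literal patch):

{code}
// naive headroom: userLimit - userConsumed. On a full cluster this can
// still be > 0 because other users' AMs hold the remaining space.
Resource naive = Resources.subtract(userLimit, userConsumed);

// safer: additionally cap the headroom by what is actually unused in the
// queue, so space held by other users' AMs is no longer reported as free.
Resource headroom = Resources.componentwiseMin(naive,
    Resources.subtract(queueMaxResources, queueUsedResources));
{code}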



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2320) Removing old application history store after we store the history data to timeline store

2014-10-07 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2320:
--
Attachment: YARN-2320.3.patch

Updated against the latest trunk.

> Removing old application history store after we store the history data to 
> timeline store
> 
>
> Key: YARN-2320
> URL: https://issues.apache.org/jira/browse/YARN-2320
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2320.1.patch, YARN-2320.2.patch, YARN-2320.3.patch
>
>
> After YARN-2033, we should deprecate the application history store set. 
> There's no need to maintain two sets of store interfaces. In addition, we 
> should conclude the outstanding JIRAs under YARN-321 about the application 
> history store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162497#comment-14162497
 ] 

Chen He commented on YARN-1857:
---

Sorry, my bad; it looks like YARN-2400 has been checked in. Anyway, it is 
not related to this patch.

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
>Priority: Critical
> Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, 
> YARN-1857.patch, YARN-1857.patch, YARN-1857.patch
>
>
> It's possible for an application to hang forever (or for a long time) in a 
> cluster with multiple users.  The reason is that the headroom sent to the 
> application is based on the user limit, but it doesn't account for other 
> application masters using space in that queue.  So the headroom (user limit 
> - user consumed) can be > 0 even though the cluster is 100% full, because 
> the remaining space is being used by application masters from other users.
> For instance, suppose you have a cluster with one queue, the user limit is 
> 100%, and multiple users are submitting applications.  One very large 
> application by user 1 starts up, runs most of its maps, and starts running 
> reducers.  Other users try to start applications and get their application 
> masters started, but no tasks.  The very large application then gets to the 
> point where it has consumed the rest of the cluster resources with reduces, 
> but it still needs to finish a few maps.  The headroom being sent to this 
> application is based only on the user limit (which is 100% of the cluster 
> capacity): it is using, let's say, 95% of the cluster for reduces, and the 
> other 5% is being used by other users running application masters.  The 
> MRAppMaster thinks it still has 5%, so it doesn't know that it should kill 
> a reduce in order to run a map.
> This can happen in other scenarios also.  Generally, in a large cluster 
> with multiple queues this shouldn't cause a hang forever, but it could make 
> the application take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162496#comment-14162496
 ] 

Chen He commented on YARN-1857:
---

The unit test failure is caused by YARN-2400.

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
>Priority: Critical
> Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, 
> YARN-1857.patch, YARN-1857.patch, YARN-1857.patch
>
>
> It's possible for an application to hang forever (or for a long time) in a 
> cluster with multiple users.  The reason is that the headroom sent to the 
> application is based on the user limit, but it doesn't account for other 
> application masters using space in that queue.  So the headroom (user limit 
> - user consumed) can be > 0 even though the cluster is 100% full, because 
> the remaining space is being used by application masters from other users.
> For instance, suppose you have a cluster with one queue, the user limit is 
> 100%, and multiple users are submitting applications.  One very large 
> application by user 1 starts up, runs most of its maps, and starts running 
> reducers.  Other users try to start applications and get their application 
> masters started, but no tasks.  The very large application then gets to the 
> point where it has consumed the rest of the cluster resources with reduces, 
> but it still needs to finish a few maps.  The headroom being sent to this 
> application is based only on the user limit (which is 100% of the cluster 
> capacity): it is using, let's say, 95% of the cluster for reduces, and the 
> other 5% is being used by other users running application masters.  The 
> MRAppMaster thinks it still has 5%, so it doesn't know that it should kill 
> a reduce in order to run a map.
> This can happen in other scenarios also.  Generally, in a large cluster 
> with multiple queues this shouldn't cause a hang forever, but it could make 
> the application take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:

Attachment: yarnregistry.pdf

TLA+ specification in sync with the forthcoming -019 patch:

# uses consistent naming
# service record specified as allowing arbitrary {{String |-> String}} 
attributes; {{yarn:id}} and {{yarn:persistence}} are merely two of these

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
> YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
> YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
> YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, yarnregistry.pdf, 
> yarnregistry.pdf, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up, or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond 
> to, and not any others in the cluster.
> Some kind of service registry, in the RM or in ZK, could do this. If the RM 
> held write access to the ZK nodes, it would be more secure than having apps 
> register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162487#comment-14162487
 ] 

Jian He commented on YARN-1857:
---

Craig, thanks for updating. 
Looks good, +1.

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
>Priority: Critical
> Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, 
> YARN-1857.patch, YARN-1857.patch, YARN-1857.patch
>
>
> It's possible for an application to hang forever (or for a long time) in a 
> cluster with multiple users.  The reason is that the headroom sent to the 
> application is based on the user limit, but it doesn't account for other 
> application masters using space in that queue.  So the headroom (user limit 
> - user consumed) can be > 0 even though the cluster is 100% full, because 
> the remaining space is being used by application masters from other users.
> For instance, suppose you have a cluster with one queue, the user limit is 
> 100%, and multiple users are submitting applications.  One very large 
> application by user 1 starts up, runs most of its maps, and starts running 
> reducers.  Other users try to start applications and get their application 
> masters started, but no tasks.  The very large application then gets to the 
> point where it has consumed the rest of the cluster resources with reduces, 
> but it still needs to finish a few maps.  The headroom being sent to this 
> application is based only on the user limit (which is 100% of the cluster 
> capacity): it is using, let's say, 95% of the cluster for reduces, and the 
> other 5% is being used by other users running application masters.  The 
> MRAppMaster thinks it still has 5%, so it doesn't know that it should kill 
> a reduce in order to run a map.
> This can happen in other scenarios also.  Generally, in a large cluster 
> with multiple queues this shouldn't cause a hang forever, but it could make 
> the application take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2652) add hadoop-yarn-registry package under hadoop-yarn

2014-10-07 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-2652:


 Summary: add hadoop-yarn-registry package under hadoop-yarn
 Key: YARN-2652
 URL: https://issues.apache.org/jira/browse/YARN-2652
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Reporter: Steve Loughran
Assignee: Steve Loughran


Commit the core {{hadoop-yarn-registry}} module, including
* changes to POMs to include in the build
* additions to the yarn site

This patch excludes
* RM integration (YARN-2571)
* Distributed Shell integration and tests (YARN-2646)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162461#comment-14162461
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673410/YARN-1857.7.patch
  against trunk revision 9196db9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5311//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5311//console

This message is automatically generated.

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
>Priority: Critical
> Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, 
> YARN-1857.patch, YARN-1857.patch, YARN-1857.patch
>
>
> It's possible for an application to hang forever (or for a long time) in a 
> cluster with multiple users.  The reason is that the headroom sent to the 
> application is based on the user limit, but it doesn't account for other 
> application masters using space in that queue.  So the headroom (user limit 
> - user consumed) can be > 0 even though the cluster is 100% full, because 
> the remaining space is being used by application masters from other users.
> For instance, suppose you have a cluster with one queue, the user limit is 
> 100%, and multiple users are submitting applications.  One very large 
> application by user 1 starts up, runs most of its maps, and starts running 
> reducers.  Other users try to start applications and get their application 
> masters started, but no tasks.  The very large application then gets to the 
> point where it has consumed the rest of the cluster resources with reduces, 
> but it still needs to finish a few maps.  The headroom being sent to this 
> application is based only on the user limit (which is 100% of the cluster 
> capacity): it is using, let's say, 95% of the cluster for reduces, and the 
> other 5% is being used by other users running application masters.  The 
> MRAppMaster thinks it still has 5%, so it doesn't know that it should kill 
> a reduce in order to run a map.
> This can happen in other scenarios also.  Generally, in a large cluster 
> with multiple queues this shouldn't cause a hang forever, but it could make 
> the application take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162419#comment-14162419
 ] 

Hadoop QA commented on YARN-2414:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673402/YARN-2414.patch
  against trunk revision 2e789eb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5310//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5310//console

This message is automatically generated.

> RM web UI: app page will crash if app is failed before any attempt has been 
> created
> ---
>
> Key: YARN-2414
> URL: https://issues.apache.org/jira/browse/YARN-2414
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Zhijie Shen
>Assignee: Wangda Tan
> Attachments: YARN-2414.patch
>
>
> {code}
> 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
> handling URI: /cluster/app/application_1407887030038_0001
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jett

[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162416#comment-14162416
 ] 

Hadoop QA commented on YARN-2583:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673409/YARN-2583.4.patch
  against trunk revision 9196db9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5313//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5313//console

This message is automatically generated.

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
> YARN-2583.3.1.patch, YARN-2583.3.patch, YARN-2583.4.patch
>
>
> Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
> checks the cut-off time, and if all logs for an application are older than 
> the cut-off time, the app-log-dir is deleted from HDFS. This will not work 
> for LRS, since we expect an LRS application to keep running for a long 
> time. Two different scenarios: 
> 1) If we configure rollingIntervalSeconds, new log files will always be 
> uploaded to HDFS. The number of log files for the application will grow 
> larger and larger, and no log files will ever be deleted.
> 2) If we do not configure rollingIntervalSeconds, the log files can only be 
> uploaded to HDFS after the application finishes. It is very possible that 
> the logs are uploaded after the cut-off time, which causes a problem 
> because by that time the app-log-dir for the application in HDFS has 
> already been deleted.
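
One direction for LRS support (a sketch of the general idea only, not this 
patch) is to age out individual aggregated log files instead of whole 
app-log-dirs, so the directory stays alive while the application runs:

{code}
// fs (a FileSystem), appLogDir (a Path), and retainMillis are assumed to
// be in scope. Delete only per-node log files older than the cut-off,
// leaving the app-log-dir itself in place for a still-running application.
long cutoffMillis = System.currentTimeMillis() - retainMillis;
for (FileStatus logFile : fs.listStatus(appLogDir)) {
  if (logFile.getModificationTime() < cutoffMillis) {
    fs.delete(logFile.getPath(), false);
  }
}
{code}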



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162414#comment-14162414
 ] 

Hadoop QA commented on YARN-2331:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673407/YARN-2331v2.patch
  against trunk revision 9196db9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5312//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5312//console

This message is automatically generated.

> Distinguish shutdown during supervision vs. shutdown for rolling upgrade
> 
>
> Key: YARN-2331
> URL: https://issues.apache.org/jira/browse/YARN-2331
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-2331.patch, YARN-2331v2.patch
>
>
> When the NM is shutting down with restart support enabled there are scenarios 
> we'd like to distinguish and behave accordingly:
> # The NM is running under supervision.  In that case containers should be 
> preserved so the automatic restart can recover them.
> # The NM is not running under supervision and a rolling upgrade is not being 
> performed.  In that case the shutdown should kill all containers since it is 
> unlikely the NM will be restarted in a timely manner to recover them.
> # The NM is not running under supervision and a rolling upgrade is being 
> performed.  In that case the shutdown should not kill all containers since a 
> restart is imminent due to the rolling upgrade and the containers will be 
> recovered.
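
A condensed sketch of that decision (the flag and method names here are 
illustrative, not the patch's actual ones):

{code}
// Containers should survive NM shutdown only when a restart is expected to
// recover them: under supervision, or while a rolling upgrade is in progress.
boolean preserveContainers = recoveryEnabled
    && (underSupervision || rollingUpgradeInProgress);
if (!preserveContainers) {
  // no timely restart expected, so kill containers on the way down
  killAllContainers();  // hypothetical helper
}
{code}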



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2014-10-07 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162366#comment-14162366
 ] 

Jason Lowe commented on YARN-2414:
--

Thanks, Wangda!  Looks good overall.  Nit: appMerics should be appMetrics.

Have you done any manual testing with this patch?  It seems like it would be 
straightforward to mock up an injector with a mock context containing an app 
with no attempts.  With that, the unit test can verify that render doesn't 
throw.  I'm thinking of something along the lines of 
WebAppTests.testPage/testBlock, roughly as sketched below.
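
A rough shape for such a test (the mock helper and page class names are 
illustrative, not actual code in the tree):

{code}
// Render the RM app page against a context whose app has no attempts;
// WebAppTests wires up the injector and the render call, so an NPE in the
// block surfaces as a test failure.
RMContext rmContext = mockRMContextWithAppButNoAttempts();  // hypothetical
WebAppTests.testPage(AppPage.class, RMContext.class, rmContext);
{code}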

> RM web UI: app page will crash if app is failed before any attempt has been 
> created
> ---
>
> Key: YARN-2414
> URL: https://issues.apache.org/jira/browse/YARN-2414
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Zhijie Shen
>Assignee: Wangda Tan
> Attachments: YARN-2414.patch
>
>
> {code}
> 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
> handling URI: /cluster/app/application_1407887030038_0001
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>   at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>   at org.mortbay.jetty.Ht

[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162357#comment-14162357
 ] 

Chen He commented on YARN-1857:
---

Hi [~cwelch], 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm
 has the detailed information about these parameters. 

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
>Priority: Critical
> Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, 
> YARN-1857.patch, YARN-1857.patch, YARN-1857.patch
>
>
> It's possible for an application to hang forever (or for a long time) in a 
> cluster with multiple users.  The reason is that the headroom sent to the 
> application is based on the user limit, but it doesn't account for other 
> application masters using space in that queue.  So the headroom (user limit 
> - user consumed) can be > 0 even though the cluster is 100% full, because 
> the remaining space is being used by application masters from other users.
> For instance, suppose you have a cluster with one queue, the user limit is 
> 100%, and multiple users are submitting applications.  One very large 
> application by user 1 starts up, runs most of its maps, and starts running 
> reducers.  Other users try to start applications and get their application 
> masters started, but no tasks.  The very large application then gets to the 
> point where it has consumed the rest of the cluster resources with reduces, 
> but it still needs to finish a few maps.  The headroom being sent to this 
> application is based only on the user limit (which is 100% of the cluster 
> capacity): it is using, let's say, 95% of the cluster for reduces, and the 
> other 5% is being used by other users running application masters.  The 
> MRAppMaster thinks it still has 5%, so it doesn't know that it should kill 
> a reduce in order to run a map.
> This can happen in other scenarios also.  Generally, in a large cluster 
> with multiple queues this shouldn't cause a hang forever, but it could make 
> the application take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1857:
--
Attachment: YARN-1857.7.patch


I see it now; uploading a newer patch with a simplified final headroom 
calculation.

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
>Priority: Critical
> Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, 
> YARN-1857.patch, YARN-1857.patch, YARN-1857.patch
>
>
> It's possible for an application to hang forever (or for a long time) in a 
> cluster with multiple users.  The reason is that the headroom sent to the 
> application is based on the user limit, but it doesn't account for other 
> application masters using space in that queue.  So the headroom (user limit 
> - user consumed) can be > 0 even though the cluster is 100% full, because 
> the remaining space is being used by application masters from other users.
> For instance, suppose you have a cluster with one queue, the user limit is 
> 100%, and multiple users are submitting applications.  One very large 
> application by user 1 starts up, runs most of its maps, and starts running 
> reducers.  Other users try to start applications and get their application 
> masters started, but no tasks.  The very large application then gets to the 
> point where it has consumed the rest of the cluster resources with reduces, 
> but it still needs to finish a few maps.  The headroom being sent to this 
> application is based only on the user limit (which is 100% of the cluster 
> capacity): it is using, let's say, 95% of the cluster for reduces, and the 
> other 5% is being used by other users running application masters.  The 
> MRAppMaster thinks it still has 5%, so it doesn't know that it should kill 
> a reduce in order to run a map.
> This can happen in other scenarios also.  Generally, in a large cluster 
> with multiple queues this shouldn't cause a hang forever, but it could make 
> the application take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-07 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2583:

Attachment: YARN-2583.4.patch

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
> YARN-2583.3.1.patch, YARN-2583.3.patch, YARN-2583.4.patch
>
>
> Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
> checks the cut-off time, and if all logs for an application are older than 
> the cut-off time, the app-log-dir is deleted from HDFS. This will not work 
> for LRS, since we expect an LRS application to keep running for a long 
> time. Two different scenarios: 
> 1) If we configure rollingIntervalSeconds, new log files will always be 
> uploaded to HDFS. The number of log files for the application will grow 
> larger and larger, and no log files will ever be deleted.
> 2) If we do not configure rollingIntervalSeconds, the log files can only be 
> uploaded to HDFS after the application finishes. It is very possible that 
> the logs are uploaded after the cut-off time, which causes a problem 
> because by that time the app-log-dir for the application in HDFS has 
> already been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-07 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162324#comment-14162324
 ] 

Xuan Gong commented on YARN-2583:
-

Uploaded a new patch to address all the comments.

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
> YARN-2583.3.1.patch, YARN-2583.3.patch, YARN-2583.4.patch
>
>
> Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
> checks the cut-off time, and if all logs for an application are older than 
> the cut-off time, the app-log-dir is deleted from HDFS. This will not work 
> for LRS, since we expect an LRS application to keep running for a long 
> time. Two different scenarios: 
> 1) If we configure rollingIntervalSeconds, new log files will always be 
> uploaded to HDFS. The number of log files for the application will grow 
> larger and larger, and no log files will ever be deleted.
> 2) If we do not configure rollingIntervalSeconds, the log files can only be 
> uploaded to HDFS after the application finishes. It is very possible that 
> the logs are uploaded after the cut-off time, which causes a problem 
> because by that time the app-log-dir for the application in HDFS has 
> already been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-10-07 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-2331:
-
Attachment: YARN-2331v2.patch

Updated patch to fix the unit tests.

> Distinguish shutdown during supervision vs. shutdown for rolling upgrade
> 
>
> Key: YARN-2331
> URL: https://issues.apache.org/jira/browse/YARN-2331
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-2331.patch, YARN-2331v2.patch
>
>
> When the NM is shutting down with restart support enabled there are scenarios 
> we'd like to distinguish and behave accordingly:
> # The NM is running under supervision.  In that case containers should be 
> preserved so the automatic restart can recover them.
> # The NM is not running under supervision and a rolling upgrade is not being 
> performed.  In that case the shutdown should kill all containers since it is 
> unlikely the NM will be restarted in a timely manner to recover them.
> # The NM is not running under supervision and a rolling upgrade is being 
> performed.  In that case the shutdown should not kill all containers since a 
> restart is imminent due to the rolling upgrade and the containers will be 
> recovered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162316#comment-14162316
 ] 

Hadoop QA commented on YARN-796:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12673374/YARN-796.node-label.consolidate.14.patch
  against trunk revision 2e789eb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 42 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapred.pipes.TestPipeApplication
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue
  
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5307//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5307//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5307//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5307//console

This message is automatically generated.

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.13.patch, 
> YARN-796.node-label.consolidate.14.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture, etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2014-10-07 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2414:
-
Attachment: YARN-2414.patch

Attached a simple fix for this issue.
Please kindly review. 

Thanks,

> RM web UI: app page will crash if app is failed before any attempt has been 
> created
> ---
>
> Key: YARN-2414
> URL: https://issues.apache.org/jira/browse/YARN-2414
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Zhijie Shen
>Assignee: Wangda Tan
> Attachments: YARN-2414.patch
>
>
> {code}
> 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
> handling URI: /cluster/app/application_1407887030038_0001
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>   at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>   at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>   at 
> org.mortbay.thread.QueuedThreadPool$Po

[jira] [Commented] (YARN-2633) TestContainerLauncherImpl sometimes fails

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162266#comment-14162266
 ] 

Hadoop QA commented on YARN-2633:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673394/YARN-2633.patch
  against trunk revision 2e789eb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5309//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5309//console

This message is automatically generated.

> TestContainerLauncherImpl sometimes fails
> -
>
> Key: YARN-2633
> URL: https://issues.apache.org/jira/browse/YARN-2633
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-2633.patch, YARN-2633.patch
>
>
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.api.ContainerManagementProtocol$$EnhancerByMockitoWithCGLIB$$25708415.close()
>   at java.lang.Class.getMethod(Class.java:1665)
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.stopClient(RpcClientFactoryPBImpl.java:90)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.stopProxy(HadoopYarnProtoRPC.java:54)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.mayBeCloseProxy(ContainerManagementProtocolProxy.java:79)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:225)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.shutdownAllContainers(ContainerLauncherImpl.java:320)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.serviceStop(ContainerLauncherImpl.java:331)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncherImpl.testMyShutdown(TestContainerLauncherImpl.java:315)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location

2014-10-07 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-314:
--
Attachment: yarn-314-prelim.patch

Attaching a preliminary patch for any early feedback. 

> Schedulers should allow resource requests of different sizes at the same 
> priority and location
> --
>
> Key: YARN-314
> URL: https://issues.apache.org/jira/browse/YARN-314
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Karthik Kambatla
> Fix For: 2.6.0
>
> Attachments: yarn-314-prelim.patch
>
>
> Currently, resource requests for the same container and locality are expected 
> to all be the same size.
> While it doesn't look like it's needed for apps currently, and can be 
> circumvented by specifying different priorities if absolutely necessary, it 
> seems to me that the ability to request containers with different resource 
> requirements at the same priority level should be there for the future and 
> for completeness' sake.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.

2014-10-07 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162216#comment-14162216
 ] 

Vinod Kumar Vavilapalli commented on YARN-2566:
---

I am surprised YARN-1781 didn't take care of this /cc [~vvasudev].

IAC, instead of creating yet another algorithm, this should plug into the 
good-local-dirs added via YARN-1781. It can simply pick one of the 
good-local-dirs (the first, or a randomized one) instead of the first 
directory in the overall list, as sketched below. That should address this issue.

> IOException happen in startLocalizer of DefaultContainerExecutor due to not 
> enough disk space for the first localDir.
> -
>
> Key: YARN-2566
> URL: https://issues.apache.org/jira/browse/YARN-2566
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2566.000.patch, YARN-2566.001.patch, 
> YARN-2566.002.patch, YARN-2566.003.patch
>
>
> startLocalizer in DefaultContainerExecutor only uses the first localDir 
> to copy the token file. If the copy fails for the first localDir due to 
> insufficient disk space there, localization fails even though 
> there is plenty of disk space in the other localDirs. We see the following 
> error in this case:
> {code}
> 2014-09-13 23:33:25,171 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to 
> create app directory 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
> java.io.IOException: mkdir of 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,185 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Localizer failed
> java.io.FileNotFoundException: File 
> file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 
> does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76)
>   at 
> org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.(ChecksumFs.java:344)
>   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.create(FileContext.java:673)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,186 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1410663092546_0004_01_01 

[jira] [Updated] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.

2014-10-07 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2566:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-91

> IOException happen in startLocalizer of DefaultContainerExecutor due to not 
> enough disk space for the first localDir.
> -
>
> Key: YARN-2566
> URL: https://issues.apache.org/jira/browse/YARN-2566
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2566.000.patch, YARN-2566.001.patch, 
> YARN-2566.002.patch, YARN-2566.003.patch
>
>
> startLocalizer in DefaultContainerExecutor only uses the first localDir 
> to copy the token file. If the copy fails for the first localDir due to 
> insufficient disk space there, localization fails even though 
> there is plenty of disk space in the other localDirs. We see the following 
> error in this case:
> {code}
> 2014-09-13 23:33:25,171 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to 
> create app directory 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
> java.io.IOException: mkdir of 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,185 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Localizer failed
> java.io.FileNotFoundException: File 
> file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 
> does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76)
>   at 
> org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.(ChecksumFs.java:344)
>   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.create(FileContext.java:673)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,186 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1410663092546_0004_01_01 transitioned from 
> LOCALIZING to LOCALIZATION_FAILED
> 2014-09-13 23:33:25,187 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera   
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl
> RESULT=FAILURE  DESCRIPTION=Container failed with state: LOCALIZATION_FAILED  
>   APPID=applica

[jira] [Updated] (YARN-2633) TestContainerLauncherImpl sometimes fails

2014-10-07 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-2633:

Attachment: YARN-2633.patch

> TestContainerLauncherImpl sometimes fails
> -
>
> Key: YARN-2633
> URL: https://issues.apache.org/jira/browse/YARN-2633
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-2633.patch, YARN-2633.patch
>
>
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.api.ContainerManagementProtocol$$EnhancerByMockitoWithCGLIB$$25708415.close()
>   at java.lang.Class.getMethod(Class.java:1665)
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.stopClient(RpcClientFactoryPBImpl.java:90)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.stopProxy(HadoopYarnProtoRPC.java:54)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.mayBeCloseProxy(ContainerManagementProtocolProxy.java:79)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:225)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.shutdownAllContainers(ContainerLauncherImpl.java:320)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.serviceStop(ContainerLauncherImpl.java:331)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncherImpl.testMyShutdown(TestContainerLauncherImpl.java:315)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162170#comment-14162170
 ] 

Craig Welch commented on YARN-1857:
---

This is an interesting question - that logic predates this change, and I 
wondered if there were cases where userLimit could somehow be > queueMaxCap. 
Looking at the code, surprisingly, I believe so.  Userlimit is calculated 
from absolute queue values whereas, at least since [YARN-2008], queueMaxCap 
takes into account actual usage in other queues.  So it is entirely possible 
for userLimit to be > queueMaxCap due to how they are calculated, at least post 
[YARN-2008].  I'm not sure whether that was possible pre-YARN-2008 as well; it 
may have been, as there is a bit to how it was calculated even before that 
change - in any event, it is the case now.  So, as it happens, I don't believe 
we can do the simplification; a minimal sketch of why the min() matters is below.
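
A minimal sketch of the computation under discussion, using the Resources utility API; the parameter names are illustrative, not the exact LeafQueue members:

{code}
// Illustrative only. The point: the min() against queueMaxCap cannot be
// dropped, because userLimit can exceed queueMaxCap post YARN-2008.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class HeadroomSketch {
  private static final ResourceCalculator RC = new DefaultResourceCalculator();

  public static Resource headroom(Resource clusterResource, Resource userLimit,
      Resource queueMaxCap, Resource userConsumed) {
    // Cap the user limit by what the queue can actually grow to ...
    Resource limit = Resources.min(RC, clusterResource, userLimit, queueMaxCap);
    // ... then subtract what the user has already consumed.
    return Resources.subtract(limit, userConsumed);
  }
}
{code}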

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
>Priority: Critical
> Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.patch, 
> YARN-1857.patch, YARN-1857.patch
>
>
> It's possible to get an application to hang forever (or for a long time) in a 
> cluster with multiple users.  The reason is that the headroom sent to the 
> application is based on the user limit but doesn't account for other 
> Application masters using space in that queue.  So the headroom (user limit - 
> user consumed) can be > 0 even though the cluster is 100% full, because the 
> other space is being used by application masters from other users.  
> For instance, take a cluster with 1 queue, a user limit of 100%, and 
> multiple users submitting applications.  One very large application by user 1 
> starts up, runs most of its maps and starts running reducers. Other users try 
> to start applications and get their application masters started but not their 
> tasks.  The very large application then gets to the point where it has 
> consumed the rest of the cluster resources with all reduces.  But at this 
> point it still needs to finish a few maps.  The headroom being sent to this 
> application is based only on the user limit (which is 100% of the cluster 
> capacity); it is using, let's say, 95% of the cluster for reduces while the 
> other 5% is being used by other users running application masters.  The 
> MRAppMaster thinks it still has 5%, so it doesn't know that it should kill a 
> reduce in order to run a map.  
> This can happen in other scenarios also.  Generally in a large cluster with 
> multiple queues this shouldn't cause a hang forever but it could cause the 
> application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-07 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162157#comment-14162157
 ] 

Xuan Gong commented on YARN-2583:
-

Here is the proposal:
* Add a private configuration for the number of log files we keep on the NM 
side. We will delete old logs once the number of log files exceeds this 
configured value (a rough sketch of the retention check follows the list). This 
is a temporary solution; the configuration will be removed once we find a more 
scalable method (tracked by YARN-2548) to write only a single log file 
per LRS. 
* The JHS contacts the RM to check whether the app is still running. If the app 
is still running, we need to keep the app dir but remove the old logs.
* Remove the per-app LogRollingInterval completely and instead have the NM wake 
up every so often and upload log files. In this ticket, we can spin off 
LogRollingInterval from AppLogAggregatorImpl. YARN-2651 will track the changes 
for the other places.
* Enforce a minimum log rolling interval (3600 seconds will be used as the 
minimum value).
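
A rough sketch of the retention check from the first bullet; the class and parameter names are hypothetical, not the patch itself:

{code}
// Keeps at most maxLogFilesPerApp uploaded files for an app,
// deleting the oldest files first.
import java.io.IOException;
import java.util.Comparator;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;

public class LogRetentionSketch {
  public static void enforceRetention(FileSystem fs,
      List<FileStatus> uploadedLogFiles, int maxLogFilesPerApp)
      throws IOException {
    int excess = uploadedLogFiles.size() - maxLogFilesPerApp;
    if (excess <= 0) {
      return;  // still within the configured limit
    }
    // Sort oldest first by modification time, then drop the excess.
    uploadedLogFiles.sort(
        Comparator.comparingLong(FileStatus::getModificationTime));
    for (FileStatus old : uploadedLogFiles.subList(0, excess)) {
      fs.delete(old.getPath(), false);
    }
  }
}
{code}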

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
> YARN-2583.3.1.patch, YARN-2583.3.patch
>
>
> Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
> checks the cut-off-time, and if all logs for an application are older than 
> the cut-off-time, the app-log-dir is deleted from HDFS. This will not 
> work for LRS, since we expect an LRS application to keep running for a long time. 
> Two different scenarios: 
> 1) If we configured the rollingIntervalSeconds, new log files will always be 
> uploaded to HDFS. The number of log files for the application will grow 
> larger and larger, and no log files will ever be deleted.
> 2) If we did not configure the rollingIntervalSeconds, the log file can only 
> be uploaded to HDFS after the application finishes. It is very possible 
> that the logs are uploaded after the cut-off-time, which causes a problem 
> because by then the app-log-dir for the application in HDFS has already been 
> deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2651) Spin off the LogRollingInterval from LogAggregationContext

2014-10-07 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-2651:
---

 Summary: Spin off the LogRollingInterval from LogAggregationContext
 Key: YARN-2651
 URL: https://issues.apache.org/jira/browse/YARN-2651
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-07 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162143#comment-14162143
 ] 

Vinod Kumar Vavilapalli commented on YARN-2583:
---

[~xgong], can you please post a short summary of the final proposal here?

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
> YARN-2583.3.1.patch, YARN-2583.3.patch
>
>
> Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
> checks the cut-off-time, and if all logs for an application are older than 
> the cut-off-time, the app-log-dir is deleted from HDFS. This will not 
> work for LRS, since we expect an LRS application to keep running for a long time. 
> Two different scenarios: 
> 1) If we configured the rollingIntervalSeconds, new log files will always be 
> uploaded to HDFS. The number of log files for the application will grow 
> larger and larger, and no log files will ever be deleted.
> 2) If we did not configure the rollingIntervalSeconds, the log file can only 
> be uploaded to HDFS after the application finishes. It is very possible 
> that the logs are uploaded after the cut-off-time, which causes a problem 
> because by then the app-log-dir for the application in HDFS has already been 
> deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162117#comment-14162117
 ] 

Hadoop QA commented on YARN-2331:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673377/YARN-2331.patch
  against trunk revision 2e789eb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery
  
org.apache.hadoop.yarn.server.nodemanager.TestContainerManagerWithLCE

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5308//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5308//console

This message is automatically generated.

> Distinguish shutdown during supervision vs. shutdown for rolling upgrade
> 
>
> Key: YARN-2331
> URL: https://issues.apache.org/jira/browse/YARN-2331
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-2331.patch
>
>
> When the NM is shutting down with restart support enabled there are scenarios 
> we'd like to distinguish and behave accordingly:
> # The NM is running under supervision.  In that case containers should be 
> preserved so the automatic restart can recover them.
> # The NM is not running under supervision and a rolling upgrade is not being 
> performed.  In that case the shutdown should kill all containers since it is 
> unlikely the NM will be restarted in a timely manner to recover them.
> # The NM is not running under supervision and a rolling upgrade is being 
> performed.  In that case the shutdown should not kill all containers since a 
> restart is imminent due to the rolling upgrade and the containers will be 
> recovered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2650) TestRMRestart#testRMRestartGetApplicationList sometimes fails in trunk

2014-10-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-2650:
-
Attachment: TestRMRestart.tar.gz

Test output

> TestRMRestart#testRMRestartGetApplicationList sometimes fails in trunk
> --
>
> Key: YARN-2650
> URL: https://issues.apache.org/jira/browse/YARN-2650
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
> Attachments: TestRMRestart.tar.gz
>
>
> I got the following failure running on Linux:
> {code}
>   TestRMRestart.testRMRestartGetApplicationList:952
> rMAppManager.logApplicationSummary(
> isA(org.apache.hadoop.yarn.api.records.ApplicationId)
> );
> Wanted 3 times:
> -> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:952)
> But was 2 times:
> -> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:64)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time

2014-10-07 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162091#comment-14162091
 ] 

Jason Lowe commented on YARN-1915:
--

Test failure is unrelated, see YARN-2483.

> ClientToAMTokenMasterKey should be provided to AM at launch time
> 
>
> Key: YARN-1915
> URL: https://issues.apache.org/jira/browse/YARN-1915
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Hitesh Shah
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-1915.patch, YARN-1915v2.patch, YARN-1915v3.patch
>
>
> Currently, the AM receives the key as part of registration. This introduces a 
> race where a client can connect to the AM when the AM has not received the 
> key. 
> Current Flow:
> 1) AM needs to start the client listening service in order to get host:port 
> and send it to the RM as part of registration
> 2) RM gets the port info in register() and transitions the app to RUNNING. 
> Responds back with client secret to AM.
> 3) User asks RM for client token. Gets it and pings the AM. AM hasn't 
> received client secret from RM and so RPC itself rejects the request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2650) TestRMRestart#testRMRestartGetApplicationList sometimes fails in trunk

2014-10-07 Thread Ted Yu (JIRA)
Ted Yu created YARN-2650:


 Summary: TestRMRestart#testRMRestartGetApplicationList sometimes 
fails in trunk
 Key: YARN-2650
 URL: https://issues.apache.org/jira/browse/YARN-2650
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


I got the following failure running on Linux:
{code}
  TestRMRestart.testRMRestartGetApplicationList:952
rMAppManager.logApplicationSummary(
isA(org.apache.hadoop.yarn.api.records.ApplicationId)
);
Wanted 3 times:
-> at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:952)
But was 2 times:
-> at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:64)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162076#comment-14162076
 ] 

Hadoop QA commented on YARN-1915:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673366/YARN-1915v3.patch
  against trunk revision 2e789eb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5306//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5306//console

This message is automatically generated.

> ClientToAMTokenMasterKey should be provided to AM at launch time
> 
>
> Key: YARN-1915
> URL: https://issues.apache.org/jira/browse/YARN-1915
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Hitesh Shah
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-1915.patch, YARN-1915v2.patch, YARN-1915v3.patch
>
>
> Currently, the AM receives the key as part of registration. This introduces a 
> race where a client can connect to the AM when the AM has not received the 
> key. 
> Current Flow:
> 1) AM needs to start the client listening service in order to get host:port 
> and send it to the RM as part of registration
> 2) RM gets the port info in register() and transitions the app to RUNNING. 
> Responds back with client secret to AM.
> 3) User asks RM for client token. Gets it and pings the AM. AM hasn't 
> received client secret from RM and so RPC itself rejects the request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-10-07 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-2331:
-
Attachment: YARN-2331.patch

In the interest of getting something done for this in time for 2.6, here's a 
patch that adds a conf to tell the NM whether or not it's supervised.  If 
supervised, it expects to be quickly restarted on shutdown and will not kill 
containers.  If unsupervised, it will kill containers on shutdown since it does 
not expect to be restarted in a timely manner; a minimal sketch of the decision 
is below.
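
A minimal sketch of the decision just described; the config key is illustrative, not quoted from the patch:

{code}
import org.apache.hadoop.conf.Configuration;

public class ShutdownSketch {
  /** True when shutdown should kill containers rather than preserve them. */
  public static boolean shouldKillContainersOnShutdown(Configuration conf) {
    // Hypothetical key: set true when a supervisor restarts the NM quickly.
    boolean supervised =
        conf.getBoolean("yarn.nodemanager.recovery.supervised", false);
    // Supervised: a quick restart will recover containers, so keep them.
    // Unsupervised: no timely restart expected, so kill them on shutdown.
    return !supervised;
  }
}
{code}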

> Distinguish shutdown during supervision vs. shutdown for rolling upgrade
> 
>
> Key: YARN-2331
> URL: https://issues.apache.org/jira/browse/YARN-2331
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
> Attachments: YARN-2331.patch
>
>
> When the NM is shutting down with restart support enabled there are scenarios 
> we'd like to distinguish and behave accordingly:
> # The NM is running under supervision.  In that case containers should be 
> preserved so the automatic restart can recover them.
> # The NM is not running under supervision and a rolling upgrade is not being 
> performed.  In that case the shutdown should kill all containers since it is 
> unlikely the NM will be restarted in a timely manner to recover them.
> # The NM is not running under supervision and a rolling upgrade is being 
> performed.  In that case the shutdown should not kill all containers since a 
> restart is imminent due to the rolling upgrade and the containers will be 
> recovered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-07 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: YARN-796.node-label.consolidate.14.patch

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.13.patch, 
> YARN-796.node-label.consolidate.14.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture, etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time

2014-10-07 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1915:
-
Attachment: YARN-1915v3.patch

Refreshed patch to latest trunk.

[~vinodkv] could you comment?  I fully agree with Hitesh that the current patch 
is a stop-gap at best.  However there's some confusion as to how the client 
token master key should be sent to the AM (e.g. via container credentials, as 
sketched below, or via the current method, etc.).  The original env variable 
approach apparently is problematic on Windows per YARN-610.

If we won't have time to develop the best fix for 2.6 then I'd like to see 
something like this patch put in to improve things in the interim.
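
A hedged sketch of the "via container credentials" option only - one possible shape, not the patch; the alias is hypothetical. The AM would read the key from the credentials it was launched with, instead of waiting for it in the registration response:

{code}
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public class ClientToAMKeySketch {
  // Hypothetical alias under which the RM would ship the master key.
  private static final Text KEY_ALIAS = new Text("CLIENT_TO_AM_MASTER_KEY");

  /** Returns the key bytes, or null if the key was not shipped this way. */
  public static byte[] readMasterKey() throws IOException {
    Credentials creds = UserGroupInformation.getCurrentUser().getCredentials();
    return creds.getSecretKey(KEY_ALIAS);
  }
}
{code}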

> ClientToAMTokenMasterKey should be provided to AM at launch time
> 
>
> Key: YARN-1915
> URL: https://issues.apache.org/jira/browse/YARN-1915
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Hitesh Shah
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-1915.patch, YARN-1915v2.patch, YARN-1915v3.patch
>
>
> Currently, the AM receives the key as part of registration. This introduces a 
> race where a client can connect to the AM when the AM has not received the 
> key. 
> Current Flow:
> 1) AM needs to start the client listening service in order to get host:port 
> and send it to the RM as part of registration
> 2) RM gets the port info in register() and transitions the app to RUNNING. 
> Responds back with client secret to AM.
> 3) User asks RM for client token. Gets it and pings the AM. AM hasn't 
> received client secret from RM and so RPC itself rejects the request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161969#comment-14161969
 ] 

Hudson commented on YARN-2615:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1919 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1919/])
YARN-2615. Changed 
ClientToAMTokenIdentifier/RM(Timeline)DelegationTokenIdentifier to use protobuf 
as payload. Contributed by Junping Du (jianhe: rev 
ea26cc0b4ac02b8af686dfda80f540e5aa70c358)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/proto/test_client_tokens.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/ClientToAMTokenIdentifierForTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AMRMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/NMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/ClientToAMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMDelegationTokenIdentifierForTest.java
* hadoop-yarn-project/CHANGES.txt


> ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
> fields
> 
>
> Key: YARN-2615
> URL: https://issues.apache.org/jira/browse/YARN-2615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2615-v2.patch, YARN-2615-v3.patch, 
> YARN-2615-v4.patch, YARN-2615-v5.patch, YARN-2615.patch
>
>
> As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier 
> and DelegationTokenIdentifier should also be updated in the same way to allow 
> fields get extended in future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161971#comment-14161971
 ] 

Hudson commented on YARN-1051:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1919 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1919/])
Move YARN-1051 to 2.6 (cdouglas: rev 8380ca37237a21638e1bcad0dd0e4c7e9f1a1786)
* hadoop-yarn-project/CHANGES.txt


> YARN Admission Control/Planner: enhancing the resource allocation model with 
> time.
> --
>
> Key: YARN-1051
> URL: https://issues.apache.org/jira/browse/YARN-1051
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager, scheduler
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.6.0
>
> Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
> YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
> techreport.pdf
>
>
> In this umbrella JIRA we propose to extend the YARN RM to handle time 
> explicitly, allowing users to "reserve" capacity over time. This is an 
> important step towards SLAs, long-running services, workflows, and helps for 
> gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161970#comment-14161970
 ] 

Hudson commented on YARN-2644:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1919 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1919/])
YARN-2644. Fixed CapacityScheduler to return up-to-date headroom when AM 
allocates. Contributed by Craig Welch (jianhe: rev 
519e5a7dd2bd540105434ec3c8939b68f6c024f8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* hadoop-yarn-project/CHANGES.txt


> Recalculate headroom more frequently to keep it accurate
> 
>
> Key: YARN-2644
> URL: https://issues.apache.org/jira/browse/YARN-2644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Welch
>Assignee: Craig Welch
> Fix For: 2.6.0
>
> Attachments: YARN-2644.11.patch, YARN-2644.14.patch, 
> YARN-2644.15.patch, YARN-2644.15.patch
>
>
> See parent (1198) for more detail - this specifically covers calculating the 
> headroom more frequently, to handle cases where changes that impact headroom 
> have occurred but are not reflected because an application has not been 
> updated.
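
The file list above includes a CapacityHeadroomProvider; below is a minimal 
sketch of the provider idea under that reading. HeadroomProvider, QueueState, 
and computeHeadroom are illustrative names, not the actual patch API.

{code}
// A minimal sketch, assuming the provider idea suggested by the file list
// above; HeadroomProvider, QueueState, and computeHeadroom are illustrative
// names, not the actual patch API.
class HeadroomProvider {
  private final QueueState queue;  // live queue state shared with the scheduler
  private final String user;

  HeadroomProvider(QueueState queue, String user) {
    this.queue = queue;
    this.user = user;
  }

  long getHeadroom() {
    // Recomputed from current state on every read, so changes since the
    // application's last allocate are reflected instead of a stale cached value.
    return queue.computeHeadroom(user);
  }
}

interface QueueState {
  long computeHeadroom(String user);  // assumed hook into current queue/user limits
}
{code}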



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161913#comment-14161913
 ] 

Hudson commented on YARN-2644:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1894 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1894/])
YARN-2644. Fixed CapacityScheduler to return up-to-date headroom when AM 
allocates. Contributed by Craig Welch (jianhe: rev 
519e5a7dd2bd540105434ec3c8939b68f6c024f8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> Recalculate headroom more frequently to keep it accurate
> 
>
> Key: YARN-2644
> URL: https://issues.apache.org/jira/browse/YARN-2644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Welch
>Assignee: Craig Welch
> Fix For: 2.6.0
>
> Attachments: YARN-2644.11.patch, YARN-2644.14.patch, 
> YARN-2644.15.patch, YARN-2644.15.patch
>
>
> See parent (1198) for more detail - this specifically covers calculating the 
> headroom more frequently, to handle cases where changes that impact headroom 
> have occurred but are not reflected because an application has not been 
> updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161914#comment-14161914
 ] 

Hudson commented on YARN-1051:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1894 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1894/])
Move YARN-1051 to 2.6 (cdouglas: rev 8380ca37237a21638e1bcad0dd0e4c7e9f1a1786)
* hadoop-yarn-project/CHANGES.txt


> YARN Admission Control/Planner: enhancing the resource allocation model with 
> time.
> --
>
> Key: YARN-1051
> URL: https://issues.apache.org/jira/browse/YARN-1051
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager, scheduler
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.6.0
>
> Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
> YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
> techreport.pdf
>
>
> In this umbrella JIRA we propose to extend the YARN RM to handle time 
> explicitly, allowing users to "reserve" capacity over time. This is an 
> important step towards SLAs, long-running services, and workflows, and helps 
> with gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161912#comment-14161912
 ] 

Hudson commented on YARN-2615:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1894 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1894/])
YARN-2615. Changed 
ClientToAMTokenIdentifier/RM(Timeline)DelegationTokenIdentifier to use protobuf 
as payload. Contributed by Junping Du (jianhe: rev 
ea26cc0b4ac02b8af686dfda80f540e5aa70c358)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/NMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/ClientToAMTokenIdentifierForTest.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AMRMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/ClientToAMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMDelegationTokenIdentifierForTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/proto/test_client_tokens.proto


> ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
> fields
> 
>
> Key: YARN-2615
> URL: https://issues.apache.org/jira/browse/YARN-2615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2615-v2.patch, YARN-2615-v3.patch, 
> YARN-2615-v4.patch, YARN-2615-v5.patch, YARN-2615.patch
>
>
> As three TokenIdentifiers were updated in YARN-668, ClientToAMTokenIdentifier 
> and DelegationTokenIdentifier should also be updated in the same way to allow 
> fields to be extended in the future.
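
A minimal sketch of the protobuf-as-payload pattern described in the commit 
message above; ExtensibleTokenIdentifier and MyTokenProto are hypothetical 
names, and the real changes are in the identifier classes and .proto files 
listed.

{code}
// A minimal sketch, assuming a hypothetical generated protobuf message
// MyTokenProto; the real patch modifies the identifier classes listed above.
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class ExtensibleTokenIdentifier implements Writable {
  private MyTokenProto proto;  // hypothetical generated protobuf message

  @Override
  public void write(DataOutput out) throws IOException {
    // Protobuf's wire format tolerates new optional fields, so the .proto can
    // grow in the future without breaking old readers.
    out.write(proto.toByteArray());
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Hadoop passes a stream-backed DataInput here; the generated parser skips
    // unknown (newer) fields instead of failing.
    proto = MyTokenProto.parseFrom((DataInputStream) in);
  }
}
{code}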



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161902#comment-14161902
 ] 

Hadoop QA commented on YARN-2312:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673331/YARN-2312.6.patch
  against trunk revision 0fb2735.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 15 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapred.pipes.TestPipeApplication
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5304//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5304//console

This message is automatically generated.

> Marking ContainerId#getId as deprecated
> ---
>
> Key: YARN-2312
> URL: https://issues.apache.org/jira/browse/YARN-2312
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, 
> YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch, 
> YARN-2312.4.patch, YARN-2312.5.patch, YARN-2312.6.patch
>
>
> After YARN-2229, {{ContainerId#getId}} returns only a partial value of the 
> containerId: the sequence number of the container id without the epoch. We 
> should mark {{ContainerId#getId}} as deprecated and use 
> {{ContainerId#getContainerId}} instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161826#comment-14161826
 ] 

Hadoop QA commented on YARN-1879:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673336/YARN-1879.23.patch
  against trunk revision 0fb2735.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5305//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5305//console

This message is automatically generated.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
> fail over
> 
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
> YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, 
> YARN-1879.23.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, 
> YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161793#comment-14161793
 ] 

Hudson commented on YARN-2644:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #704 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/704/])
YARN-2644. Fixed CapacityScheduler to return up-to-date headroom when AM 
allocates. Contributed by Craig Welch (jianhe: rev 
519e5a7dd2bd540105434ec3c8939b68f6c024f8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java


> Recalculate headroom more frequently to keep it accurate
> 
>
> Key: YARN-2644
> URL: https://issues.apache.org/jira/browse/YARN-2644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Welch
>Assignee: Craig Welch
> Fix For: 2.6.0
>
> Attachments: YARN-2644.11.patch, YARN-2644.14.patch, 
> YARN-2644.15.patch, YARN-2644.15.patch
>
>
> See parent (1198) for more detail - this specifically covers calculating the 
> headroom more frequently, to handle cases where changes that impact headroom 
> have occurred but are not reflected because an application has not been 
> updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161794#comment-14161794
 ] 

Hudson commented on YARN-1051:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #704 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/704/])
Move YARN-1051 to 2.6 (cdouglas: rev 8380ca37237a21638e1bcad0dd0e4c7e9f1a1786)
* hadoop-yarn-project/CHANGES.txt


> YARN Admission Control/Planner: enhancing the resource allocation model with 
> time.
> --
>
> Key: YARN-1051
> URL: https://issues.apache.org/jira/browse/YARN-1051
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager, scheduler
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.6.0
>
> Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
> YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
> techreport.pdf
>
>
> In this umbrella JIRA we propose to extend the YARN RM to handle time 
> explicitly, allowing users to "reserve" capacity over time. This is an 
> important step towards SLAs, long-running services, and workflows, and helps 
> with gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161792#comment-14161792
 ] 

Hudson commented on YARN-2615:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #704 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/704/])
YARN-2615. Changed 
ClientToAMTokenIdentifier/RM(Timeline)DelegationTokenIdentifier to use protobuf 
as payload. Contributed by Junping Du (jianhe: rev 
ea26cc0b4ac02b8af686dfda80f540e5aa70c358)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/NMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMDelegationTokenIdentifierForTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/proto/test_client_tokens.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AMRMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/ClientToAMTokenIdentifierForTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/ClientToAMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java


> ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
> fields
> 
>
> Key: YARN-2615
> URL: https://issues.apache.org/jira/browse/YARN-2615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2615-v2.patch, YARN-2615-v3.patch, 
> YARN-2615-v4.patch, YARN-2615-v5.patch, YARN-2615.patch
>
>
> As three TokenIdentifiers were updated in YARN-668, ClientToAMTokenIdentifier 
> and DelegationTokenIdentifier should also be updated in the same way to allow 
> fields to be extended in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.

2014-10-07 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161781#comment-14161781
 ] 

Rohith commented on YARN-2579:
--

Verified the below scenarios manually with the help of Eclipse debug points:
1. transitionToStandby is called from the admin service, which obtains the RM 
lock, while RMFatalEventDispatcher waits for the RM lock to call 
transitionToStandby (this issue's scenario).
2. transitionToStandby is called from RMFatalEventDispatcher, which obtains 
the RM lock, while the admin service waits for the RM lock to call 
transitionToStandby.

Please review the patch.
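
For readers following along, a minimal sketch of the two orderings verified 
above, with the lock object and method body simplified; MiniRM is an 
illustrative stand-in, not the actual ResourceManager code.

{code}
// A minimal sketch of the two verified orderings; MiniRM and its lock are
// simplified assumptions, not the actual ResourceManager code.
class MiniRM {
  private final Object rmLock = new Object();

  void transitionToStandby(String caller) {
    synchronized (rmLock) {  // both paths serialize on the RM lock
      System.out.println(caller + " holds the RM lock and stops active services");
    }
  }

  public static void main(String[] args) throws InterruptedException {
    MiniRM rm = new MiniRM();
    // Scenario 1: the admin service obtains the lock first and
    // RMFatalEventDispatcher waits; scenario 2 is the reverse order.
    Thread admin = new Thread(() -> rm.transitionToStandby("AdminService"));
    Thread dispatcher =
        new Thread(() -> rm.transitionToStandby("RMFatalEventDispatcher"));
    admin.start();
    dispatcher.start();
    admin.join();
    dispatcher.join();
  }
}
{code}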

> Both RM's state is Active , but 1 RM is not really active.
> --
>
> Key: YARN-2579
> URL: https://issues.apache.org/jira/browse/YARN-2579
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.1
>Reporter: Rohith
>Assignee: Rohith
> Attachments: YARN-2579.patch
>
>
> I encountered a situation where both RMs' web pages were accessible and 
> displayed their state as Active, but one RM's ActiveServices were 
> stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.

2014-10-07 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161774#comment-14161774
 ] 

Rohith commented on YARN-2579:
--

Considering the first approach feasible, I attached a patch. I am still 
thinking about how to write tests for it.




> Both RM's state is Active , but 1 RM is not really active.
> --
>
> Key: YARN-2579
> URL: https://issues.apache.org/jira/browse/YARN-2579
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.1
>Reporter: Rohith
>Assignee: Rohith
> Attachments: YARN-2579.patch
>
>
> I encountered a situation where both RMs' web pages were accessible and 
> displayed their state as Active, but one RM's ActiveServices were 
> stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.

2014-10-07 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2579:
-
Attachment: YARN-2579.patch

> Both RM's state is Active , but 1 RM is not really active.
> --
>
> Key: YARN-2579
> URL: https://issues.apache.org/jira/browse/YARN-2579
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.1
>Reporter: Rohith
>Assignee: Rohith
> Attachments: YARN-2579.patch
>
>
> I encountered a situation where both RMs' web pages were accessible and 
> displayed their state as Active, but one RM's ActiveServices were 
> stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-07 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1879:
-
Attachment: YARN-1879.23.patch

The failure of TestClientToAMTokens looks unrelated - it passed locally on my 
machine. Let me submit the same patch again.
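
As background for this JIRA's title, a minimal sketch of the annotation 
pattern; the interface below is an illustrative stand-in, and which real 
ApplicationMasterProtocol methods become @Idempotent versus @AtMostOnce is the 
patch's decision, not asserted here.

{code}
// A minimal sketch only: the retry annotations live in
// org.apache.hadoop.io.retry, but this interface and the per-method choice
// are illustrative assumptions, not the patch contents.
import java.io.IOException;
import org.apache.hadoop.io.retry.Idempotent;

public interface SketchProtocol {
  // An @Idempotent method may be safely replayed against the newly active RM
  // after a failover; @AtMostOnce marks calls that must not be repeated.
  @Idempotent
  String allocate(String request) throws IOException;
}
{code}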

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
> fail over
> 
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
> YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, 
> YARN-1879.23.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, 
> YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2312) Marking ContainerId#getId as deprecated

2014-10-07 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2312:
-
Attachment: YARN-2312.6.patch

Thanks Jason and Jian for the review. Updated:

* Removed the unnecessary change in TestTaskAttemptListenerImpl.java - sorry, 
that change was included by mistake.
* Defined 0xffL as CONTAINER_ID_BITMASK and exposed it.
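
A minimal sketch of the resulting accessor pattern, using the mask value from 
the comment above; the class shape is illustrative, not the exact patch.

{code}
// A minimal sketch of the deprecation pattern, assuming the mask value named
// in the comment above; the class shape is illustrative, not the exact patch.
public abstract class ContainerIdSketch {
  public static final long CONTAINER_ID_BITMASK = 0xffL;  // value from the comment above

  // Full id: epoch in the high bits, sequence number in the low bits.
  public abstract long getContainerId();

  @Deprecated
  public int getId() {
    // Only the sequence-number portion survives; the epoch bits are masked
    // away, which is why callers should migrate to getContainerId().
    return (int) (getContainerId() & CONTAINER_ID_BITMASK);
  }
}
{code}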

> Marking ContainerId#getId as deprecated
> ---
>
> Key: YARN-2312
> URL: https://issues.apache.org/jira/browse/YARN-2312
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, 
> YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch, 
> YARN-2312.4.patch, YARN-2312.5.patch, YARN-2312.6.patch
>
>
> After YARN-2229, {{ContainerId#getId}} returns only a partial value of the 
> containerId: the sequence number of the container id without the epoch. We 
> should mark {{ContainerId#getId}} as deprecated and use 
> {{ContainerId#getContainerId}} instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1458) FairScheduler: Zero weight can lead to livelock

2014-10-07 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161724#comment-14161724
 ] 

Tsuyoshi OZAWA commented on YARN-1458:
--

Sure, thanks for the reply :-)

> FairScheduler: Zero weight can lead to livelock
> ---
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, 
> YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, 
> YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch, 
> yarn-1458-7.patch, yarn-1458-8.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor blocks when 
> clients submit lots of jobs; it is not easy to reproduce. We ran the test 
> cluster for days to reproduce it. The output of the jstack command on the 
> resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
> at java.lang.Thread.run(Thread.java:744)
> {code}
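
To see why a zero weight can livelock the scheduler while it holds the lock 
shown in the stack trace, here is a minimal sketch with the share computation 
reduced to its essentials; the names and loop are illustrative, not the actual 
ComputeFairShares code.

{code}
// A minimal, illustrative reduction (not the actual ComputeFairShares code):
// if every weight is zero, computed usage stays zero and the doubling search
// never exits, all while the scheduler lock is held.
class FairShareLivelockSketch {
  static int usageAtRatio(double weightToResourceRatio, double[] weights) {
    int used = 0;
    for (double w : weights) {
      used += (int) (w * weightToResourceRatio);  // contributes 0 when w == 0
    }
    return used;
  }

  static double findRatio(int totalResource, double[] weights) {
    double rMax = 1.0;
    while (usageAtRatio(rMax, weights) < totalResource) {
      rMax *= 2;  // with all-zero weights this never terminates
    }
    return rMax;
  }

  public static void main(String[] args) {
    System.out.println(findRatio(1024, new double[] {1.0, 1.0}));
    // findRatio(1024, new double[] {0.0, 0.0});  // would spin forever
  }
}
{code}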



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161664#comment-14161664
 ] 

Hadoop QA commented on YARN-2641:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673311/YARN-2641.001.patch
  against trunk revision 0fb2735.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5303//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5303//console

This message is automatically generated.

> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2641.000.patch, YARN-2641.001.patch
>
>
> improve node decommission latency in RM. 
> Currently, node decommission only happens after the RM receives a 
> nodeHeartbeat from the Node Manager. The node heartbeat interval is 
> configurable; the default value is 1 second.
> It would be better to do the decommission during RM refresh 
> (NodesListManager) instead of nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161660#comment-14161660
 ] 

Junping Du commented on YARN-2641:
--

The idea here sounds interesting. However, the decommission still happens 
after the heartbeat back to the NM. Am I missing something here?
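
For concreteness, a minimal sketch of the proposed direction under one reading 
of the description: fire the decommission during the refresh itself rather 
than on the next heartbeat. All names below are illustrative, not the actual 
patch.

{code}
// A minimal sketch, assuming simplified RM plumbing; every name below is
// illustrative, not the actual patch.
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class NodesListManagerSketch {
  interface NodeEventSink {
    void decommission(String nodeId);
  }

  private final Set<String> excludedHosts = new HashSet<>();
  private final Map<String, String> activeNodes = new HashMap<>();  // nodeId -> host
  private final NodeEventSink sink;

  NodesListManagerSketch(NodeEventSink sink) {
    this.sink = sink;
  }

  // Proposed direction: act on the updated exclude list during the refresh
  // itself, instead of waiting up to a heartbeat interval (1 second by
  // default) for each node's next heartbeat to observe it.
  void refreshNodes(Collection<String> newExcludes) {
    excludedHosts.clear();
    excludedHosts.addAll(newExcludes);
    activeNodes.forEach((nodeId, host) -> {
      if (excludedHosts.contains(host)) {
        sink.decommission(nodeId);
      }
    });
  }
}
{code}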

> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2641.000.patch, YARN-2641.001.patch
>
>
> improve node decommission latency in RM. 
> Currently, node decommission only happens after the RM receives a 
> nodeHeartbeat from the Node Manager. The node heartbeat interval is 
> configurable; the default value is 1 second.
> It would be better to do the decommission during RM refresh 
> (NodesListManager) instead of nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161658#comment-14161658
 ] 

Hadoop QA commented on YARN-2641:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673309/YARN-2641.000.patch
  against trunk revision 0fb2735.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5302//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5302//console

This message is automatically generated.

> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2641.000.patch, YARN-2641.001.patch
>
>
> improve node decommission latency in RM. 
> Currently, node decommission only happens after the RM receives a 
> nodeHeartbeat from the Node Manager. The node heartbeat interval is 
> configurable; the default value is 1 second.
> It would be better to do the decommission during RM refresh 
> (NodesListManager) instead of nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: YARN-2641.001.patch

> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2641.000.patch, YARN-2641.001.patch
>
>
> improve node decommission latency in RM. 
> Currently, node decommission only happens after the RM receives a 
> nodeHeartbeat from the Node Manager. The node heartbeat interval is 
> configurable; the default value is 1 second.
> It would be better to do the decommission during RM refresh 
> (NodesListManager) instead of nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: (was: YARN-2641.001.patch)

> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2641.000.patch
>
>
> improve node decommission latency in RM. 
> Currently, node decommission only happens after the RM receives a 
> nodeHeartbeat from the Node Manager. The node heartbeat interval is 
> configurable; the default value is 1 second.
> It would be better to do the decommission during RM refresh 
> (NodesListManager) instead of nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: YARN-2641.001.patch

> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2641.000.patch
>
>
> improve node decommission latency in RM. 
> Currently, node decommission only happens after the RM receives a 
> nodeHeartbeat from the Node Manager. The node heartbeat interval is 
> configurable; the default value is 1 second.
> It would be better to do the decommission during RM refresh 
> (NodesListManager) instead of nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: (was: YARN-2641.001.patch)

> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2641.000.patch
>
>
> improve node decommission latency in RM. 
> Currently, node decommission only happens after the RM receives a 
> nodeHeartbeat from the Node Manager. The node heartbeat interval is 
> configurable; the default value is 1 second.
> It would be better to do the decommission during RM refresh 
> (NodesListManager) instead of nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: YARN-2641.001.patch

> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2641.000.patch
>
>
> improve node decommission latency in RM. 
> Currently, node decommission only happens after the RM receives a 
> nodeHeartbeat from the Node Manager. The node heartbeat interval is 
> configurable; the default value is 1 second.
> It would be better to do the decommission during RM refresh 
> (NodesListManager) instead of nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: YARN-2641.000.patch

> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2641.000.patch
>
>
> improve node decommission latency in RM. 
> Currently, node decommission only happens after the RM receives a 
> nodeHeartbeat from the Node Manager. The node heartbeat interval is 
> configurable; the default value is 1 second.
> It would be better to do the decommission during RM refresh 
> (NodesListManager) instead of nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: (was: YARN-2641.000.patch)

> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>
> improve node decommission latency in RM. 
> Currently, node decommission only happens after the RM receives a 
> nodeHeartbeat from the Node Manager. The node heartbeat interval is 
> configurable; the default value is 1 second.
> It would be better to do the decommission during RM refresh 
> (NodesListManager) instead of nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >