[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777171#comment-13777171
 ] 

Hadoop QA commented on YARN-311:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604962/YARN-311-v7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2012//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2012//console

This message is automatically generated.

> Dynamic node resource configuration: core scheduler changes
> ---
>
> Key: YARN-311
> URL: https://issues.apache.org/jira/browse/YARN-311
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, 
> YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, 
> YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch
>
>
> As the first step, we go for resource changes on the RM side and will expose 
> admin APIs (admin protocol, CLI, REST and JMX API) later. This jira contains 
> only the scheduler changes. 
> The flow to update a node's resource and make resource scheduling aware of it is: 
> 1. The resource update goes through the admin API to the RM and takes effect 
> on RMNodeImpl.
> 2. When the next NM status-update heartbeat arrives, the RMNode's resource 
> change is detected, and the delta resource is added to the SchedulerNode's 
> availableResource before actual scheduling happens.
> 3. The scheduler allocates resources according to the new availableResource 
> in the SchedulerNode.
> For more design details, please refer to the proposal and discussions in the 
> parent JIRA: YARN-291.
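
For illustration, a minimal sketch of the delta update in step 2 above. The names (rmNode, schedulerNode, and their accessors) are assumptions for the sketch, not the actual patch:

{code}
// Hedged sketch of the heartbeat-time delta update described above.
Resource newTotal = rmNode.getTotalCapability();          // set via admin API
Resource oldTotal = schedulerNode.getTotalResource();     // assumed accessor
Resource delta = Resources.subtract(newTotal, oldTotal);  // may be negative
Resources.addTo(schedulerNode.getAvailableResource(), delta);
schedulerNode.setTotalResource(newTotal);                 // hypothetical setter
{code}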

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-1235) Regulate the case of applicationType

2013-09-24 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-1235:
-

 Summary: Regulate the case of applicationType
 Key: YARN-1235
 URL: https://issues.apache.org/jira/browse/YARN-1235
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


In YARN-1001, when filtering applications, we ignore the case of the 
applicationType.

However, RMClientService#getApplications doesn't. Moreover, it is not 
documented whether ApplicationClientProtocol ignores the case of 
applicationType or not.

IMHO, we need to:

1. Modify RMClientService#getApplications to ignore the case of applicationType 
when filtering applications

2. Add javadoc to ApplicationClientProtocol#submitApplication and 
getApplications to say that applicationType is case-insensitive

3. Probably, on submitApplication, we'd like to "normalize" the 
applicationType to lower case (rough sketch below).
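
For points 1 and 3, a minimal sketch; the variable names are illustrative, not from an actual patch:

{code}
// Normalize the requested types once, then match case-insensitively.
Set<String> lowerCasedTypes = new HashSet<String>();
for (String type : applicationTypes) {
  lowerCasedTypes.add(type.toLowerCase());
}
for (RMApp app : apps) {
  if (lowerCasedTypes.isEmpty()
      || lowerCasedTypes.contains(app.getApplicationType().toLowerCase())) {
    matchingApps.add(app);
  }
}
{code}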

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-311) Dynamic node resource configuration: core scheduler changes

2013-09-24 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-311:


Attachment: YARN-311-v7.patch

Synced the patch up with trunk in v7.

> Dynamic node resource configuration: core scheduler changes
> ---
>
> Key: YARN-311
> URL: https://issues.apache.org/jira/browse/YARN-311
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, 
> YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, 
> YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch
>
>
> As the first step, we go for resource changes on the RM side and will expose 
> admin APIs (admin protocol, CLI, REST and JMX API) later. This jira contains 
> only the scheduler changes. 
> The flow to update a node's resource and make resource scheduling aware of it is: 
> 1. The resource update goes through the admin API to the RM and takes effect 
> on RMNodeImpl.
> 2. When the next NM status-update heartbeat arrives, the RMNode's resource 
> change is detected, and the delta resource is added to the SchedulerNode's 
> availableResource before actual scheduling happens.
> 3. The scheduler allocates resources according to the new availableResource 
> in the SchedulerNode.
> For more design details, please refer to the proposal and discussions in the 
> parent JIRA: YARN-291.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy

2013-09-24 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reassigned YARN-1028:
---

Assignee: Karthik Kambatla  (was: Devaraj K)

> Add FailoverProxyProvider like capability to RMProxy
> 
>
> Key: YARN-1028
> URL: https://issues.apache.org/jira/browse/YARN-1028
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
>
> RMProxy layer currently abstracts RM discovery and implements it by looking 
> up service information from configuration. Motivated by HDFS and using 
> existing classes from Common, we can add failover proxy providers that may 
> provide RM discovery in extensible ways.
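
For illustration, a hedged sketch of how Common's retry classes could wrap an RM protocol, analogous to HDFS's ConfiguredFailoverProxyProvider. The provider class and retry parameters below are assumptions; the provider implementation itself is the open work here:

{code}
// ConfiguredRMFailoverProxyProvider is a hypothetical YARN-side provider.
FailoverProxyProvider<ApplicationClientProtocol> provider =
    new ConfiguredRMFailoverProxyProvider();
ApplicationClientProtocol proxy =
    (ApplicationClientProtocol) RetryProxy.create(
        ApplicationClientProtocol.class, provider,
        RetryPolicies.failoverOnNetworkException(
            RetryPolicies.TRY_ONCE_THEN_FAIL, 5));  // illustrative policy
{code}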

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1215) Yarn URL should include userinfo

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777124#comment-13777124
 ] 

Hadoop QA commented on YARN-1215:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12604945/YARN-1215-trunk.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2011//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2011//console

This message is automatically generated.

> Yarn URL should include userinfo
> 
>
> Key: YARN-1215
> URL: https://issues.apache.org/jira/browse/YARN-1215
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-1215-trunk.2.patch, YARN-1215-trunk.patch
>
>
> In the {{org.apache.hadoop.yarn.api.records.URL}} class, we don't have a 
> userinfo as part of the URL. When converting a {{java.net.URI}} object into 
> the YARN URL object in the {{ConverterUtils.getYarnUrlFromURI()}} method, we 
> set the uri host as the url host. If the uri has a userinfo part, the 
> userinfo is discarded. This leads to information loss if the original uri 
> has a userinfo, e.g. foo://username:passw...@example.com will be converted 
> to foo://example.com, and the username/password information is lost during 
> the conversion.
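
For illustration, a hedged sketch of the conversion with the userinfo preserved; setUserInfo is the accessor such a patch would add (assumed name):

{code}
// Sketch of ConverterUtils.getYarnUrlFromURI(URI uri) keeping userinfo.
URL url = Records.newRecord(URL.class);
url.setScheme(uri.getScheme());
if (uri.getUserInfo() != null) {
  url.setUserInfo(uri.getUserInfo());  // previously discarded
}
url.setHost(uri.getHost());
url.setPort(uri.getPort());
url.setFile(uri.getPath());
{code}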

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

2013-09-24 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777082#comment-13777082
 ] 

Ravi Prakash commented on YARN-90:
--

Hi nijel!

Welcome to the community and thanks for your contribution. A few comments:
1. Nit: Some lines are over 80 characters long.
2. numFailures is no longer incremented when a directory fails, so 
getNumFailures() would return the wrong result (rough sketch below).
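
Roughly what I mean, with hypothetical names for the handler's internals:

{code}
// Hypothetical sketch: bump the counter when a dir transitions to failed,
// otherwise getNumFailures() under-reports.
if (!isDirUsable(dir) && localDirs.remove(dir)) {
  failedDirs.add(dir);
  numFailures++;
}
{code}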

Could you please also tell us how you tested the patch? There seem to be a lot 
of unit tests which use LocalDirsHandlerService. Did you run them all and 
ensure that they still all pass?

Thanks again

> NodeManager should identify failed disks becoming good back again
> -
>
> Key: YARN-90
> URL: https://issues.apache.org/jira/browse/YARN-90
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ravi Gummadi
> Attachments: YARN-90.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good), NodeManager needs restart. This JIRA is to improve NodeManager to 
> reuse good disks(which could be bad some time back).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1128) FifoPolicy.computeShares throws NPE on empty list of Schedulables

2013-09-24 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1128:
-

Hadoop Flags: Reviewed

Committed to trunk, branch-2, and branch-2.1-beta

> FifoPolicy.computeShares throws NPE on empty list of Schedulables
> -
>
> Key: YARN-1128
> URL: https://issues.apache.org/jira/browse/YARN-1128
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: Karthik Kambatla
> Fix For: 2.1.2-beta
>
> Attachments: yarn-1128-1.patch
>
>
> FifoPolicy gives all of a queue's share to the earliest-scheduled application.
> {code}
> Schedulable earliest = null;
> for (Schedulable schedulable : schedulables) {
>   if (earliest == null ||
>       schedulable.getStartTime() < earliest.getStartTime()) {
>     earliest = schedulable;
>   }
> }
> earliest.setFairShare(Resources.clone(totalResources));
> {code}
> If the queue has no schedulables in it, earliest will be left null, leading 
> to an NPE on the last line.
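
A minimal null guard would avoid the NPE (the attached patch may handle the empty case differently):

{code}
if (earliest != null) {
  earliest.setFairShare(Resources.clone(totalResources));
}
{code}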

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1089) Add YARN compute units alongside virtual cores

2013-09-24 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777071#comment-13777071
 ] 

Sandy Ryza commented on YARN-1089:
--

As was requested, I posted a summary of the proposal on YARN-1024.

In case it's not clear from the summary, here's the problem we're trying to solve:
We want jobs to be portable between clusters. CPU is not a fluid resource in 
the way memory is. The number of cores on a machine is just as important as its 
total processing power when scheduling tasks.

Imagine a cluster where every node has powerful CPUs with many cores.  One type 
of task that will be run on the cluster saturates a full CPU, but another type 
of task that will be run on the cluster contains two threads, each of which can 
saturate only half a full CPU.  If we have a single dimension for CPU requests, 
these tasks will request an equal number of units along it.  What happens if we 
then move those tasks to a cluster with CPUs whose cores are half as fast?  The 
first task will run half as fast, and the second task will run in the same 
amount of time.  It's in the first task's interest to only request half as many 
CPU resources on that cluster.

I'm also afraid of things getting complicated, but I can't think of anything 
better that doesn't require having the meaning of a virtual core vary widely 
from cluster to cluster.

> Add YARN compute units alongside virtual cores
> --
>
> Key: YARN-1089
> URL: https://issues.apache.org/jira/browse/YARN-1089
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1089-1.patch, YARN-1089.patch
>
>
> Based on discussion in YARN-1024, we will add YARN compute units as a 
> resource for requesting and scheduling CPU processing power.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-899) Get queue administration ACLs working

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777069#comment-13777069
 ] 

Hadoop QA commented on YARN-899:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604938/YARN-899.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2010//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2010//console

This message is automatically generated.

> Get queue administration ACLs working
> -
>
> Key: YARN-899
> URL: https://issues.apache.org/jira/browse/YARN-899
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: Xuan Gong
> Attachments: YARN-899.1.patch, YARN-899.2.patch, YARN-899.3.patch, 
> YARN-899.4.patch, YARN-899.5.patch, YARN-899.5.patch, YARN-899.6.patch, 
> YARN-899.7.patch, YARN-899.8.patch
>
>
> The Capacity Scheduler documents the 
> yarn.scheduler.capacity.root..acl_administer_queue config option 
> for controlling who can administer a queue, but it is not hooked up to 
> anything.  The Fair Scheduler could make use of a similar option as well.  
> This is a feature-parity regression from MR1.
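
For illustration, a hedged sketch of the kind of check that needs to be wired up; the method names follow the existing queue ACL style but are assumptions here:

{code}
// Before an administrative action, e.g. killing another user's application:
UserGroupInformation callerUGI = UserGroupInformation.getCurrentUser();
if (!queue.hasAccess(QueueACL.ADMINISTER_QUEUE, callerUGI)) {
  throw new AccessControlException("User " + callerUGI.getShortUserName()
      + " cannot administer queue " + queue.getQueueName());
}
{code}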

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes

2013-09-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777068#comment-13777068
 ] 

Junping Du commented on YARN-311:
-

Hi [~acmurthy], it has been a while. Any chance you could review this patch?

> Dynamic node resource configuration: core scheduler changes
> ---
>
> Key: YARN-311
> URL: https://issues.apache.org/jira/browse/YARN-311
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, 
> YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, 
> YARN-311-v6.2.patch, YARN-311-v6.patch
>
>
> As the first step, we go for resource changes on the RM side and will expose 
> admin APIs (admin protocol, CLI, REST and JMX API) later. This jira contains 
> only the scheduler changes. 
> The flow to update a node's resource and make resource scheduling aware of it is: 
> 1. The resource update goes through the admin API to the RM and takes effect 
> on RMNodeImpl.
> 2. When the next NM status-update heartbeat arrives, the RMNode's resource 
> change is detected, and the delta resource is added to the SchedulerNode's 
> availableResource before actual scheduling happens.
> 3. The scheduler allocates resources according to the new availableResource 
> in the SchedulerNode.
> For more design details, please refer to the proposal and discussions in the 
> parent JIRA: YARN-291.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1215) Yarn URL should include userinfo

2013-09-24 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu updated YARN-1215:


Attachment: YARN-1215-trunk.2.patch

Attached a new patch that adds a userInfo field to 
org.apache.hadoop.yarn.api.records.URL. This appends an optional field to the 
existing .proto file, which is allowed according to the compatibility guide at:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility

> Yarn URL should include userinfo
> 
>
> Key: YARN-1215
> URL: https://issues.apache.org/jira/browse/YARN-1215
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-1215-trunk.2.patch, YARN-1215-trunk.patch
>
>
> In the {{org.apache.hadoop.yarn.api.records.URL}} class, we don't have a 
> userinfo as part of the URL. When converting a {{java.net.URI}} object into 
> the YARN URL object in the {{ConverterUtils.getYarnUrlFromURI()}} method, we 
> set the uri host as the url host. If the uri has a userinfo part, the 
> userinfo is discarded. This leads to information loss if the original uri 
> has a userinfo, e.g. foo://username:passw...@example.com will be converted 
> to foo://example.com, and the username/password information is lost during 
> the conversion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved

2013-09-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777056#comment-13777056
 ] 

Hudson commented on YARN-1214:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4464 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4464/])
YARN-1214. Register ClientToken MasterKey in SecretManager after it is saved 
(Jian He via bikas) (bikas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1526078)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/ClientToAMTokenSecretManagerInRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java


> Register ClientToken MasterKey in SecretManager after it is saved
> -
>
> Key: YARN-1214
> URL: https://issues.apache.org/jira/browse/YARN-1214
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>Priority: Critical
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, 
> YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.6.patch, YARN-1214.patch
>
>
> Currently, the app attempt ClientToken master key is registered before it is 
> saved. This can cause a problem: if the client gets the token before the 
> master key is saved and the RM then crashes, the RM cannot reload the master 
> key after it restarts, because it was never saved. As a result, the client 
> is left holding an invalid token.
> We can register the client token master key after it is saved in the store.
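
A hedged sketch of the reordering; the state-store and secret-manager calls are simplified and their exact signatures are assumptions:

{code}
// Persist the attempt (including the client-token master key) first...
rmStateStore.storeApplicationAttempt(appAttempt);
// ...and only then register the key, so an RM restart can reload it.
clientToAMTokenSecretManager.registerApplication(
    appAttempt.getAppAttemptId(), clientTokenMasterKey);
{code}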

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1089) Add YARN compute units alongside virtual cores

2013-09-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777051#comment-13777051
 ] 

Bikas Saha commented on YARN-1089:
--

At this point, I am not seeing the benefit of creating yet another CPU-related 
configuration. While I am not against useful configurations, it's already hard 
to configure YARN. Like Vinod and others said, can a summary of the discussions 
held elsewhere be placed here?

> Add YARN compute units alongside virtual cores
> --
>
> Key: YARN-1089
> URL: https://issues.apache.org/jira/browse/YARN-1089
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1089-1.patch, YARN-1089.patch
>
>
> Based on discussion in YARN-1024, we will add YARN compute units as a 
> resource for requesting and scheduling CPU processing power.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable

2013-09-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777044#comment-13777044
 ] 

Jian He commented on YARN-674:
--

Is this related to the ClientRMService.renewDelegationToken method?

> Slow or failing DelegationToken renewals on submission itself make RM 
> unavailable
> -
>
> Key: YARN-674
> URL: https://issues.apache.org/jira/browse/YARN-674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> This was caused by YARN-280. A slow or down NameNode will make it look like 
> the RM is unavailable, as the RM may run out of RPC handlers due to blocked 
> client submissions.
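
One hedged sketch of a possible direction (not a committed fix): take the renewal off the RPC handler thread so submission returns promptly. The signatures below are assumptions:

{code}
// Illustrative only: renew asynchronously and report failures back to the
// app instead of blocking the ClientRMService handler.
ExecutorService renewerPool = Executors.newSingleThreadExecutor();
renewerPool.submit(new Runnable() {
  @Override
  public void run() {
    try {
      delegationTokenRenewer.addApplication(applicationId, credentials);
    } catch (IOException e) {
      rejectApplication(applicationId, e);  // hypothetical helper
    }
  }
});
{code}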

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1089) Add YARN compute units alongside virtual cores

2013-09-24 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777034#comment-13777034
 ] 

Sandy Ryza commented on YARN-1089:
--

I'm ok with waiting until 2.3.  In case it's not clear, the consequence of 
this is that until then it will be impossible to place more tasks on a node 
than its number of virtual cores, which is essentially its number of physical 
cores.

I think we should make YARN-976, documenting the meaning of vcores, a blocker 
for 2.2.

> Add YARN compute units alongside virtual cores
> --
>
> Key: YARN-1089
> URL: https://issues.apache.org/jira/browse/YARN-1089
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1089-1.patch, YARN-1089.patch
>
>
> Based on discussion in YARN-1024, we will add YARN compute units as a 
> resource for requesting and scheduling CPU processing power.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-899) Get queue administration ACLs working

2013-09-24 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-899:
---

Attachment: YARN-899.8.patch

> Get queue administration ACLs working
> -
>
> Key: YARN-899
> URL: https://issues.apache.org/jira/browse/YARN-899
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: Xuan Gong
> Attachments: YARN-899.1.patch, YARN-899.2.patch, YARN-899.3.patch, 
> YARN-899.4.patch, YARN-899.5.patch, YARN-899.5.patch, YARN-899.6.patch, 
> YARN-899.7.patch, YARN-899.8.patch
>
>
> The Capacity Scheduler documents the 
> yarn.scheduler.capacity.root..acl_administer_queue config option 
> for controlling who can administer a queue, but it is not hooked up to 
> anything.  The Fair Scheduler could make use of a similar option as well.  
> This is a feature-parity regression from MR1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-899) Get queue administration ACLs working

2013-09-24 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-899:
---

Attachment: YARN-899.7.patch

create patch based on the latest trunk

> Get queue administration ACLs working
> -
>
> Key: YARN-899
> URL: https://issues.apache.org/jira/browse/YARN-899
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: Xuan Gong
> Attachments: YARN-899.1.patch, YARN-899.2.patch, YARN-899.3.patch, 
> YARN-899.4.patch, YARN-899.5.patch, YARN-899.5.patch, YARN-899.6.patch, 
> YARN-899.7.patch
>
>
> The Capacity Scheduler documents the 
> yarn.scheduler.capacity.root..acl_administer_queue config option 
> for controlling who can administer a queue, but it is not hooked up to 
> anything.  The Fair Scheduler could make use of a similar option as well.  
> This is a feature-parity regression from MR1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777023#comment-13777023
 ] 

Hadoop QA commented on YARN-1229:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604922/YARN-1229.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2008//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2008//console

This message is automatically generated.

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, 
> YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch
>
>
> I ran a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1232) Configuration support for RM HA

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777021#comment-13777021
 ] 

Hadoop QA commented on YARN-1232:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604931/yarn-1232-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.client.cli.TestYarnCLI
  org.apache.hadoop.yarn.client.TestGetGroups
  org.apache.hadoop.yarn.client.api.impl.TestYarnClient
  org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
  org.apache.hadoop.yarn.client.api.impl.TestNMClient
  org.apache.hadoop.yarn.conf.TestYarnConfiguration
  org.apache.hadoop.yarn.logaggregation.TestLogDumper
  org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens
  org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp
  org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl
  org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue
  org.apache.hadoop.yarn.server.resourcemanager.recovery.TestRMStateStore
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  org.apache.hadoop.yarn.server.resourcemanager.TestRMHA
  org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils
  org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueParsing
  org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestChildQueueOrder
  org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager
  org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
  org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
  org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService
  org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
  org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
  org.apache.hadoop.yarn.server.resourcemanager.TestRMNodeTransitions
  org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerEventLog
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
  org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice.TestApplicationMasterService
  org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates
  org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits
  org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
  org.apache.hadoop.yarn.server.resourcemanager.TestRM
  org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue
  org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestParentQueue
  org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestQueueMetrics
  org.apache.hadoop.yarn.server.resource

[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777009#comment-13777009
 ] 

Hudson commented on YARN-1229:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4463 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4463/])
YARN-1229. Define constraints on Auxiliary Service names. Change ShuffleHandler 
service name from mapreduce.shuffle to mapreduce_shuffle. Contributed by Xuan 
Gong. (sseth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1526065)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/INSTALL
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestAuxServices.java


> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, 
> YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch
>
>
> I ran a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster

2013-09-24 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1234:


Fix Version/s: 2.1.2-beta

>  Container localizer logs are not created in secured cluster
> 
>
> Key: YARN-1234
> URL: https://issues.apache.org/jira/browse/YARN-1234
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Fix For: 2.1.2-beta
>
>
> When we run the ContainerLocalizer in a secured cluster, we potentially do 
> not create any log file to capture its log messages. Creating one would help 
> in identifying ContainerLocalization issues in a secured cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster

2013-09-24 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1234:


Component/s: nodemanager

>  Container localizer logs are not created in secured cluster
> 
>
> Key: YARN-1234
> URL: https://issues.apache.org/jira/browse/YARN-1234
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Fix For: 2.1.2-beta
>
>
> When we run the ContainerLocalizer in a secured cluster, we potentially do 
> not create any log file to capture its log messages. Creating one would help 
> in identifying ContainerLocalization issues in a secured cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Moved] (YARN-1234) Container localizer logs are not created in secured cluster

2013-09-24 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi moved MAPREDUCE-5532 to YARN-1234:


Key: YARN-1234  (was: MAPREDUCE-5532)
Project: Hadoop YARN  (was: Hadoop Map/Reduce)

>  Container localizer logs are not created in secured cluster
> 
>
> Key: YARN-1234
> URL: https://issues.apache.org/jira/browse/YARN-1234
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
>
> When we run the ContainerLocalizer in a secured cluster, we potentially do 
> not create any log file to capture its log messages. Creating one would help 
> in identifying ContainerLocalization issues in a secured cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1232) Configuration support for RM HA

2013-09-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1232:
---

Attachment: yarn-1232-2.patch

Patch that adds descriptions and tests for HAUtil; to be applied on trunk.

> Configuration support for RM HA
> ---
>
> Key: YARN-1232
> URL: https://issues.apache.org/jira/browse/YARN-1232
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Attachments: yarn-1232-1.patch, yarn-1232-2.patch
>
>
> We should augment the configuration to allow users to specify two RMs and 
> the individual RPC addresses for them. This blocks 
> ConfiguredFailoverProxyProvider.
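
A hedged sketch of how per-RM keys might be resolved; the key names are assumptions pending the patch:

{code}
// Hypothetical key layout: a list of RM ids plus per-id service addresses,
// e.g. yarn.resourcemanager.ha.rm-ids=rm1,rm2 and
// yarn.resourcemanager.address.rm1=host1:8032.
String[] rmIds = conf.getStrings("yarn.resourcemanager.ha.rm-ids");
for (String rmId : rmIds) {
  String address = conf.get("yarn.resourcemanager.address." + rmId);
  // hand each address to the failover proxy provider
}
{code}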

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776979#comment-13776979
 ] 

Siddharth Seth commented on YARN-1229:
--

+1. Committing.

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, 
> YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch
>
>
> I ran a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776975#comment-13776975
 ] 

Hadoop QA commented on YARN-1229:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604917/YARN-1229.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2007//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2007//console

This message is automatically generated.

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, 
> YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch
>
>
> I ran a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1229:


Attachment: YARN-1229.6.patch

fix documentation

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, 
> YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch
>
>
> I ran a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776945#comment-13776945
 ] 

Siddharth Seth commented on YARN-1229:
--

Patch looks good. Missed this earlier, but there are several references to 
mapreduce.shuffle in the documentation which need to be updated.
Also, since it's being updated - can you make the Pattern final? Thanks

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, 
> YARN-1229.4.patch, YARN-1229.5.patch
>
>
> I run a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776934#comment-13776934
 ] 

Hadoop QA commented on YARN-1229:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604911/YARN-1229.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2006//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2006//console

This message is automatically generated.

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, 
> YARN-1229.4.patch, YARN-1229.5.patch
>
>
> I run a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1229:


Attachment: YARN-1229.5.patch

1. Change the name to mapreduce_shuffle
2. Use a regex to validate auxName

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, 
> YARN-1229.4.patch, YARN-1229.5.patch
>
>
> I run a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1168) Cannot run "echo \"Hello World\""

2013-09-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1168:


Fix Version/s: (was: 2.1.1-beta)
   2.1.2-beta

> Cannot run "echo \"Hello World\""
> -
>
> Key: YARN-1168
> URL: https://issues.apache.org/jira/browse/YARN-1168
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Reporter: Tassapol Athiapinya
>Priority: Critical
> Fix For: 2.1.2-beta
>
>
> Run
> $ ssh localhost "echo \"Hello World\""
> with bash succeeds; Hello World is shown in stdout.
> Run distributed shell with a similar echo command, that is either
> $ /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
> -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.*.jar 
> -shell_command echo -shell_args "\"Hello World\""
> or
> $ /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
> -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.*.jar 
> -shell_command echo -shell_args "Hello World"
> {code:title=yarn logs -- only hello is shown}
> LogType: stdout
> LogLength: 6
> Log Contents:
> hello
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING

2013-09-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1149:


Fix Version/s: (was: 2.1.1-beta)
   2.1.2-beta

> NM throws InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FINISHED at RUNNING
> -
>
> Key: YARN-1149
> URL: https://issues.apache.org/jira/browse/YARN-1149
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ramya Sunil
>Assignee: Xuan Gong
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, 
> YARN-1149.4.patch
>
>
> When the nodemanager receives a kill signal after an application has finished 
> execution but log aggregation has not kicked in, 
> InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown
> {noformat}
> 2013-08-25 20:45:00,875 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just 
> finished : application_1377459190746_0118
> 2013-08-25 20:45:00,876 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate 
> log-file for app application_1377459190746_0118 at 
> /app-logs/foo/logs/application_1377459190746_0118/_45454.tmp
> 2013-08-25 20:45:00,876 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation 
> to complete for application_1377459190746_0118
> 2013-08-25 20:45:00,891 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for 
> container container_1377459190746_0118_01_04. Current good log dirs are 
> /tmp/yarn/local
> 2013-08-25 20:45:00,915 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate 
> log-file for app application_1377459190746_0118
> 2013-08-25 20:45:00,925 WARN  application.Application 
> (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FINISHED at RUNNING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>  
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)   
> at java.lang.Thread.run(Thread.java:662)
> 2013-08-25 20:45:00,926 INFO  application.Application 
> (ApplicationImpl.java:handle(430)) - Application 
> application_1377459190746_0118 transitioned from RUNNING to null
> 2013-08-25 20:45:00,927 WARN  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:run(463)) - 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
>  is interrupted. Exiting.
> 2013-08-25 20:45:00,938 INFO  ipc.Server (Server.java:stop(2437)) - Stopping 
> server on 8040
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application

2013-09-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1157:


Fix Version/s: (was: 2.1.1-beta)
   2.1.2-beta

> ResourceManager UI has invalid tracking URL link for distributed shell 
> application
> --
>
> Key: YARN-1157
> URL: https://issues.apache.org/jira/browse/YARN-1157
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, 
> YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch, YARN-1157.6.patch
>
>
> Submit a YARN distributed shell application and go to the ResourceManager web 
> UI. The application appears. In the Tracking UI column, there will be a 
> history link. Click on that link. Instead of showing the application master 
> web UI, an HTTP error 500 appears.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly

2013-09-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1142:


Fix Version/s: (was: 2.1.1-beta)
   2.1.2-beta

> MiniYARNCluster web ui does not work properly
> -
>
> Key: YARN-1142
> URL: https://issues.apache.org/jira/browse/YARN-1142
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
> Fix For: 2.1.2-beta
>
>
> When going to the RM http port, the NM web UI is displayed. It seems there is 
> a singleton somewhere that breaks things when the RM & NMs run in the same 
> process.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running

2013-09-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1131:


Fix Version/s: (was: 2.1.1-beta)
   2.1.2-beta

> $ yarn logs should return a message log aggregation is during progress if 
> YARN application is running
> -
>
> Key: YARN-1131
> URL: https://issues.apache.org/jira/browse/YARN-1131
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Tassapol Athiapinya
>Assignee: Junping Du
>Priority: Minor
> Fix For: 2.1.2-beta
>
>
> In the case when log aggregation is enabled, if a user submits a MapReduce job 
> and runs $ yarn logs -applicationId  while the YARN application is 
> running, the command will return no message and return the user back to the 
> shell. It would be nice to tell the user that log aggregation is in progress.
> {code}
> -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
> -bash-4.1$
> {code}
> At the same time, if an invalid application ID is given, the YARN CLI should 
> say that the application ID is incorrect rather than throwing 
> NoSuchElementException.
> {code}
> $ /usr/bin/yarn logs -applicationId application_0
> Exception in thread "main" java.util.NoSuchElementException
> at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
> at 
> org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
> at 
> org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
> at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
> at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
> {code}
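
A minimal sketch of the kind of up-front check the CLI could add (the class, 
regex, and messages below are hypothetical, not the actual LogDumper code): 
validate the argument's shape before parsing, so the user gets a clear error 
instead of the trace above.
{code}
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: application IDs look like
// application_<clusterTimestamp>_<sequenceNumber>.
public final class AppIdCheck {
  private static final Pattern APP_ID = Pattern.compile("application_\\d+_\\d+");

  public static void main(String[] args) {
    String id = args.length > 0 ? args[0] : "application_0";
    if (!APP_ID.matcher(id).matches()) {
      System.err.println("Invalid ApplicationId: " + id);  // friendly message
      System.exit(1);
    }
    System.out.println("Well-formed ApplicationId: " + id);
  }
}
{code}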

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1158) ResourceManager UI has application stdout missing if application stdout is not in the same directory as AppMaster stdout

2013-09-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1158:


Fix Version/s: (was: 2.1.1-beta)
   2.1.2-beta

> ResourceManager UI has application stdout missing if application stdout is 
> not in the same directory as AppMaster stdout
> 
>
> Key: YARN-1158
> URL: https://issues.apache.org/jira/browse/YARN-1158
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tassapol Athiapinya
> Fix For: 2.1.2-beta
>
>
> Configure yarn-site.xml's yarn.nodemanager.local-dirs to multiple 
> directories. Turn on log aggregation. Run a distributed shell application so 
> that it writes AppMaster.stdout in one directory and stdout in another 
> directory. Go to the ResourceManager web UI and open up the container logs. 
> Only AppMaster.stdout appears.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1121) RMStateStore should flush all pending store events before closing

2013-09-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1121:


Fix Version/s: (was: 2.1.1-beta)
   2.1.2-beta

> RMStateStore should flush all pending store events before closing
> -
>
> Key: YARN-1121
> URL: https://issues.apache.org/jira/browse/YARN-1121
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
> Fix For: 2.1.2-beta
>
>
> On serviceStop, it should wait for all internal pending events to drain 
> before stopping.
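
A generic sketch of the drain-before-stop idea (this is not the actual 
RMStateStore code; the class and its members are hypothetical): the stop path 
signals the worker and then blocks until every queued event has been processed.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative only: a dispatcher whose stop path drains queued work first.
public class DrainOnStop {
  private final BlockingQueue<Runnable> pending = new LinkedBlockingQueue<>();
  private volatile boolean stopped = false;
  private final Thread worker = new Thread(() -> {
    try {
      // keep consuming until told to stop AND the queue is empty
      while (!stopped || !pending.isEmpty()) {
        Runnable event = pending.poll(100, TimeUnit.MILLISECONDS);
        if (event != null) {
          event.run();
        }
      }
    } catch (InterruptedException ignored) {
    }
  });

  public void serviceStart() {
    worker.start();
  }

  public void dispatch(Runnable event) {
    pending.add(event);
  }

  public void serviceStop() throws InterruptedException {
    stopped = true;   // signal the worker to exit once the queue is drained
    worker.join();    // returns only after all pending events have run
  }

  public static void main(String[] args) throws InterruptedException {
    DrainOnStop d = new DrainOnStop();
    d.serviceStart();
    for (int i = 0; i < 5; i++) {
      final int n = i;
      d.dispatch(() -> System.out.println("stored event " + n));
    }
    d.serviceStop();  // all five events are guaranteed to have run here
  }
}
{code}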

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776915#comment-13776915
 ] 

Siddharth Seth commented on YARN-1229:
--

Took a quick look.
- Can you please rename MapreduceShuffle to mapreduce_shuffle (closer to the 
old name)?
- The check can be regex based, rather than walking through all the characters.
- Include an empty check along with the null check.

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, 
> YARN-1229.4.patch
>
>
> I run a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty

2013-09-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1167:


Fix Version/s: (was: 2.1.1-beta)
   2.1.2-beta

> Submitted distributed shell application shows appMasterHost = empty
> ---
>
> Key: YARN-1167
> URL: https://issues.apache.org/jira/browse/YARN-1167
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Reporter: Tassapol Athiapinya
> Fix For: 2.1.2-beta
>
>
> Submit a distributed shell application. Once the application reaches the 
> RUNNING state, the app master host should not be empty. In reality, it is empty.
> ==console logs==
> distributedshell.Client: Got application report from ASM for, appId=12, 
> clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, 
> appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, 
> distributedFinalState=UNDEFINED, 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1022) Unnecessary INFO logs in AMRMClientAsync

2013-09-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1022:


Fix Version/s: (was: 2.1.1-beta)
   2.1.2-beta

> Unnecessary INFO logs in AMRMClientAsync
> 
>
> Key: YARN-1022
> URL: https://issues.apache.org/jira/browse/YARN-1022
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bikas Saha
>Priority: Minor
>  Labels: newbie
> Fix For: 2.1.2-beta
>
>
> Logs like the following should be debug or else every legitimate stop causes 
> unnecessary exception traces in the logs.
> 464 2013-08-03 20:01:34,459 INFO [AMRM Heartbeater thread] 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl:
> Heartbeater interrupted
> 465 java.lang.InterruptedException: sleep interrupted
> 466   at java.lang.Thread.sleep(Native Method)
> 467   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:249)
> 468 2013-08-03 20:01:34,460 INFO [AMRM Callback Handler Thread] 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl:   
> Interrupted while waiting for queue
> 469 java.lang.InterruptedException
> 470   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.
>  java:1961)
> 471   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1996)
> 472   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
> 473   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:275)
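
A minimal sketch of the suggested demotion, assuming the commons-logging Log 
that the messages above come from (the surrounding class is illustrative, not 
the real AMRMClientAsyncImpl):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Illustrative only: log the shutdown-path interrupt at debug, not info.
public class HeartbeaterLogLevel {
  private static final Log LOG = LogFactory.getLog(HeartbeaterLogLevel.class);

  public static void main(String[] args) {
    Thread.currentThread().interrupt();  // simulate the stop-time interrupt
    try {
      Thread.sleep(10);
    } catch (InterruptedException e) {
      LOG.debug("Heartbeater interrupted", e);  // was: LOG.info(...)
    }
  }
}
{code}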

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1053) Diagnostic message from ContainerExitEvent is ignored in ContainerImpl

2013-09-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1053:


Fix Version/s: (was: 2.1.1-beta)
   2.1.2-beta

> Diagnostic message from ContainerExitEvent is ignored in ContainerImpl
> --
>
> Key: YARN-1053
> URL: https://issues.apache.org/jira/browse/YARN-1053
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
>  Labels: newbie
> Fix For: 2.3.0, 2.1.2-beta
>
> Attachments: YARN-1053.20130809.patch
>
>
> If the container launch fails, we send a ContainerExitEvent. This event 
> contains the exit code and a diagnostic message. Today we ignore the 
> diagnostic message while handling this event inside ContainerImpl. Fixing it, 
> as it is useful in diagnosing the failure.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1229:


Attachment: YARN-1229.4.patch

Allow _ as a valid character in auxServiceName, and disallow an auxServiceName 
starting with a number

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, 
> YARN-1229.4.patch
>
>
> I run a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved

2013-09-24 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-1214:
-

Priority: Critical  (was: Major)

> Register ClientToken MasterKey in SecretManager after it is saved
> -
>
> Key: YARN-1214
> URL: https://issues.apache.org/jira/browse/YARN-1214
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, 
> YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.6.patch, YARN-1214.patch
>
>
> Currently, the app attempt ClientToken master key is registered before it is 
> saved. This can cause a problem: if the client gets the token and the RM 
> crashes before the master key is saved, the RM cannot reload the master key 
> after it restarts, as it was never saved. As a result, the client is holding 
> an invalid token.
> We can register the client token master key after it is saved in the store.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776872#comment-13776872
 ] 

Chris Nauroth commented on YARN-1229:
-

Agreed on underscores.  Various resources indicate that 
{{[a-zA-Z_]+[a-zA-Z0-9_]*}} is a good format that we can expect to work 
cross-platform.
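
For illustration, a minimal sketch of such a check (the class and method names 
are hypothetical, not the actual NodeManager code), combining the format above 
with the final precompiled Pattern and the null/empty check requested earlier 
in this thread:
{code}
import java.util.regex.Pattern;

// Illustrative only: validate an aux-service name before it is embedded in an
// environment variable name like NM_AUX_SERVICE_<serviceName>.
public final class AuxServiceNameCheck {
  private static final Pattern NAME_FORMAT =
      Pattern.compile("[a-zA-Z_]+[a-zA-Z0-9_]*");

  static boolean isValid(String name) {
    return name != null && !name.isEmpty()
        && NAME_FORMAT.matcher(name).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValid("mapreduce_shuffle")); // true
    System.out.println(isValid("mapreduce.shuffle")); // false: '.' breaks export
    System.out.println(isValid("1shuffle"));          // false: leading digit
  }
}
{code}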

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch
>
>
> I run a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1128) FifoPolicy.computeShares throws NPE on empty list of Schedulables

2013-09-24 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-1128:
-

Fix Version/s: 2.1.2-beta

> FifoPolicy.computeShares throws NPE on empty list of Schedulables
> -
>
> Key: YARN-1128
> URL: https://issues.apache.org/jira/browse/YARN-1128
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: Karthik Kambatla
> Fix For: 2.1.2-beta
>
> Attachments: yarn-1128-1.patch
>
>
> FifoPolicy gives all of a queue's share to the earliest-scheduled application.
> {code}
> Schedulable earliest = null;
> for (Schedulable schedulable : schedulables) {
>   if (earliest == null ||
>   schedulable.getStartTime() < earliest.getStartTime()) {
> earliest = schedulable;
>   }
> }
> earliest.setFairShare(Resources.clone(totalResources));
> {code}
> If the queue has no schedulables in it, earliest will be left null, leading 
> to an NPE on the last line.
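
A sketch of the obvious guard, reusing the names from the snippet above (the 
enclosing FifoPolicy method is omitted): return before the loop when there is 
nothing to schedule, so earliest is never null when dereferenced.
{code}
// Illustrative fix, not the committed patch:
if (schedulables.isEmpty()) {
  return;
}
Schedulable earliest = null;
for (Schedulable schedulable : schedulables) {
  if (earliest == null ||
      schedulable.getStartTime() < earliest.getStartTime()) {
    earliest = schedulable;
  }
}
earliest.setFairShare(Resources.clone(totalResources));
{code}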

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1203) Application Manager UI does not appear with Https enabled

2013-09-24 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1203:
--

Fix Version/s: 2.1.2-beta

> Application Manager UI does not appear with Https enabled
> -
>
> Key: YARN-1203
> URL: https://issues.apache.org/jira/browse/YARN-1203
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Omkar Vinit Joshi
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1203.20131017.1.patch, YARN-1203.20131017.2.patch, 
> YARN-1203.20131017.3.patch, YARN-1203.20131018.1.patch, 
> YARN-1203.20131018.2.patch, YARN-1203.20131019.1.patch
>
>
> Need to add support to disable 'hadoop.ssl.enabled' for MR jobs.
> A job should be able to run on the http protocol by setting the 
> 'hadoop.ssl.enabled' property at the job level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776865#comment-13776865
 ] 

Siddharth Seth commented on YARN-1229:
--

Just looked at the patch; it'd be nice to include underscores as well - that 
provides a separator in the allowed character set.

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch
>
>
> I run a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1204) Need to add https port related property in Yarn

2013-09-24 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1204:
--

Fix Version/s: 2.1.2-beta

> Need to add https port related property in Yarn
> ---
>
> Key: YARN-1204
> URL: https://issues.apache.org/jira/browse/YARN-1204
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Omkar Vinit Joshi
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1204.20131018.1.patch, YARN-1204.20131020.1.patch, 
> YARN-1204.20131020.2.patch, YARN-1204.20131020.3.patch, 
> YARN-1204.20131020.4.patch, YARN-1204.20131023.1.patch
>
>
> There is no YARN property available to configure the https port for the 
> ResourceManager, NodeManager and history server. Currently, YARN services use 
> the port defined for http [defined by 
> 'mapreduce.jobhistory.webapp.address', 'yarn.nodemanager.webapp.address', 
> 'yarn.resourcemanager.webapp.address'] for running services on the https 
> protocol. YARN should have a list of properties to assign https ports for the 
> RM, NM and JHS. They could be like below.
> yarn.nodemanager.webapp.https.address
> yarn.resourcemanager.webapp.https.address
> mapreduce.jobhistory.webapp.https.address 
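
For illustration, a configuration fragment using the proposed property names 
(the names are the ones listed in this issue; the addresses and ports are 
placeholders, not agreed-upon defaults):
{code:xml}
<!-- Illustrative only: values are placeholders. -->
<property>
  <name>yarn.resourcemanager.webapp.https.address</name>
  <value>0.0.0.0:8090</value>
</property>
<property>
  <name>yarn.nodemanager.webapp.https.address</name>
  <value>0.0.0.0:8044</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.https.address</name>
  <value>0.0.0.0:19890</value>
</property>
{code}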

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-1229:
-

Fix Version/s: (was: 2.1.1-beta)
   2.1.2-beta

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch
>
>
> I run a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776852#comment-13776852
 ] 

Vinod Kumar Vavilapalli commented on YARN-1229:
---

*sigh* more incompatible changes. Thought for a while about whether we could 
do it in a compatible manner, but it doesn't seem like there is any way.

Looked at the patch, +1 for the changes. Let's get it in asap.

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.1-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch
>
>
> I run a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-09-24 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776847#comment-13776847
 ] 

Carlo Curino commented on YARN-624:
---

Hi Guys,

I would like to quantify the typical waste of resources while "hoarding" 
containers towards a gang for Giraph or Storm. 
Does anyone have an intuition/measure of the typical time delay and container 
slot-time wasted while hoarding containers, before the useful part of the 
computation starts? Thanks.


> Support gang scheduling in the AM RM protocol
> -
>
> Key: YARN-624
> URL: https://issues.apache.org/jira/browse/YARN-624
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, scheduler
>Affects Versions: 2.0.4-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>
> Per discussion on YARN-392 and elsewhere, gang scheduling, in which a 
> scheduler runs a set of tasks when they can all be run at the same time, 
> would be a useful feature for YARN schedulers to support.
> Currently, AMs can approximate this by holding on to containers until they 
> get all the ones they need.  However, this lends itself to deadlocks when 
> different AMs are waiting on the same containers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776843#comment-13776843
 ] 

Hadoop QA commented on YARN-1214:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604886/YARN-1214.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2005//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2005//console

This message is automatically generated.

> Register ClientToken MasterKey in SecretManager after it is saved
> -
>
> Key: YARN-1214
> URL: https://issues.apache.org/jira/browse/YARN-1214
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, 
> YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.6.patch, YARN-1214.patch
>
>
> Currently, the app attempt ClientToken master key is registered before it is 
> saved. This can cause a problem: if the client gets the token and the RM 
> crashes before the master key is saved, the RM cannot reload the master key 
> after it restarts, as it was never saved. As a result, the client is holding 
> an invalid token.
> We can register the client token master key after it is saved in the store.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved

2013-09-24 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1214:
--

Attachment: YARN-1214.6.patch

patch rebased

> Register ClientToken MasterKey in SecretManager after it is saved
> -
>
> Key: YARN-1214
> URL: https://issues.apache.org/jira/browse/YARN-1214
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, 
> YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.6.patch, YARN-1214.patch
>
>
> Currently, app attempt ClientToken master key is registered before it is 
> saved. This can cause problem that before the master key is saved, client 
> gets the token and RM also crashes, RM cannot reloads the master key back 
> after it restarts as it is not saved. As a result, client is holding an 
> invalid token.
> We can register the client token master key after it is saved in the store.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved

2013-09-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776802#comment-13776802
 ] 

Bikas Saha commented on YARN-1214:
--

+1

> Register ClientToken MasterKey in SecretManager after it is saved
> -
>
> Key: YARN-1214
> URL: https://issues.apache.org/jira/browse/YARN-1214
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, 
> YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.patch
>
>
> Currently, the app attempt ClientToken master key is registered before it is 
> saved. This can cause a problem: if the client gets the token and the RM 
> crashes before the master key is saved, the RM cannot reload the master key 
> after it restarts, as it was never saved. As a result, the client is holding 
> an invalid token.
> We can register the client token master key after it is saved in the store.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776765#comment-13776765
 ] 

Bikas Saha commented on YARN-1229:
--

Looks good to me.

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.1-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch
>
>
> I run a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application

2013-09-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776684#comment-13776684
 ] 

Jian He commented on YARN-1157:
---

Tests look much cleaner, thanks for the update. Patch looks good, +1

> ResourceManager UI has invalid tracking URL link for distributed shell 
> application
> --
>
> Key: YARN-1157
> URL: https://issues.apache.org/jira/browse/YARN-1157
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
> Fix For: 2.1.1-beta
>
> Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, 
> YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch, YARN-1157.6.patch
>
>
> Submit a YARN distributed shell application and go to the ResourceManager 
> web UI; the application appears there. The Tracking UI column contains a 
> history link. Clicking on that link returns an HTTP 500 error instead of 
> showing the application master web UI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776663#comment-13776663
 ] 

Hadoop QA commented on YARN-1157:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604859/YARN-1157.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2004//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2004//console

This message is automatically generated.

> ResourceManager UI has invalid tracking URL link for distributed shell 
> application
> --
>
> Key: YARN-1157
> URL: https://issues.apache.org/jira/browse/YARN-1157
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
> Fix For: 2.1.1-beta
>
> Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, 
> YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch, YARN-1157.6.patch
>
>
> Submit a YARN distributed shell application and go to the ResourceManager 
> web UI; the application appears there. The Tracking UI column contains a 
> history link. Clicking on that link returns an HTTP 500 error instead of 
> showing the application master web UI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-09-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776653#comment-13776653
 ] 

Karthik Kambatla commented on YARN-1068:


Thanks [~bikassaha], agree with most of your points.

bq. AdminService does not use the HAServiceProtocolServerSideTranslatorPB 
pattern
The reason for this is our attempt to reuse most of the common code - protos 
and client implementations.

bq. Having thought about this, it seems to me that this jira is actually 
blocked by YARN-986.
To fix the admin support in its entirety, I agree that we need YARN-1232 and 
YARN-986. That said, for ease of development, I would propose splitting the 
admin support into two parts (JIRAs): basic support (this JIRA) to go in first 
to help test YARN-1232 and YARN-986, and complete admin support that adds the 
remaining parts. Otherwise, we would need to apply this patch on top of those 
other JIRAs to test them. Thoughts?



> Add admin support for HA operations
> ---
>
> Key: YARN-1068
> URL: https://issues.apache.org/jira/browse/YARN-1068
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, 
> yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, 
> yarn-1068-prelim.patch
>
>
> Support HA admin operations to facilitate transitioning the RM to Active and 
> Standby states.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-986) YARN should have a ClusterId/ServiceId

2013-09-24 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned YARN-986:


Assignee: Karthik Kambatla  (was: Vinod Kumar Vavilapalli)

Sure, here you go.

> YARN should have a ClusterId/ServiceId
> --
>
> Key: YARN-986
> URL: https://issues.apache.org/jira/browse/YARN-986
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Karthik Kambatla
>
> This needs to be done to support non-IP-based failover of the RM. Once the 
> server sets the token service address to this generic ClusterId/ServiceId, 
> clients can translate it to the appropriate final IP and then select tokens 
> via TokenSelectors.
> Some workarounds for other related issues were put in place at YARN-945.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-986) YARN should have a ClusterId/ServiceId

2013-09-24 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-986:


Summary: YARN should have a ClusterId/ServiceId  (was: YARN should have a 
ClusterId/ServiceId that should be used to set the service address for tokens)

> YARN should have a ClusterId/ServiceId
> --
>
> Key: YARN-986
> URL: https://issues.apache.org/jira/browse/YARN-986
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> This needs to be done to support non-IP-based failover of the RM. Once the 
> server sets the token service address to this generic ClusterId/ServiceId, 
> clients can translate it to the appropriate final IP and then select tokens 
> via TokenSelectors.
> Some workarounds for other related issues were put in place at YARN-945.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-986) YARN should have a ClusterId/ServiceId

2013-09-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776637#comment-13776637
 ] 

Bikas Saha commented on YARN-986:
-

This should be used to set the service address for tokens. This would also be 
needed to pick up the correct configs for HA scenarios.

> YARN should have a ClusterId/ServiceId
> --
>
> Key: YARN-986
> URL: https://issues.apache.org/jira/browse/YARN-986
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> This needs to be done to support non-IP-based failover of the RM. Once the 
> server sets the token service address to this generic ClusterId/ServiceId, 
> clients can translate it to the appropriate final IP and then select tokens 
> via TokenSelectors.
> Some workarounds for other related issues were put in place at YARN-945.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-09-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776634#comment-13776634
 ] 

Bikas Saha commented on YARN-1068:
--

It would be instructive to compare the HAAdmin server start code with an 
existing RM admin server like the AdminService. I notice two things:
1) AdminService does not use the HAServiceProtocolServerSideTranslatorPB pattern.
2) AdminService does something with HADOOP_SECURITY_AUTHORIZATION which is 
missing in HAAdminService. This probably defines who has access to perform the 
admin operations. We will likely need that for the HAAdmin, right?

Having thought about this, it seems to me that this jira is actually blocked by 
YARN-986. Without the concept of a logical name, how can we expect the CLI etc. 
to find the correct RM address from configuration? The client conf files would 
be expected to have entries for all RM instances, and we would need to be able 
to issue admin commands to any one of them. So we need to be able to address 
them via a logical name, right? The current approach that picks the 
RM_HA_ADMIN_SERVICE address therefore does not seem like a viable solution. 
Similarly, server conf files would need to tell the server what its logical 
name is so that it can pick up instance-specific configurations. This is 
precisely why we have the HAAdmin.resolveTarget() method.
Again, it would be instructive to look at NNHAServiceTarget on the client side 
and at the constructor of NameNode, where it uses the logical name to translate 
and re-write the server-side conf.
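
A minimal sketch of the logical-name idea, assuming a hypothetical 
per-instance key pattern (the real key names were settled later, in 
YARN-1232):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only: the key pattern "yarn.resourcemanager.admin.address.<rmId>"
// is a hypothetical illustration of per-instance, logical-name-keyed config,
// in the spirit of NNHAServiceTarget; it is not the committed key name.
public class RMTargetResolverSketch {

  /** Resolve the admin address of the RM instance with the given logical id. */
  public static String resolveAdminAddress(Configuration conf, String rmId) {
    String key = "yarn.resourcemanager.admin.address." + rmId;
    String address = conf.get(key);
    if (address == null) {
      throw new IllegalArgumentException(
          "No admin address configured for RM instance '" + rmId + "'"
          + " (missing key: " + key + ")");
    }
    return address;
  }
}
{code}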

> Add admin support for HA operations
> ---
>
> Key: YARN-1068
> URL: https://issues.apache.org/jira/browse/YARN-1068
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, 
> yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, 
> yarn-1068-prelim.patch
>
>
> Support HA admin operations to facilitate transitioning the RM to Active and 
> Standby states.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application

2013-09-24 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1157:


Attachment: YARN-1157.6.patch

Adding more comments in RegisterApplicationMasterRequest and 
FinishApplicationMasterRequest.

> ResourceManager UI has invalid tracking URL link for distributed shell 
> application
> --
>
> Key: YARN-1157
> URL: https://issues.apache.org/jira/browse/YARN-1157
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
> Fix For: 2.1.1-beta
>
> Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, 
> YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch, YARN-1157.6.patch
>
>
> Submit a YARN distributed shell application and go to the ResourceManager 
> web UI; the application appears there. The Tracking UI column contains a 
> history link. Clicking on that link returns an HTTP 500 error instead of 
> showing the application master web UI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776629#comment-13776629
 ] 

Hadoop QA commented on YARN-1157:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604851/YARN-1157.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2003//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2003//console

This message is automatically generated.

> ResourceManager UI has invalid tracking URL link for distributed shell 
> application
> --
>
> Key: YARN-1157
> URL: https://issues.apache.org/jira/browse/YARN-1157
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
> Fix For: 2.1.1-beta
>
> Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, 
> YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch
>
>
> Submit a YARN distributed shell application and go to the ResourceManager 
> web UI; the application appears there. The Tracking UI column contains a 
> history link. Clicking on that link returns an HTTP 500 error instead of 
> showing the application master web UI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator

2013-09-24 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776628#comment-13776628
 ] 

Alejandro Abdelnur commented on YARN-1021:
--

+1

> Yarn Scheduler Load Simulator
> -
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different 
> implementations, e.g., the Fifo, Capacity, and Fair schedulers. Meanwhile, 
> several optimizations are also made to improve scheduler performance for 
> different scenarios and workloads. Each scheduler algorithm has its own set 
> of features, and drives scheduling decisions by many factors, such as 
> fairness, capacity guarantees, and resource availability. It is very 
> important to evaluate a scheduler algorithm thoroughly before deploying it 
> in a production cluster. Unfortunately, it is currently non-trivial to 
> evaluate a scheduling algorithm. Evaluating in a real cluster is always 
> time- and cost-consuming, and it is also very hard to find a large-enough 
> cluster. Hence, a simulator that can predict how well a scheduler algorithm 
> works for some specific workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
> clusters and application loads on a single machine. This would be invaluable 
> in furthering Yarn by providing a tool for researchers and developers to 
> prototype new scheduler features and predict their behavior and performance 
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the 
> network factor by simulating NodeManagers and ApplicationMasters and by 
> handling and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper 
> will wrap the real scheduler (see the sketch after this description).
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and for each queue, which can be used 
> to configure cluster and queue capacities.
> * The detailed application execution trace (recorded in relation to simulated 
> time), which can be analyzed to understand/validate the scheduler behavior 
> (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
> etc.).
> * Several key metrics of the scheduler algorithm, such as the time cost of 
> each scheduler operation (allocate, handle, etc.), which can be used by 
> Hadoop developers to find hot spots and scalability limits.
> The simulator will provide real-time charts showing the behavior of the 
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, 
> showing how to use the simulator to simulate the Fair Scheduler and the 
> Capacity Scheduler.
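
A simplified sketch of the scheduler-wrapper idea mentioned in the 
description, against a stand-in Scheduler interface (the real YARN scheduler 
API takes richer arguments; only the wrap-and-time pattern is shown):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for the real YARN scheduler API; only the wrapping pattern matters.
interface Scheduler {
  void allocate(String applicationAttemptId);
}

public class TimingSchedulerWrapper implements Scheduler {

  private final Scheduler real;                   // the wrapped real scheduler
  private final AtomicLong totalAllocateNanos = new AtomicLong();
  private final AtomicLong allocateCalls = new AtomicLong();

  public TimingSchedulerWrapper(Scheduler real) {
    this.real = real;
  }

  @Override
  public void allocate(String applicationAttemptId) {
    long start = System.nanoTime();
    try {
      real.allocate(applicationAttemptId);        // delegate to the real scheduler
    } finally {
      totalAllocateNanos.addAndGet(System.nanoTime() - start);
      allocateCalls.incrementAndGet();
    }
  }

  /** Average time cost of an allocate operation, in nanoseconds. */
  public long averageAllocateNanos() {
    long calls = allocateCalls.get();
    return calls == 0 ? 0 : totalAllocateNanos.get() / calls;
  }
}
{code}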

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776598#comment-13776598
 ] 

Hadoop QA commented on YARN-1229:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604849/YARN-1229.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2002//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2002//console

This message is automatically generated.

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.1-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch
>
>
> I ran a sleep job. If the AM fails to start, this exception could occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application

2013-09-24 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1157:


Attachment: YARN-1157.5.patch

Created the patch based on the latest trunk.

> ResourceManager UI has invalid tracking URL link for distributed shell 
> application
> --
>
> Key: YARN-1157
> URL: https://issues.apache.org/jira/browse/YARN-1157
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
> Fix For: 2.1.1-beta
>
> Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, 
> YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch
>
>
> Submit a YARN distributed shell application and go to the ResourceManager 
> web UI; the application appears there. The Tracking UI column contains a 
> history link. Clicking on that link returns an HTTP 500 error instead of 
> showing the application master web UI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1229:


Attachment: YARN-1229.3.patch

Changed the NM_AUX_SERVICE prefix to NodeManagerAuxService to eliminate the "_".

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.1-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch
>
>
> I ran a sleep job. If the AM fails to start, this exception could occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-1233) NodeManager doesn't renew krb5 creds

2013-09-24 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created YARN-1233:
--

 Summary: NodeManager doesn't renew krb5 creds
 Key: YARN-1233
 URL: https://issues.apache.org/jira/browse/YARN-1233
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta
Reporter: Allen Wittenauer


In 2.1.0-beta-rc1 (sorry, haven't upgraded yet), the NM is not renewing krb5 
TGTs after they expire.
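
For context, the usual keytab-based relogin pattern looks like the sketch 
below; whether and where the NM should apply it is exactly what this issue is 
about, so this is not the proposed fix:

{code:java}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

// Sketch of the common keytab-relogin pattern, assuming the daemon logged in
// from a keytab via UserGroupInformation. Not the actual NM fix.
public class TgtRenewalSketch {
  public static void renewIfNeeded() throws IOException {
    UserGroupInformation ugi = UserGroupInformation.getLoginUser();
    // Re-acquires the TGT from the keytab if it is close to expiring.
    ugi.checkTGTAndReloginFromKeytab();
  }
}
{code}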

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776571#comment-13776571
 ] 

Hadoop QA commented on YARN-1068:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604842/yarn-1068-7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2000//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2000//console

This message is automatically generated.

> Add admin support for HA operations
> ---
>
> Key: YARN-1068
> URL: https://issues.apache.org/jira/browse/YARN-1068
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, 
> yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, 
> yarn-1068-prelim.patch
>
>
> Support HA admin operations to facilitate transitioning the RM to Active and 
> Standby states.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776554#comment-13776554
 ] 

Hadoop QA commented on YARN-1229:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604841/YARN-1229.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2001//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2001//console

This message is automatically generated.

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.1-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch
>
>
> I ran a sleep job. If the AM fails to start, this exception could occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776546#comment-13776546
 ] 

Bikas Saha commented on YARN-1229:
--

Base32 encoding is a good idea if we don't want to break compatibility. It 
basically boils down to that.

Xuan, the AuxServiceHelper is still using the NM_AUX_SERVICE prefix, which has 
an "_" in it.

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.1-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch
>
>
> I ran a sleep job. If the AM fails to start, this exception could occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776529#comment-13776529
 ] 

Xuan Gong commented on YARN-1229:
-

Ran the full YARN test suite; all the YARN tests are passing.
Ran the full MAPREDUCE test suite; some of the tests in the mapred package 
have a timeout issue, which I do not think is caused by this patch.

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.1-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch
>
>
> I ran a sleep job. If the AM fails to start, this exception could occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1068) Add admin support for HA operations

2013-09-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1068:
---

Attachment: yarn-1068-7.patch

Thanks [~tucu00]. Updated patch to address the comment.

> Add admin support for HA operations
> ---
>
> Key: YARN-1068
> URL: https://issues.apache.org/jira/browse/YARN-1068
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, 
> yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, 
> yarn-1068-prelim.patch
>
>
> Support HA admin operations to facilitate transitioning the RM to Active and 
> Standby states.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1229:


Attachment: YARN-1229.2.patch

Add a test case

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.1-beta
>
> Attachments: YARN-1229.1.patch, YARN-1229.2.patch
>
>
> I ran a sleep job. If the AM fails to start, this exception could occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1204) Need to add https port related property in Yarn

2013-09-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776515#comment-13776515
 ] 

Hudson commented on YARN-1204:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4462 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4462/])
YARN-1204. Added separate configuration properties for https for RM and NM 
without which servers enabled with https will also start on http ports. 
Contributed by Omkar Vinit Joshi.
MAPREDUCE-5523. Added separate configuration properties for https for JHS 
without which even when https is enabled, it starts on http port itself. 
Contributed by Omkar Vinit Joshi. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1525947)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/AppController.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/WebAppUtil.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/jobhistory/JHAdminConfig.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/MiniMRYarnCluster.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmFilterInitializer.java


> Need to add https port related property in Yarn
> ---
>
> Key: YARN-1204
> URL: https://issues.apache.org/jira/browse/YARN-1204
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-1204.20131018.1.patch, YARN-1204.20131020.1.patch, 
> YARN-1204.20131020.2.patch, YARN-1204.20131020.3.patch, 
> YARN-1204.20131020.4.patch, YARN-1204.20131023.1.patch
>
>
> There is no yarn property available to configure the https port for the 
> ResourceManager, NodeManager, and history server. Currently, Yarn services 
> use the port defined for http (by 'mapreduce.jobhistory.webapp.address', 
> 'yarn.nodemanager.webapp.address', 'yarn.resourcemanager.webapp.address') 
> for running services over the https protocol.
> Yarn should have a list of properties to assign https ports for the RM, NM, 
> and JHS. They could look like the following:
> yarn.nodemanager.webapp.https.address
> yarn.resourcemanager.webapp.https.address
> mapreduce.jobhistory.webapp.https.address
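
A sketch of how a separate https key could be consumed; the property names 
come from the description above, while the httpsEnabled flag and the default 
values are illustrative assumptions, not the committed behavior:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only: property names are from the issue description; the
// httpsEnabled flag and default ports are illustrative assumptions.
public class RMWebAddressSketch {
  static String rmWebAppAddress(Configuration conf, boolean httpsEnabled) {
    return httpsEnabled
        ? conf.get("yarn.resourcemanager.webapp.https.address", "0.0.0.0:8090")
        : conf.get("yarn.resourcemanager.webapp.address", "0.0.0.0:8088");
  }
}
{code}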

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy

2013-09-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776509#comment-13776509
 ] 

Karthik Kambatla commented on YARN-1028:


Using the configs introduced in YARN-1232, we should be able to retry alternate 
RMs by setting {{yarn.resourcemanager.ha.nodes.id}}. [~devaraj.k], I hope it is 
okay if I take this up.
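
A standalone sketch of the retry-alternate-RMs idea; the real work is to plug 
such logic into RMProxy through Hadoop's failover-proxy machinery using the 
YARN-1232 configs, so the interface below is a simplification, not the actual 
FailoverProxyProvider API:

{code:java}
import java.util.List;

// Standalone simplification of round-robin failover across configured RMs;
// not the actual org.apache.hadoop.io.retry.FailoverProxyProvider API.
public class RoundRobinFailoverSketch<T> {

  public interface ProxyFactory<P> {
    P createProxy(String rmAddress);        // build a client proxy for one RM
  }

  private final List<String> rmAddresses;   // all configured RM instances
  private final ProxyFactory<T> factory;
  private int current = 0;

  public RoundRobinFailoverSketch(List<String> rmAddresses,
      ProxyFactory<T> factory) {
    this.rmAddresses = rmAddresses;
    this.factory = factory;
  }

  /** Proxy for the RM currently believed to be active. */
  public synchronized T getProxy() {
    return factory.createProxy(rmAddresses.get(current));
  }

  /** On failure, move on to the next configured RM. */
  public synchronized void performFailover() {
    current = (current + 1) % rmAddresses.size();
  }
}
{code}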

> Add FailoverProxyProvider like capability to RMProxy
> 
>
> Key: YARN-1028
> URL: https://issues.apache.org/jira/browse/YARN-1028
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Devaraj K
>
> RMProxy layer currently abstracts RM discovery and implements it by looking 
> up service information from configuration. Motivated by HDFS and using 
> existing classes from Common, we can add failover proxy providers that may 
> provide RM discovery in extensible ways.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1232) Configuration support for RM HA

2013-09-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776508#comment-13776508
 ] 

Karthik Kambatla commented on YARN-1232:


Will post another patch that describes these configs in yarn-default.xml. Don't 
think we can have default values for these though.

> Configuration support for RM HA
> ---
>
> Key: YARN-1232
> URL: https://issues.apache.org/jira/browse/YARN-1232
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Attachments: yarn-1232-1.patch
>
>
> We should augment the configuration to allow users to specify two RMs and 
> the individual RPC addresses for them. This blocks 
> ConfiguredFailoverProxyProvider.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1232) Configuration support for RM HA

2013-09-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1232:
---

Attachment: yarn-1232-1.patch

Patch that adds the configs to YarnConfiguration and hooks them up to RM 
startup and the RMProxy implementation through HAUtil.

> Configuration support for RM HA
> ---
>
> Key: YARN-1232
> URL: https://issues.apache.org/jira/browse/YARN-1232
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Attachments: yarn-1232-1.patch
>
>
> We should augment the configuration to allow users to specify two RMs and 
> the individual RPC addresses for them. This blocks 
> ConfiguredFailoverProxyProvider.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-1232) Configuration support for RM HA

2013-09-24 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-1232:
--

 Summary: Configuration support for RM HA
 Key: YARN-1232
 URL: https://issues.apache.org/jira/browse/YARN-1232
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


We should augment the configuration to allow users to specify two RMs and the 
individual RPC addresses for them. This blocks ConfiguredFailoverProxyProvider.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1089) Add YARN compute units alongside virtual cores

2013-09-24 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1089:


Target Version/s: 2.3.0  (was: 2.1.1-beta)

> Add YARN compute units alongside virtual cores
> --
>
> Key: YARN-1089
> URL: https://issues.apache.org/jira/browse/YARN-1089
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1089-1.patch, YARN-1089.patch
>
>
> Based on discussion in YARN-1024, we will add YARN compute units as a 
> resource for requesting and scheduling CPU processing power.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-09-24 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776502#comment-13776502
 ] 

Alejandro Abdelnur commented on YARN-1068:
--

One nit, in the RMHAProtocolService, the {{serviceStop()}} should be symmetric 
with the start, in the sense that it should do the {{if (haEnabled)}} check to 
stop the HAAdmin server (instead of doing this check in the HAAdmin service 
itself).
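
A sketch of the symmetric structure being suggested; haEnabled and the 
HAAdmin server handling are illustrative stand-ins for the names in the patch:

{code:java}
import org.apache.hadoop.service.AbstractService;

// Sketch of the suggested symmetry; field names are illustrative stand-ins.
public class RMHAProtocolServiceSketch extends AbstractService {
  private final boolean haEnabled;

  public RMHAProtocolServiceSketch(boolean haEnabled) {
    super(RMHAProtocolServiceSketch.class.getName());
    this.haEnabled = haEnabled;
  }

  @Override
  protected void serviceStart() throws Exception {
    if (haEnabled) {
      // start the HAAdmin server
    }
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    if (haEnabled) {
      // stop the HAAdmin server, mirroring serviceStart()
    }
    super.serviceStop();
  }
}
{code}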


> Add admin support for HA operations
> ---
>
> Key: YARN-1068
> URL: https://issues.apache.org/jira/browse/YARN-1068
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, 
> yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, 
> yarn-1068-prelim.patch
>
>
> Support HA admin operations to facilitate transitioning the RM to Active and 
> Standby states.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1089) Add YARN compute units alongside virtual cores

2013-09-24 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776503#comment-13776503
 ] 

Arun C Murthy commented on YARN-1089:
-

I don't think we should put this in branch-2.1 or target this for hadoop-2.2.

This is a major new feature which can be implemented in a compatible manner - 
let's target this for 2.3.0.

> Add YARN compute units alongside virtual cores
> --
>
> Key: YARN-1089
> URL: https://issues.apache.org/jira/browse/YARN-1089
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1089-1.patch, YARN-1089.patch
>
>
> Based on discussion in YARN-1024, we will add YARN compute units as a 
> resource for requesting and scheduling CPU processing power.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1229:


Attachment: YARN-1229.1.patch

Attached patch changes mapreduce.shuffle to MapreduceShuffle. It also enforces 
the check (service names should contain only a-zA-Z0-9) in AuxService.
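
A sketch of the name check described above (the exact pattern and its 
location in the patch may differ):

{code:java}
import java.util.regex.Pattern;

// Sketch of the a-zA-Z0-9 check; the exact pattern/location in the patch may
// differ. The goal is that the exported env var name (prefix + service name)
// remains a valid shell identifier.
public class AuxServiceNameCheckSketch {
  private static final Pattern VALID_NAME = Pattern.compile("[a-zA-Z0-9]+");

  static void validateAuxServiceName(String name) {
    if (name == null || !VALID_NAME.matcher(name).matches()) {
      throw new IllegalArgumentException("Invalid aux service name '" + name
          + "': only a-zA-Z0-9 characters are allowed");
    }
  }
}
{code}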


> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.1-beta
>
> Attachments: YARN-1229.1.patch
>
>
> I ran a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1204) Need to add https port related property in Yarn

2013-09-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776485#comment-13776485
 ] 

Vinod Kumar Vavilapalli commented on YARN-1204:
---

The latest patch looks good to me. +1. Checking this in.

> Need to add https port related property in Yarn
> ---
>
> Key: YARN-1204
> URL: https://issues.apache.org/jira/browse/YARN-1204
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-1204.20131018.1.patch, YARN-1204.20131020.1.patch, 
> YARN-1204.20131020.2.patch, YARN-1204.20131020.3.patch, 
> YARN-1204.20131020.4.patch, YARN-1204.20131023.1.patch
>
>
> There is no Yarn property available to configure the HTTPS port for the 
> ResourceManager, NodeManager and history server. Currently, Yarn services use the 
> port defined for HTTP [defined by 
> 'mapreduce.jobhistory.webapp.address', 'yarn.nodemanager.webapp.address', 
> 'yarn.resourcemanager.webapp.address'] for running services on the HTTPS protocol.
> Yarn should have a list of properties to assign HTTPS ports for the RM, NM and JHS.
> They could look like the following:
> yarn.nodemanager.webapp.https.address
> yarn.resourcemanager.webapp.https.address
> mapreduce.jobhistory.webapp.https.address
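
A hedged illustration of how such keys might be read, using only the property names 
listed above (the example ports are assumptions, not defaults defined by this issue):

{code}
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// keys are the proposed property names from the description above
String rmHttps  = conf.get("yarn.resourcemanager.webapp.https.address", "0.0.0.0:8090");
String nmHttps  = conf.get("yarn.nodemanager.webapp.https.address", "0.0.0.0:8044");
String jhsHttps = conf.get("mapreduce.jobhistory.webapp.https.address", "0.0.0.0:19890");
{code}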

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-09-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776484#comment-13776484
 ] 

Karthik Kambatla commented on YARN-1068:


[~bikassaha], when you get a chance, can you review the latest patch? 

> Add admin support for HA operations
> ---
>
> Key: YARN-1068
> URL: https://issues.apache.org/jira/browse/YARN-1068
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, 
> yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, 
> yarn-1068-prelim.patch
>
>
> Support HA admin operations to facilitate transitioning the RM to Active and 
> Standby states.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776441#comment-13776441
 ] 

Hadoop QA commented on YARN-1021:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604818/YARN-1021.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1999//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1999//console

This message is automatically generated.

> Yarn Scheduler Load Simulator
> -
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different 
> implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, 
> several optimizations have also been made to improve scheduler performance for 
> different scenarios and workloads. Each scheduler algorithm has its own set of 
> features, and drives scheduling decisions by many factors, such as fairness, 
> capacity guarantees, resource availability, etc. It is very important to 
> evaluate a scheduler algorithm thoroughly before we deploy it in a production 
> cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling 
> algorithm. Evaluating in a real cluster is always time- and cost-consuming, 
> and it is also very hard to find a large-enough cluster. Hence, a simulator 
> which can predict how well a scheduler algorithm performs for some specific 
> workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
> clusters and application loads on a single machine. This would be invaluable 
> in furthering Yarn by providing a tool for researchers and developers to 
> prototype new scheduler features and predict their behavior and performance 
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the 
> network factor by simulating NodeManagers and ApplicationMasters via handling 
> and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper 
> will wrap the real scheduler.
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and for each queue, which can be used to 
> configure the cluster's and each queue's capacity.
> * The detailed application execution trace (recorded in relation to simulated 
> time), which can be analyzed to understand/validate the scheduler behavior 
> (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
> etc.).
> * Several key metrics of the scheduler algorithm, such as the time cost of each 
> scheduler operation (allocate, handle, etc.), which can be used by Hadoop 
> developers to find code hot spots and scalability limits.
> The simulator will provide real-time charts showing the behavior of the 
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing 
> how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.
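
The scheduler-wrapper idea above can be pictured with a small timing decorator; 
this is an illustrative, self-contained sketch, not the actual sls classes:

{code}
import java.util.concurrent.Callable;

// time any scheduler operation, in the spirit of the wrapper described above
final class OpTimer {
  static <T> T time(String op, Callable<T> body) throws Exception {
    long start = System.nanoTime();
    try {
      return body.call();
    } finally {
      System.out.printf("%s took %d ns%n", op, System.nanoTime() - start);
    }
  }
}
{code}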

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator

2013-09-24 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1021:
--

Attachment: (was: YARN-1021.pdf)

> Yarn Scheduler Load Simulator
> -
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different 
> implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, 
> several optimizations have also been made to improve scheduler performance for 
> different scenarios and workloads. Each scheduler algorithm has its own set of 
> features, and drives scheduling decisions by many factors, such as fairness, 
> capacity guarantees, resource availability, etc. It is very important to 
> evaluate a scheduler algorithm thoroughly before we deploy it in a production 
> cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling 
> algorithm. Evaluating in a real cluster is always time- and cost-consuming, 
> and it is also very hard to find a large-enough cluster. Hence, a simulator 
> which can predict how well a scheduler algorithm performs for some specific 
> workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
> clusters and application loads on a single machine. This would be invaluable 
> in furthering Yarn by providing a tool for researchers and developers to 
> prototype new scheduler features and predict their behavior and performance 
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the 
> network factor by simulating NodeManagers and ApplicationMasters via handling 
> and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper 
> will wrap the real scheduler.
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and for each queue, which can be used to 
> configure the cluster's and each queue's capacity.
> * The detailed application execution trace (recorded in relation to simulated 
> time), which can be analyzed to understand/validate the scheduler behavior 
> (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
> etc.).
> * Several key metrics of the scheduler algorithm, such as the time cost of each 
> scheduler operation (allocate, handle, etc.), which can be used by Hadoop 
> developers to find code hot spots and scalability limits.
> The simulator will provide real-time charts showing the behavior of the 
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing 
> how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator

2013-09-24 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1021:
--

Attachment: YARN-1021.pdf

> Yarn Scheduler Load Simulator
> -
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different 
> implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, 
> several optimizations have also been made to improve scheduler performance for 
> different scenarios and workloads. Each scheduler algorithm has its own set of 
> features, and drives scheduling decisions by many factors, such as fairness, 
> capacity guarantees, resource availability, etc. It is very important to 
> evaluate a scheduler algorithm thoroughly before we deploy it in a production 
> cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling 
> algorithm. Evaluating in a real cluster is always time- and cost-consuming, 
> and it is also very hard to find a large-enough cluster. Hence, a simulator 
> which can predict how well a scheduler algorithm performs for some specific 
> workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
> clusters and application loads on a single machine. This would be invaluable 
> in furthering Yarn by providing a tool for researchers and developers to 
> prototype new scheduler features and predict their behavior and performance 
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the 
> network factor by simulating NodeManagers and ApplicationMasters via handling 
> and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper 
> will wrap the real scheduler.
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and for each queue, which can be used to 
> configure the cluster's and each queue's capacity.
> * The detailed application execution trace (recorded in relation to simulated 
> time), which can be analyzed to understand/validate the scheduler behavior 
> (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
> etc.).
> * Several key metrics of the scheduler algorithm, such as the time cost of each 
> scheduler operation (allocate, handle, etc.), which can be used by Hadoop 
> developers to find code hot spots and scalability limits.
> The simulator will provide real-time charts showing the behavior of the 
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing 
> how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator

2013-09-24 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1021:
--

Attachment: YARN-1021.patch

> Yarn Scheduler Load Simulator
> -
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different 
> implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, 
> several optimizations have also been made to improve scheduler performance for 
> different scenarios and workloads. Each scheduler algorithm has its own set of 
> features, and drives scheduling decisions by many factors, such as fairness, 
> capacity guarantees, resource availability, etc. It is very important to 
> evaluate a scheduler algorithm thoroughly before we deploy it in a production 
> cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling 
> algorithm. Evaluating in a real cluster is always time- and cost-consuming, 
> and it is also very hard to find a large-enough cluster. Hence, a simulator 
> which can predict how well a scheduler algorithm performs for some specific 
> workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
> clusters and application loads on a single machine. This would be invaluable 
> in furthering Yarn by providing a tool for researchers and developers to 
> prototype new scheduler features and predict their behavior and performance 
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the 
> network factor by simulating NodeManagers and ApplicationMasters via handling 
> and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper 
> will wrap the real scheduler.
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and for each queue, which can be used to 
> configure the cluster's and each queue's capacity.
> * The detailed application execution trace (recorded in relation to simulated 
> time), which can be analyzed to understand/validate the scheduler behavior 
> (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
> etc.).
> * Several key metrics of the scheduler algorithm, such as the time cost of each 
> scheduler operation (allocate, handle, etc.), which can be used by Hadoop 
> developers to find code hot spots and scalability limits.
> The simulator will provide real-time charts showing the behavior of the 
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing 
> how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator

2013-09-24 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776367#comment-13776367
 ] 

Alejandro Abdelnur commented on YARN-1021:
--

[~ywskycn], we shouldn't use /tmp, as that does not get cleaned up by the build; 
instead we should use a temp subdir under target/, easily done by:

{code}
// note: the UUID must be converted to a String for the File(parent, child) constructor
File dir = new File("target", UUID.randomUUID().toString());
dir.mkdirs();
{code}

Also, the documentation should have, in the appendix, a complete/simple example of 
an sls JSON input file as a reference.
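
For tests, another option (a suggestion here, not part of the patch) is JUnit's 
{{TemporaryFolder}} rule rooted under target/, which also handles cleanup:

{code}
import java.io.File;
import org.junit.Rule;
import org.junit.rules.TemporaryFolder;

// JUnit deletes the folder after each test, so nothing leaks into /tmp
@Rule
public TemporaryFolder tmp = new TemporaryFolder(new File("target"));
{code}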

> Yarn Scheduler Load Simulator
> -
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different 
> implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, 
> several optimizations have also been made to improve scheduler performance for 
> different scenarios and workloads. Each scheduler algorithm has its own set of 
> features, and drives scheduling decisions by many factors, such as fairness, 
> capacity guarantees, resource availability, etc. It is very important to 
> evaluate a scheduler algorithm thoroughly before we deploy it in a production 
> cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling 
> algorithm. Evaluating in a real cluster is always time- and cost-consuming, 
> and it is also very hard to find a large-enough cluster. Hence, a simulator 
> which can predict how well a scheduler algorithm performs for some specific 
> workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
> clusters and application loads on a single machine. This would be invaluable 
> in furthering Yarn by providing a tool for researchers and developers to 
> prototype new scheduler features and predict their behavior and performance 
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the 
> network factor by simulating NodeManagers and ApplicationMasters via handling 
> and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper 
> will wrap the real scheduler.
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and for each queue, which can be used to 
> configure the cluster's and each queue's capacity.
> * The detailed application execution trace (recorded in relation to simulated 
> time), which can be analyzed to understand/validate the scheduler behavior 
> (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
> etc.).
> * Several key metrics of the scheduler algorithm, such as the time cost of each 
> scheduler operation (allocate, handle, etc.), which can be used by Hadoop 
> developers to find code hot spots and scalability limits.
> The simulator will provide real-time charts showing the behavior of the 
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing 
> how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator

2013-09-24 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776364#comment-13776364
 ] 

Wei Yan commented on YARN-1021:
---

Updated the patch according to [~tucu00]'s latest comments.
The simulator now also supports two types of inputs:
(1) Rumen traces, so users can feed their existing rumen traces directly to the 
simulator.
(2) The simulator's own trace format (sls), which is much simpler and lets users 
easily generate various workloads. The simulator also has a tool to help users 
convert rumen traces to sls traces.

> Yarn Scheduler Load Simulator
> -
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different 
> implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, 
> several optimizations have also been made to improve scheduler performance for 
> different scenarios and workloads. Each scheduler algorithm has its own set of 
> features, and drives scheduling decisions by many factors, such as fairness, 
> capacity guarantees, resource availability, etc. It is very important to 
> evaluate a scheduler algorithm thoroughly before we deploy it in a production 
> cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling 
> algorithm. Evaluating in a real cluster is always time- and cost-consuming, 
> and it is also very hard to find a large-enough cluster. Hence, a simulator 
> which can predict how well a scheduler algorithm performs for some specific 
> workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
> clusters and application loads on a single machine. This would be invaluable 
> in furthering Yarn by providing a tool for researchers and developers to 
> prototype new scheduler features and predict their behavior and performance 
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the 
> network factor by simulating NodeManagers and ApplicationMasters via handling 
> and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper 
> will wrap the real scheduler.
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and for each queue, which can be used to 
> configure the cluster's and each queue's capacity.
> * The detailed application execution trace (recorded in relation to simulated 
> time), which can be analyzed to understand/validate the scheduler behavior 
> (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
> etc.).
> * Several key metrics of the scheduler algorithm, such as the time cost of each 
> scheduler operation (allocate, handle, etc.), which can be used by Hadoop 
> developers to find code hot spots and scalability limits.
> The simulator will provide real-time charts showing the behavior of the 
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing 
> how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776357#comment-13776357
 ] 

Hadoop QA commented on YARN-1021:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604801/YARN-1021.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1998//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1998//console

This message is automatically generated.

> Yarn Scheduler Load Simulator
> -
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different 
> implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, 
> several optimizations have also been made to improve scheduler performance for 
> different scenarios and workloads. Each scheduler algorithm has its own set of 
> features, and drives scheduling decisions by many factors, such as fairness, 
> capacity guarantees, resource availability, etc. It is very important to 
> evaluate a scheduler algorithm thoroughly before we deploy it in a production 
> cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling 
> algorithm. Evaluating in a real cluster is always time- and cost-consuming, 
> and it is also very hard to find a large-enough cluster. Hence, a simulator 
> which can predict how well a scheduler algorithm performs for some specific 
> workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
> clusters and application loads on a single machine. This would be invaluable 
> in furthering Yarn by providing a tool for researchers and developers to 
> prototype new scheduler features and predict their behavior and performance 
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the 
> network factor by simulating NodeManagers and ApplicationMasters via handling 
> and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper 
> will wrap the real scheduler.
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and for each queue, which can be used to 
> configure the cluster's and each queue's capacity.
> * The detailed application execution trace (recorded in relation to simulated 
> time), which can be analyzed to understand/validate the scheduler behavior 
> (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
> etc.).
> * Several key metrics of the scheduler algorithm, such as the time cost of each 
> scheduler operation (allocate, handle, etc.), which can be used by Hadoop 
> developers to find code hot spots and scalability limits.
> The simulator will provide real-time charts showing the behavior of the 
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing 
> how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776350#comment-13776350
 ] 

Chris Nauroth commented on YARN-1229:
-

BTW, if we use {{[a-zA-Z_]+[a-zA-Z0-9_]*}}, then that will be compatible with 
Windows too.  It looks like Windows actually allows many more characters than 
that, but I think it makes sense to stick to a minimal set that we expect to 
work cross-platform.
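
A quick illustration of the pattern's effect (the service names here are just examples):

{code}
import java.util.regex.Pattern;

Pattern p = Pattern.compile("[a-zA-Z_]+[a-zA-Z0-9_]*");
System.out.println(p.matcher("MapreduceShuffle").matches());  // true
System.out.println(p.matcher("mapreduce.shuffle").matches()); // false: '.' is rejected
{code}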

> Shell$ExitCodeException could happen if AM fails to start
> -
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.1.1-beta
>
>
> I ran a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_01 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator

2013-09-24 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1021:
--

Attachment: YARN-1021.patch

> Yarn Scheduler Load Simulator
> -
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different 
> implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, 
> several optimizations have also been made to improve scheduler performance for 
> different scenarios and workloads. Each scheduler algorithm has its own set of 
> features, and drives scheduling decisions by many factors, such as fairness, 
> capacity guarantees, resource availability, etc. It is very important to 
> evaluate a scheduler algorithm thoroughly before we deploy it in a production 
> cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling 
> algorithm. Evaluating in a real cluster is always time- and cost-consuming, 
> and it is also very hard to find a large-enough cluster. Hence, a simulator 
> which can predict how well a scheduler algorithm performs for some specific 
> workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
> clusters and application loads on a single machine. This would be invaluable 
> in furthering Yarn by providing a tool for researchers and developers to 
> prototype new scheduler features and predict their behavior and performance 
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the 
> network factor by simulating NodeManagers and ApplicationMasters via handling 
> and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper 
> will wrap the real scheduler.
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and for each queue, which can be used to 
> configure the cluster's and each queue's capacity.
> * The detailed application execution trace (recorded in relation to simulated 
> time), which can be analyzed to understand/validate the scheduler behavior 
> (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
> etc.).
> * Several key metrics of the scheduler algorithm, such as the time cost of each 
> scheduler operation (allocate, handle, etc.), which can be used by Hadoop 
> developers to find code hot spots and scalability limits.
> The simulator will provide real-time charts showing the behavior of the 
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing 
> how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1231) Fix test cases that will hit max-am-used-resources-percent limit after YARN-276

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776302#comment-13776302
 ] 

Hadoop QA commented on YARN-1231:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604791/YARN-1231.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1997//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1997//console

This message is automatically generated.

> Fix test cases that will hit max-am-used-resources-percent limit after 
> YARN-276
> 
>
> Key: YARN-1231
> URL: https://issues.apache.org/jira/browse/YARN-1231
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 2.1.1-beta
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>  Labels: test
> Attachments: YARN-1231.patch
>
>
> Use a separate jira to fix YARN's test cases that will fail by hitting the 
> max-am-used-resources-percent limit after YARN-276.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1231) Fix test cases that will hit max-am-used-resources-percent limit after YARN-276

2013-09-24 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated YARN-1231:


Attachment: YARN-1231.patch

A patch fixing the test cases in the hadoop-yarn-server-resourcemanager project.
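
One way tests commonly avoid such a cap, sketched under the assumption that the 
CapacityScheduler's maximum-am-resource-percent key applies (illustrative, not 
necessarily what this patch does):

{code}
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// let AMs use the full cluster so tiny test setups are not capped
conf.setFloat("yarn.scheduler.capacity.maximum-am-resource-percent", 1.0f);
{code}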

> Fix test cases that will hit max-am-used-resources-percent limit after 
> YARN-276
> 
>
> Key: YARN-1231
> URL: https://issues.apache.org/jira/browse/YARN-1231
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 2.1.1-beta
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>  Labels: test
> Attachments: YARN-1231.patch
>
>
> Use a separate jira to fix YARN's test cases that will fail by hitting the 
> max-am-used-resources-percent limit after YARN-276.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776173#comment-13776173
 ] 

Hadoop QA commented on YARN-1021:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604767/YARN-1021.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist:

  org.apache.hadoop.yarn.sls.TestSLSRunner

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1996//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1996//console

This message is automatically generated.

> Yarn Scheduler Load Simulator
> -
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different 
> implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, 
> several optimizations have also been made to improve scheduler performance for 
> different scenarios and workloads. Each scheduler algorithm has its own set of 
> features, and drives scheduling decisions by many factors, such as fairness, 
> capacity guarantees, resource availability, etc. It is very important to 
> evaluate a scheduler algorithm thoroughly before we deploy it in a production 
> cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling 
> algorithm. Evaluating in a real cluster is always time- and cost-consuming, 
> and it is also very hard to find a large-enough cluster. Hence, a simulator 
> which can predict how well a scheduler algorithm performs for some specific 
> workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
> clusters and application loads on a single machine. This would be invaluable 
> in furthering Yarn by providing a tool for researchers and developers to 
> prototype new scheduler features and predict their behavior and performance 
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the 
> network factor by simulating NodeManagers and ApplicationMasters via handling 
> and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper 
> will wrap the real scheduler.
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and for each queue, which can be used to 
> configure the cluster's and each queue's capacity.
> * The detailed application execution trace (recorded in relation to simulated 
> time), which can be analyzed to understand/validate the scheduler behavior 
> (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
> etc.).
> * Several key metrics of the scheduler algorithm, such as the time cost of each 
> scheduler operation (allocate, handle, etc.), which can be used by Hadoop 
> developers to find code hot spots and scalability limits.
> The simulator will provide real-time charts showing the behavior of the 
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing 
> how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator

2013-09-24 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1021:
--

Attachment: YARN-1021.patch

> Yarn Scheduler Load Simulator
> -
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different 
> implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, 
> several optimizations have also been made to improve scheduler performance for 
> different scenarios and workloads. Each scheduler algorithm has its own set of 
> features, and drives scheduling decisions by many factors, such as fairness, 
> capacity guarantees, resource availability, etc. It is very important to 
> evaluate a scheduler algorithm thoroughly before we deploy it in a production 
> cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling 
> algorithm. Evaluating in a real cluster is always time- and cost-consuming, 
> and it is also very hard to find a large-enough cluster. Hence, a simulator 
> which can predict how well a scheduler algorithm performs for some specific 
> workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
> clusters and application loads on a single machine. This would be invaluable 
> in furthering Yarn by providing a tool for researchers and developers to 
> prototype new scheduler features and predict their behavior and performance 
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the 
> network factor by simulating NodeManagers and ApplicationMasters via handling 
> and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper 
> will wrap the real scheduler.
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and for each queue, which can be used to 
> configure the cluster's and each queue's capacity.
> * The detailed application execution trace (recorded in relation to simulated 
> time), which can be analyzed to understand/validate the scheduler behavior 
> (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
> etc.).
> * Several key metrics of the scheduler algorithm, such as the time cost of each 
> scheduler operation (allocate, handle, etc.), which can be used by Hadoop 
> developers to find code hot spots and scalability limits.
> The simulator will provide real-time charts showing the behavior of the 
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing 
> how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator

2013-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776142#comment-13776142
 ] 

Hadoop QA commented on YARN-1021:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604747/YARN-1021.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1149 javac 
compiler warnings (more than the trunk's current 1145 warnings).

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1995//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/1995//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1995//console

This message is automatically generated.

> Yarn Scheduler Load Simulator
> -
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different 
> implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, 
> several optimizations have also been made to improve scheduler performance for 
> different scenarios and workloads. Each scheduler algorithm has its own set of 
> features, and drives scheduling decisions by many factors, such as fairness, 
> capacity guarantees, resource availability, etc. It is very important to 
> evaluate a scheduler algorithm thoroughly before we deploy it in a production 
> cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling 
> algorithm. Evaluating in a real cluster is always time- and cost-consuming, 
> and it is also very hard to find a large-enough cluster. Hence, a simulator 
> which can predict how well a scheduler algorithm performs for some specific 
> workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
> clusters and application loads on a single machine. This would be invaluable 
> in furthering Yarn by providing a tool for researchers and developers to 
> prototype new scheduler features and predict their behavior and performance 
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the 
> network factor by simulating NodeManagers and ApplicationMasters via handling 
> and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper 
> will wrap the real scheduler.
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and for each queue, which can be used to 
> configure the cluster's and each queue's capacity.
> * The detailed application execution trace (recorded in relation to simulated 
> time), which can be analyzed to understand/validate the scheduler behavior 
> (individual jobs' turnaround time, throughput, fairness, capacity guarantees, 
> etc.).
> * Several key metrics of the scheduler algorithm, such as the time cost of each 
> scheduler operation (allocate, handle, etc.), which can be used by Hadoop 
> developers to find code hot spots and scalability limits.
> The simulator will provide real-time charts showing the behavior of the 
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing 
> how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
