[jira] [Updated] (YARN-641) Make AMLauncher in RM Use NMClient

2013-06-12 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-641:
-

Attachment: YARN-641.2.patch

Update the patch to make using NMClient configurable.

 Make AMLauncher in RM Use NMClient
 --

 Key: YARN-641
 URL: https://issues.apache.org/jira/browse/YARN-641
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-641.1.patch, YARN-641.2.patch


 YARN-422 adds NMClient. RM's AMLauncher is responsible for the interactions 
 with an application's AM container. AMLauncher should also replace the raw 
 ContainerManager proxy with NMClient.
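
For context, a minimal sketch (not the actual patch) of what an NMClient-based launch could look like, assuming the NMClient API added by YARN-422; the package location reflects later releases, and the container and launch-context variables are placeholders:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;

public class AmContainerLauncherSketch {
  // Launch the AM container through NMClient instead of a raw ContainerManager proxy.
  public static void launch(Configuration conf, Container amContainer,
      ContainerLaunchContext launchContext) {
    NMClient nmClient = NMClient.createNMClient();
    nmClient.init(conf);
    nmClient.start();
    try {
      // NMClient resolves the NM address from the Container record and manages tokens.
      nmClient.startContainer(amContainer, launchContext);
    } catch (Exception e) {
      throw new RuntimeException("Failed to launch AM container", e);
    } finally {
      nmClient.stop();
    }
  }
}
{code}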

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-603) Create a testcase to validate Environment.MALLOC_ARENA_MAX

2013-06-12 Thread Kenji Kikushima (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenji Kikushima reassigned YARN-603:


Assignee: Kenji Kikushima

 Create a testcase to validate Environment.MALLOC_ARENA_MAX
 --

 Key: YARN-603
 URL: https://issues.apache.org/jira/browse/YARN-603
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Kenji Kikushima
 Attachments: YARN-603.patch


 The current test to validate Environment.MALLOC_ARENA_MAX isn't sufficient; 
 we also need to validate YarnConfiguration.NM_ADMIN_USER_ENV. 
 In addition, YARN-561 removed the testing of Environment.MALLOC_ARENA_MAX, so 
 we need to create a new test case for it.
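
As a rough illustration only (not the patch; the env-string value and the use of Apps.setEnvFromInputString here are assumptions), a test covering the NM_ADMIN_USER_ENV side could look like:

{code}
import static org.junit.Assert.assertEquals;

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Apps;
import org.junit.Test;

public class TestAdminUserEnvSketch {
  @Test
  public void testAdminUserEnvIsApplied() {
    YarnConfiguration conf = new YarnConfiguration();
    // Hypothetical admin env value; the real default includes MALLOC_ARENA_MAX.
    conf.set(YarnConfiguration.NM_ADMIN_USER_ENV, "MALLOC_ARENA_MAX=4");

    Map<String, String> env = new HashMap<String, String>();
    // Apply the admin env string the same way a container launch would.
    Apps.setEnvFromInputString(env, conf.get(YarnConfiguration.NM_ADMIN_USER_ENV));

    assertEquals("4", env.get("MALLOC_ARENA_MAX"));
  }
}
{code}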

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-603) Create a testcase to validate Environment.MALLOC_ARENA_MAX

2013-06-12 Thread Kenji Kikushima (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenji Kikushima updated YARN-603:
-

Attachment: YARN-603.patch

Added validation for YarnConfiguration.NM_ADMIN_USER_ENV. Is this insufficient?

 Create a testcase to validate Environment.MALLOC_ARENA_MAX
 --

 Key: YARN-603
 URL: https://issues.apache.org/jira/browse/YARN-603
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Kenji Kikushima
 Attachments: YARN-603.patch


 The current test to validate Environment.MALLOC_ARENA_MAX isn't sufficient; 
 we also need to validate YarnConfiguration.NM_ADMIN_USER_ENV. 
 In addition, YARN-561 removed the testing of Environment.MALLOC_ARENA_MAX, so 
 we need to create a new test case for it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-641) Make AMLauncher in RM Use NMClient

2013-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681000#comment-13681000
 ] 

Hadoop QA commented on YARN-641:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587383/YARN-641.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 13 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1199//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1199//console

This message is automatically generated.

 Make AMLauncher in RM Use NMClient
 --

 Key: YARN-641
 URL: https://issues.apache.org/jira/browse/YARN-641
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-641.1.patch, YARN-641.2.patch


 YARN-422 adds NMClient. RM's AMLauncher is responsible for the interactions 
 with an application's AM container. AMLauncher should also replace the raw 
 ContainerManager proxy with NMClient.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-641) Make AMLauncher in RM Use NMClient

2013-06-12 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-641:
-

Attachment: YARN-641.3.patch

Fix the test failure.

 Make AMLauncher in RM Use NMClient
 --

 Key: YARN-641
 URL: https://issues.apache.org/jira/browse/YARN-641
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-641.1.patch, YARN-641.2.patch, YARN-641.3.patch


 YARN-422 adds NMClient. RM's AMLauncher is responsible for the interactions 
 with an application's AM container. AMLauncher should also replace the raw 
 ContainerManager proxy with NMClient.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-795) Fair scheduler queue metrics should subtract allocated vCores from available vCores

2013-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681103#comment-13681103
 ] 

Hudson commented on YARN-795:
-

Integrated in Hadoop-Yarn-trunk #238 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/238/])
YARN-795. Fair scheduler queue metrics should subtract allocated vCores 
from available vCores. (ywskycn via tucu) (Revision 1492021)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1492021
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 Fair scheduler queue metrics should subtract allocated vCores from available 
 vCores
 ---

 Key: YARN-795
 URL: https://issues.apache.org/jira/browse/YARN-795
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.2-alpha
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.1.0-beta

 Attachments: YARN-795-2.patch, YARN-795.patch


 The fair scheduler's queue metrics don't subtract allocated vCores from 
 available vCores, so the available vCores value returned is incorrect.
 This happens because {code}QueueMetrics.getAllocateResources(){code} 
 doesn't return the allocated vCores.
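
In other words, the allocation path updates the memory gauges but not the vCore ones. A minimal standalone sketch of the missing bookkeeping (the field names only loosely mirror QueueMetrics and are assumptions):

{code}
public class VCoreMetricsSketch {
  private long availableMB;
  private int availableVCores;
  private long allocatedMB;
  private int allocatedVCores;

  public VCoreMetricsSketch(long availableMB, int availableVCores) {
    this.availableMB = availableMB;
    this.availableVCores = availableVCores;
  }

  /** Record an allocation, subtracting both memory and vCores from availability. */
  public void allocate(int containers, long mbPerContainer, int vcoresPerContainer) {
    allocatedMB += containers * mbPerContainer;
    allocatedVCores += containers * vcoresPerContainer;   // the piece that was missing
    availableMB -= containers * mbPerContainer;
    availableVCores -= containers * vcoresPerContainer;   // keeps available vCores correct
  }

  public int getAvailableVCores() { return availableVCores; }
  public int getAllocatedVCores() { return allocatedVCores; }
}
{code}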

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-737) Some Exceptions no longer need to be wrapped by YarnException and can be directly thrown out after YARN-142

2013-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681104#comment-13681104
 ] 

Hudson commented on YARN-737:
-

Integrated in Hadoop-Yarn-trunk #238 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/238/])
YARN-737. Throw some specific exceptions directly instead of wrapping them 
in YarnException. Contributed by Jian He. (Revision 1491896)

 Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1491896
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/InvalidContainerException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/NMNotYetReadyException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
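
For illustration, the shape of the change is to throw YarnException subclasses (such as the InvalidContainerException and NMNotYetReadyException files listed above) directly rather than wrapping them via RPCUtil; a standalone sketch with a stand-in exception type:

{code}
import org.apache.hadoop.yarn.exceptions.YarnException;

public class DirectExceptionSketch {

  /** Stand-in for a specific exception like NMNotYetReadyException (assumption). */
  public static class NotYetReadyException extends YarnException {
    public NotYetReadyException(String msg) { super(msg); }
  }

  public static void startContainer(boolean nmRegistered) throws YarnException {
    if (!nmRegistered) {
      // Callers see the concrete type after RPC unwrapping (YARN-142/YARN-731),
      // so they can react without parsing message strings.
      throw new NotYetReadyException("NM has not yet registered with the RM");
    }
    // ... normal start path ...
  }
}
{code}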


 Some Exceptions no longer need to be wrapped by YarnException and can be 
 directly thrown out after YARN-142 
 

 Key: YARN-737
 URL: https://issues.apache.org/jira/browse/YARN-737
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.1.0-beta

 Attachments: YARN-737.1.patch, YARN-737.2.patch, YARN-737.3.patch, 
 YARN-737.4.patch, YARN-737.5.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-731) RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions

2013-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681189#comment-13681189
 ] 

Hudson commented on YARN-731:
-

Integrated in Hadoop-Hdfs-trunk #1428 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1428/])
YARN-731. RPCUtil.unwrapAndThrowException should unwrap remote 
RuntimeExceptions. Contributed by Zhijie Shen. (Revision 1492000)

 Result = FAILURE
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1492000
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/ipc/TestRPCUtil.java


 RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions
 --

 Key: YARN-731
 URL: https://issues.apache.org/jira/browse/YARN-731
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Siddharth Seth
Assignee: Zhijie Shen
 Fix For: 2.1.0-beta

 Attachments: YARN-731.1.patch, YARN-731.2.patch


 Will be required for YARN-662. Also, remote NPEs show up incorrectly for some 
 unit tests.
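
Conceptually, the unwrapping needs to rethrow a remote RuntimeException with its original type instead of forcing it into YarnException; a simplified standalone sketch of that dispatch (not the actual RPCUtil code):

{code}
import java.io.IOException;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class UnwrapSketch {
  // Re-throw the remote cause as-is when it is a RuntimeException (e.g. an NPE);
  // otherwise fall back to the checked exception types.
  public static void unwrapAndThrow(Throwable cause) throws YarnException, IOException {
    if (cause instanceof RuntimeException) {
      throw (RuntimeException) cause;        // preserve NPEs and other runtime failures
    } else if (cause instanceof YarnException) {
      throw (YarnException) cause;
    } else if (cause instanceof IOException) {
      throw (IOException) cause;
    } else {
      throw new YarnException(cause);
    }
  }
}
{code}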

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-795) Fair scheduler queue metrics should subtract allocated vCores from available vCores

2013-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681190#comment-13681190
 ] 

Hudson commented on YARN-795:
-

Integrated in Hadoop-Hdfs-trunk #1428 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1428/])
YARN-795. Fair scheduler queue metrics should subtract allocated vCores 
from available vCores. (ywskycn via tucu) (Revision 1492021)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1492021
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 Fair scheduler queue metrics should subtract allocated vCores from available 
 vCores
 ---

 Key: YARN-795
 URL: https://issues.apache.org/jira/browse/YARN-795
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.2-alpha
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.1.0-beta

 Attachments: YARN-795-2.patch, YARN-795.patch


 The fair scheduler's queue metrics don't subtract allocated vCores from 
 available vCores, so the available vCores value returned is incorrect.
 This happens because {code}QueueMetrics.getAllocateResources(){code} 
 doesn't return the allocated vCores.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-737) Some Exceptions no longer need to be wrapped by YarnException and can be directly thrown out after YARN-142

2013-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681191#comment-13681191
 ] 

Hudson commented on YARN-737:
-

Integrated in Hadoop-Hdfs-trunk #1428 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1428/])
YARN-737. Throw some specific exceptions directly instead of wrapping them 
in YarnException. Contributed by Jian He. (Revision 1491896)

 Result = FAILURE
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1491896
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/InvalidContainerException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/NMNotYetReadyException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


 Some Exceptions no longer need to be wrapped by YarnException and can be 
 directly thrown out after YARN-142 
 

 Key: YARN-737
 URL: https://issues.apache.org/jira/browse/YARN-737
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.1.0-beta

 Attachments: YARN-737.1.patch, YARN-737.2.patch, YARN-737.3.patch, 
 YARN-737.4.patch, YARN-737.5.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-798) compiling program in hadoop 2

2013-06-12 Thread JOB M THOMAS (JIRA)
JOB M THOMAS created YARN-798:
-

 Summary: compiling program in hadoop 2
 Key: YARN-798
 URL: https://issues.apache.org/jira/browse/YARN-798
 Project: Hadoop YARN
  Issue Type: Test
Reporter: JOB M THOMAS




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-798) compiling program in hadoop 2

2013-06-12 Thread JOB M THOMAS (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JOB M THOMAS updated YARN-798:
--

Description: 
Please help me compile a normal wordcount.java program for Hadoop 2.

I am using hadoop-2.0.5-alpha.
The contents inside the package are bin, etc, include, lib, libexec, logs, 
sbin, and share. There is no conf directory; all configuration files are in 
the /etc/hadoop/ directory.

We have completed a 3-node cluster setup.

We have to specify lots of jar file paths when compiling the wordcount program, 
but I cannot find the proper jar files as in the Hadoop 1.x releases.

Please help me compile.







 compiling program in hadoop 2
 -

 Key: YARN-798
 URL: https://issues.apache.org/jira/browse/YARN-798
 Project: Hadoop YARN
  Issue Type: Test
Reporter: JOB M THOMAS

 Please help me compile a normal wordcount.java program for Hadoop 2.
 I am using hadoop-2.0.5-alpha.
 The contents inside the package are bin, etc, include, lib, libexec, logs, 
 sbin, and share. There is no conf directory; all configuration files are in 
 the /etc/hadoop/ directory.
 We have completed a 3-node cluster setup.
 We have to specify lots of jar file paths when compiling the wordcount 
 program, but I cannot find the proper jar files as in the Hadoop 1.x releases.
 Please help me compile.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-731) RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions

2013-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681243#comment-13681243
 ] 

Hudson commented on YARN-731:
-

Integrated in Hadoop-Mapreduce-trunk #1455 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1455/])
YARN-731. RPCUtil.unwrapAndThrowException should unwrap remote 
RuntimeExceptions. Contributed by Zhijie Shen. (Revision 1492000)

 Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1492000
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/ipc/TestRPCUtil.java


 RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions
 --

 Key: YARN-731
 URL: https://issues.apache.org/jira/browse/YARN-731
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Siddharth Seth
Assignee: Zhijie Shen
 Fix For: 2.1.0-beta

 Attachments: YARN-731.1.patch, YARN-731.2.patch


 Will be required for YARN-662. Also, remote NPEs show up incorrectly for some 
 unit tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-767) Initialize Application status metrics when QueueMetrics is initialized

2013-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681246#comment-13681246
 ] 

Hudson commented on YARN-767:
-

Integrated in Hadoop-Mapreduce-trunk #1455 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1455/])
YARN-767. Initialize application metrics at RM bootup. Contributed by Jian 
He. (Revision 1491989)

 Result = SUCCESS
acmurthy : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1491989
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 Initialize Application status metrics  when QueueMetrics is initialized
 ---

 Key: YARN-767
 URL: https://issues.apache.org/jira/browse/YARN-767
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.1.0-beta

 Attachments: YARN-767.1.patch, YARN-767.2.patch, YARN-767.3.patch, 
 YARN-767.4.patch, YARN-767.5.patch


 Applications: ResourceManager.QueueMetrics.AppsSubmitted, 
 ResourceManager.QueueMetrics.AppsRunning, 
 ResourceManager.QueueMetrics.AppsPending, 
 ResourceManager.QueueMetrics.AppsCompleted, 
 ResourceManager.QueueMetrics.AppsKilled, 
 ResourceManager.QueueMetrics.AppsFailed
 Currently these metrics are created only when they are needed; we want them 
 to be visible as soon as QueueMetrics is initialized.
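
A sketch of the idea: touch the root queue's metrics once at RM startup so the Apps* counters are registered at zero immediately (how it is wired here is an assumption, not the exact patch):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;

public class BootupMetricsSketch {
  // Registering the root QueueMetrics up front makes AppsSubmitted/Running/Pending/
  // Completed/Killed/Failed visible before the first application arrives.
  public static void initQueueMetrics(Configuration conf) {
    QueueMetrics.forQueue(DefaultMetricsSystem.instance(), "root",
        null /* no parent */, false /* no per-user metrics */, conf);
  }
}
{code}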

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-737) Some Exceptions no longer need to be wrapped by YarnException and can be directly thrown out after YARN-142

2013-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681245#comment-13681245
 ] 

Hudson commented on YARN-737:
-

Integrated in Hadoop-Mapreduce-trunk #1455 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1455/])
YARN-737. Throw some specific exceptions directly instead of wrapping them 
in YarnException. Contributed by Jian He. (Revision 1491896)

 Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1491896
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/InvalidContainerException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/NMNotYetReadyException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


 Some Exceptions no longer need to be wrapped by YarnException and can be 
 directly thrown out after YARN-142 
 

 Key: YARN-737
 URL: https://issues.apache.org/jira/browse/YARN-737
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.1.0-beta

 Attachments: YARN-737.1.patch, YARN-737.2.patch, YARN-737.3.patch, 
 YARN-737.4.patch, YARN-737.5.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-795) Fair scheduler queue metrics should subtract allocated vCores from available vCores

2013-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681244#comment-13681244
 ] 

Hudson commented on YARN-795:
-

Integrated in Hadoop-Mapreduce-trunk #1455 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1455/])
YARN-795. Fair scheduler queue metrics should subtract allocated vCores 
from available vCores. (ywskycn via tucu) (Revision 1492021)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1492021
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 Fair scheduler queue metrics should subtract allocated vCores from available 
 vCores
 ---

 Key: YARN-795
 URL: https://issues.apache.org/jira/browse/YARN-795
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.2-alpha
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.1.0-beta

 Attachments: YARN-795-2.patch, YARN-795.patch


 The fair scheduler's queue metrics don't subtract allocated vCores from 
 available vCores, so the available vCores value returned is incorrect.
 This happens because {code}QueueMetrics.getAllocateResources(){code} 
 doesn't return the allocated vCores.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-798) compiling program in hadoop 2

2013-06-12 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash resolved YARN-798.
---

Resolution: Not A Problem

Please email the u...@hadoop.apache.org list. JIRA is for reporting problems in 
the software. 

 compiling program in hadoop 2
 -

 Key: YARN-798
 URL: https://issues.apache.org/jira/browse/YARN-798
 Project: Hadoop YARN
  Issue Type: Test
Reporter: JOB M THOMAS

 Please help me compile a normal wordcount.java program for Hadoop 2.
 I am using hadoop-2.0.5-alpha.
 The contents inside the package are bin, etc, include, lib, libexec, logs, 
 sbin, and share. There is no conf directory; all configuration files are in 
 the /etc/hadoop/ directory.
 We have completed a 3-node cluster setup.
 We have to specify lots of jar file paths when compiling the wordcount 
 program, but I cannot find the proper jar files as in the Hadoop 1.x releases.
 Please help me compile.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-06-12 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681374#comment-13681374
 ] 

Jonathan Eagles commented on YARN-427:
--

+1. Thanks, Aleksey. Looks really good now.

 Coverage fix for org.apache.hadoop.yarn.server.api.*
 

 Key: YARN-427
 URL: https://issues.apache.org/jira/browse/YARN-427
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Attachments: YARN-427-branch-0.23-b.patch, 
 YARN-427-branch-0.23-c.patch, YARN-427-branch-2-a.patch, 
 YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, 
 YARN-427-trunk-b.patch, YARN-427-trunk-c.patch, YARN-427-trunk.patch


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 patch YARN-427-trunk.patch for trunk
 patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-369) Handle ( or throw a proper error when receiving) status updates from application masters that have not registered

2013-06-12 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681385#comment-13681385
 ] 

Bikas Saha commented on YARN-369:
-

Typo
{code}
+  // increment the response id to dennote that application is master is
+  // register for the respective attempid
{code}

We are getting rid of the RPCUtil.throwException pattern and throwing specific 
exceptions that derive from YarnException. How about creating a new 
InvalidApplicationMasterRequest?
{code}
+  RMAuditLogger.logFailure(
+this.rmContext.getRMApps().get(appAttemptId.getApplicationId())
+  .getUser(), AuditConstants.REGISTER_AM, "",
+"ApplicationMasterService", message, appAttemptId.getApplicationId(),
+appAttemptId);
+  throw RPCUtil.getRemoteException(message);
{code}

What if registerApplicationMaster is called 2 times; is that legal? We should 
probably check that the responseId is -1 and set it to 0 in 
registerApplicationMaster. This will reject duplicate calls to register. Add a 
test for that too.
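
To make the responseId suggestion concrete, a standalone sketch of rejecting a duplicate register (the map, names, and use of a plain YarnException are placeholders; the review suggests a dedicated InvalidApplicationMasterRequest-style exception):

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class RegisterGuardSketch {
  // -1 means the attempt is known but the AM has not registered; 0 after registration.
  private final ConcurrentHashMap<String, AtomicInteger> lastResponseId =
      new ConcurrentHashMap<String, AtomicInteger>();

  public void addAttempt(String appAttemptId) {
    lastResponseId.put(appAttemptId, new AtomicInteger(-1));
  }

  public void registerApplicationMaster(String appAttemptId) throws YarnException {
    AtomicInteger id = lastResponseId.get(appAttemptId);
    if (id == null || !id.compareAndSet(-1, 0)) {
      // Unknown attempt or a second register call: fail loudly instead of letting a
      // later state-machine transition be dropped silently.
      throw new YarnException(
          "Duplicate or invalid registerApplicationMaster for " + appAttemptId);
    }
  }
}
{code}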

 Handle ( or throw a proper error when receiving) status updates from 
 application masters that have not registered
 -

 Key: YARN-369
 URL: https://issues.apache.org/jira/browse/YARN-369
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.3-alpha, trunk-win
Reporter: Hitesh Shah
Assignee: Mayank Bansal
 Attachments: YARN-369.patch, YARN-369-trunk-1.patch


 Currently, an allocate call from an unregistered application is allowed and 
 the status update for it throws a statemachine error that is silently dropped.
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 STATUS_UPDATE at LAUNCHED
at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588)
at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471)
at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452)
at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
at java.lang.Thread.run(Thread.java:680)
 ApplicationMasterService should likely throw an appropriate error for 
 applications' requests that should not be handled in such cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681388#comment-13681388
 ] 

Hudson commented on YARN-427:
-

Integrated in Hadoop-trunk-Commit #3904 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3904/])
YARN-427. Coverage fix for org.apache.hadoop.yarn.server.api.* (Aleksey 
Gorshkov via jeagles) (Revision 1492282)

 Result = SUCCESS
jeagles : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1492282
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 

 Key: YARN-427
 URL: https://issues.apache.org/jira/browse/YARN-427
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Fix For: 3.0.0, 2.1.0-beta, 0.23.9

 Attachments: YARN-427-branch-0.23-b.patch, 
 YARN-427-branch-0.23-c.patch, YARN-427-branch-2-a.patch, 
 YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, 
 YARN-427-trunk-b.patch, YARN-427-trunk-c.patch, YARN-427-trunk.patch


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 patch YARN-427-trunk.patch for trunk
 patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-379) yarn [node,application] command print logger info messages

2013-06-12 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated YARN-379:
--

Attachment: YARN-379.patch

This patch sets the console log level to WARN.

I tested by building cleanly after the patch and bringing up the daemons. The 
log and out files for the daemons are still at INFO level, and the yarn 
commands log at WARN level.

Could someone please review and check it in?
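
For illustration of the mechanism only (this is not the patch itself), limiting what the CLI prints comes down to raising the console appender's threshold, e.g. programmatically with the log4j 1.2 API:

{code}
import org.apache.log4j.ConsoleAppender;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

public class ConsoleLogLevelSketch {
  // Keep the console appender, but only let WARN and above through for CLI runs;
  // daemon log files configured elsewhere are unaffected.
  public static void quietConsole() {
    ConsoleAppender console = new ConsoleAppender(
        new PatternLayout("%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n"));
    console.setThreshold(Level.WARN);
    console.activateOptions();
    Logger.getRootLogger().removeAllAppenders();
    Logger.getRootLogger().addAppender(console);
  }
}
{code}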

 yarn [node,application] command print logger info messages
 --

 Key: YARN-379
 URL: https://issues.apache.org/jira/browse/YARN-379
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.0.3-alpha
Reporter: Thomas Graves
Assignee: Abhishek Kapoor
  Labels: usability
 Attachments: YARN-379.patch, YARN-379.patch


 Running the yarn node and yarn application commands results in annoying INFO 
 log messages being printed:
 $ yarn node -list
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
 Total Nodes:1
  Node-Id   Node-State   Node-Http-Address   
 Health-Status(isNodeHealthy)   Running-Containers
 foo:8041RUNNING  foo:8042   true  
  0
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
 $ yarn application
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
 Invalid Command Usage : 
 usage: application
  -kill <arg>     Kills the application.
  -list           Lists all the Applications from RM.
  -status <arg>   Prints the status of the application.
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-379) yarn [node,application] command print logger info messages

2013-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681410#comment-13681410
 ] 

Hadoop QA commented on YARN-379:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587458/YARN-379.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1202//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1202//console

This message is automatically generated.

 yarn [node,application] command print logger info messages
 --

 Key: YARN-379
 URL: https://issues.apache.org/jira/browse/YARN-379
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.0.3-alpha
Reporter: Thomas Graves
Assignee: Abhishek Kapoor
  Labels: usability
 Attachments: YARN-379.patch, YARN-379.patch


 Running the yarn node and yarn application commands results in annoying INFO 
 log messages being printed:
 $ yarn node -list
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
 Total Nodes:1
  Node-Id   Node-State   Node-Http-Address   
 Health-Status(isNodeHealthy)   Running-Containers
 foo:8041RUNNING  foo:8042   true  
  0
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
 $ yarn application
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
 Invalid Command Usage : 
 usage: application
  -kill <arg>     Kills the application.
  -list           Lists all the Applications from RM.
  -status <arg>   Prints the status of the application.
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated

2013-06-12 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-600:


Attachment: YARN-600.patch

 Hook up cgroups CPU settings to the number of virtual cores allocated
 -

 Key: YARN-600
 URL: https://issues.apache.org/jira/browse/YARN-600
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-600.patch


 YARN-3 introduced CPU isolation and monitoring through cgroups.  YARN-2 
 introduced CPU scheduling in the capacity scheduler, and YARN-326 will 
 introduce it in the fair scheduler.  The number of virtual cores allocated to 
 a container should be used to weight the number of cgroups CPU shares given 
 to it.
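
A standalone sketch of the weighting idea (the default-shares constant and method are placeholders, not the CgroupsLCEResourcesHandler internals):

{code}
public class CpuSharesSketch {
  // cgroups assigns relative CPU weight via cpu.shares; 1024 is the conventional
  // default weight for a single unit of CPU.
  private static final int CPU_DEFAULT_SHARES = 1024;

  /** Weight a container's cpu.shares by its allocated virtual cores. */
  public static int sharesForContainer(int virtualCores) {
    return CPU_DEFAULT_SHARES * Math.max(virtualCores, 1);
  }

  public static void main(String[] args) {
    // A 4-vcore container gets four times the shares of a 1-vcore container.
    System.out.println(sharesForContainer(4));   // prints 4096
  }
}
{code}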

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated

2013-06-12 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681437#comment-13681437
 ] 

Sandy Ryza commented on YARN-600:
-

Submitted a simple patch.  Haven't had a chance to verify it manually yet.

 Hook up cgroups CPU settings to the number of virtual cores allocated
 -

 Key: YARN-600
 URL: https://issues.apache.org/jira/browse/YARN-600
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-600.patch


 YARN-3 introduced CPU isolation and monitoring through cgroups.  YARN-2 
 introduced CPU scheduling in the capacity scheduler, and YARN-326 will 
 introduce it in the fair scheduler.  The number of virtual cores allocated to 
 a container should be used to weight the number of cgroups CPU shares given 
 to it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs

2013-06-12 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated YARN-799:
-

Description: 
The implementation of

bq. 
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java

Tells the container-executor to write PIDs to cgroup.procs:

bq.  public String getResourcesOption(ContainerId containerId) {
String containerName = containerId.toString();
StringBuilder sb = new StringBuilder("cgroups=");
if (isCpuWeightEnabled()) {
  sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
  sb.append(",");
}
if (sb.charAt(sb.length() - 1) == ',') {
  sb.deleteCharAt(sb.length() - 1);
}
return sb.toString();
  }

Apparently, this file has not always been writeable:

https://patchwork.kernel.org/patch/116146/
http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html

The RHEL version of the Linux kernel that I'm using has a CGroup module that 
has a non-writeable cgroup.procs file.

bq. $ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 
x86_64 x86_64 x86_64 GNU/Linux

As a result, when the container-executor tries to run, it fails with this error 
message:

bq.fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",

This is because the executor is given a resource by the 
CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:

bq. $ pwd 
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks

I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
and this appears to have fixed the problem.

I can think of several potential resolutions to this ticket:

1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone
3. Check permissioning on /cgroup.procs prior to writing to it, and fall back 
to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.

Thoughts?

  was:
The implementation of

bq. 
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java

Tells the container-executor to write PIDs to cgroup.procs:

bq.  public String getResourcesOption(ContainerId containerId) {
String containerName = containerId.toString();

StringBuilder sb = new StringBuilder("cgroups=");

if (isCpuWeightEnabled()) {
  sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
  sb.append(",");
}

if (sb.charAt(sb.length() - 1) == ',') {
  sb.deleteCharAt(sb.length() - 1);
}

return sb.toString();
  }

Apparently, this file has not always been writeable:

https://patchwork.kernel.org/patch/116146/
http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html

The RHEL version of the Linux kernel that I'm using has a CGroup module that 
has a non-writeable cgroup.procs file.

bq. $ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 
x86_64 x86_64 x86_64 GNU/Linux

As a result, when the container-executor tries to run, it fails with this error 
message:

bq.fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",

This is because the executor is given a resource by the 
CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:

bq. $ pwd 
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks

I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
and this appears to have fixed the problem.

I can think of several potential resolutions to this ticket:

1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone
3. Check permissioning on /cgroup.procs prior to writing to it, and fall back 
to /tasks.
4. Add a config to yarn-site that lets admins specify which file 

[jira] [Updated] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs

2013-06-12 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated YARN-799:
-

Description: 
The implementation of

bq. 
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java

Tells the container-executor to write PIDs to cgroup.procs:

bq.  public String getResourcesOption(ContainerId containerId) {
String containerName = containerId.toString();

StringBuilder sb = new StringBuilder("cgroups=");

if (isCpuWeightEnabled()) {
  sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
  sb.append(",");
}

if (sb.charAt(sb.length() - 1) == ',') {
  sb.deleteCharAt(sb.length() - 1);
}

return sb.toString();
  }

Apparently, this file has not always been writeable:

https://patchwork.kernel.org/patch/116146/
http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html

The RHEL version of the Linux kernel that I'm using has a CGroup module that 
has a non-writeable cgroup.procs file.

bq. $ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 
x86_64 x86_64 x86_64 GNU/Linux

As a result, when the container-executor tries to run, it fails with this error 
message:

bq.fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",

This is because the executor is given a resource by the 
CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:

bq. $ pwd 
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks

I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
and this appears to have fixed the problem.

I can think of several potential resolutions to this ticket:

1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone
3. Check permissioning on /cgroup.procs prior to writing to it, and fall back 
to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.

Thoughts?

  was:
The implementation of

bq. 
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java

Tells the container-executor to write PIDs to cgroup.procs:

  public String getResourcesOption(ContainerId containerId) {
String containerName = containerId.toString();

StringBuilder sb = new StringBuilder("cgroups=");

if (isCpuWeightEnabled()) {
  sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
  sb.append(",");
}

if (sb.charAt(sb.length() - 1) == ',') {
  sb.deleteCharAt(sb.length() - 1);
}

return sb.toString();
  }

Apparently, this file has not always been writeable:

https://patchwork.kernel.org/patch/116146/
http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html

The RHEL version of the Linux kernel that I'm using has a CGroup module that 
has a non-writeable cgroup.procs file.

bq. $ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 
x86_64 x86_64 x86_64 GNU/Linux

As a result, when the container-executor tries to run, it fails with this error 
message:

bq.fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",

This is because the executor is given a resource by the 
CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:

bq. $ pwd 
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks

I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
and this appears to have fixed the problem.

I can think of several potential resolutions to this ticket:

1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone
3. Check permissioning on /cgroup.procs prior to writing to it, and fall back 
to /tasks.
4. Add a config to yarn-site that lets admins specify which file 

[jira] [Created] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs

2013-06-12 Thread Chris Riccomini (JIRA)
Chris Riccomini created YARN-799:


 Summary: CgroupsLCEResourcesHandler tries to write to cgroup.procs
 Key: YARN-799
 URL: https://issues.apache.org/jira/browse/YARN-799
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chris Riccomini


The implementation of

bq. 
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java

Tells the container-executor to write PIDs to cgroup.procs:

  public String getResourcesOption(ContainerId containerId) {
String containerName = containerId.toString();

StringBuilder sb = new StringBuilder("cgroups=");

if (isCpuWeightEnabled()) {
  sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
  sb.append(",");
}

if (sb.charAt(sb.length() - 1) == ',') {
  sb.deleteCharAt(sb.length() - 1);
}

return sb.toString();
  }

Apparently, this file has not always been writeable:

https://patchwork.kernel.org/patch/116146/
http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html

The RHEL version of the Linux kernel that I'm using has a CGroup module that 
has a non-writeable cgroup.procs file.

bq. $ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 
x86_64 x86_64 x86_64 GNU/Linux

As a result, when the container-executor tries to run, it fails with this error 
message:

bq.fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",

This is because the executor is given a resource by the 
CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:

bq. $ pwd 
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks

I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
and this appears to have fixed the problem.

I can think of several potential resolutions to this ticket:

1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone
3. Check permissioning on /cgroup.procs prior to writing to it, and fall back 
to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.

Thoughts?
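
For illustration only (this is not from an attached patch), option 3 could look 
roughly like the sketch below, assuming a small helper inside the resources 
handler that is consulted before the path is handed to the container-executor:

{code}
// Hypothetical sketch of option 3: prefer cgroup.procs, but fall back to
// tasks when the kernel exposes cgroup.procs as read-only (as on the RHEL 6
// 2.6.32 kernel shown above). Method and parameter names are illustrative.
private String pidFileForCgroup(String cgroupPath) {
  java.io.File procs = new java.io.File(cgroupPath, "cgroup.procs");
  if (procs.exists() && procs.canWrite()) {
    return cgroupPath + "/cgroup.procs";
  }
  return cgroupPath + "/tasks";
}
{code}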

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs

2013-06-12 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated YARN-799:
-

  Component/s: nodemanager
Affects Version/s: 2.0.5-alpha
   2.0.4-alpha

 CgroupsLCEResourcesHandler tries to write to cgroup.procs
 -

 Key: YARN-799
 URL: https://issues.apache.org/jira/browse/YARN-799
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.4-alpha, 2.0.5-alpha
Reporter: Chris Riccomini

 The implementation of
 bq. 
 ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
 Tells the container-executor to write PIDs to cgroup.procs:
   public String getResourcesOption(ContainerId containerId) {
 String containerName = containerId.toString();
 StringBuilder sb = new StringBuilder("cgroups=");
 if (isCpuWeightEnabled()) {
   sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + 
 "/cgroup.procs");
   sb.append(",");
 }
 if (sb.charAt(sb.length() - 1) == ',') {
   sb.deleteCharAt(sb.length() - 1);
 }
 return sb.toString();
   }
 Apparently, this file has not always been writeable:
 https://patchwork.kernel.org/patch/116146/
 http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
 https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html
 The RHEL version of the Linux kernel that I'm using has a CGroup module that 
 has a non-writeable cgroup.procs file.
 bq. $ uname -a
 Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 
 2011 x86_64 x86_64 x86_64 GNU/Linux
 As a result, when the container-executor tries to run, it fails with this 
 error message:
 bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
 This is because the executor is given a resource by the 
 CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
 bq. $ pwd 
 /cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
 $ ls -l
 total 0
 -r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
 I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
 and this appears to have fixed the problem.
 I can think of several potential resolutions to this ticket:
 1. Ignore the problem, and make people patch YARN when they hit this issue.
 2. Write to /tasks instead of /cgroup.procs for everyone
 3. Check permissioning on /cgroup.procs prior to writing to it, and fall back 
 to /tasks.
 4. Add a config to yarn-site that lets admins specify which file to write to.
 Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated

2013-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681460#comment-13681460
 ] 

Hadoop QA commented on YARN-600:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587466/YARN-600.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1203//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1203//console

This message is automatically generated.

 Hook up cgroups CPU settings to the number of virtual cores allocated
 -

 Key: YARN-600
 URL: https://issues.apache.org/jira/browse/YARN-600
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-600.patch


 YARN-3 introduced CPU isolation and monitoring through cgroups.  YARN-2 
 introduced CPU scheduling in the capacity scheduler, and YARN-326 will 
 introduce it in the fair scheduler.  The number of virtual cores allocated to 
 a container should be used to weight the number of cgroups CPU shares given 
 to it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs

2013-06-12 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated YARN-799:
-

Description: 
The implementation of

bq. 
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java

Tells the container-executor to write PIDs to cgroup.procs:

{quote}
  public String getResourcesOption(ContainerId containerId) {
String containerName = containerId.toString();
StringBuilder sb = new StringBuilder("cgroups=");

if (isCpuWeightEnabled()) {
  sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
  sb.append(",");
}

if (sb.charAt(sb.length() - 1) == ',') {
  sb.deleteCharAt(sb.length() - 1);
}
return sb.toString();
  }
{quote}

Apparently, this file has not always been writeable:

https://patchwork.kernel.org/patch/116146/
http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html

The RHEL version of the Linux kernel that I'm using has a CGroup module that 
has a non-writeable cgroup.procs file.

{quote}
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 
x86_64 x86_64 x86_64 GNU/Linux
{quote}

As a result, when the container-executor tries to run, it fails with this error 
message:

bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",

This is because the executor is given a resource by the 
CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:

{quote}
$ pwd 
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
{quote}

I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
and this appears to have fixed the problem.

I can think of several potential resolutions to this ticket:

1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone
3. Check permissioning on /cgroup.procs prior to writing to it, and fall back 
to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.

Thoughts?

  was:
The implementation of

bq. 
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java

Tells the container-executor to write PIDs to cgroup.procs:

bq.  public String getResourcesOption(ContainerId containerId) {
String containerName = containerId.toString();
StringBuilder sb = new StringBuilder("cgroups=");
if (isCpuWeightEnabled()) {
  sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
  sb.append(",");
}
if (sb.charAt(sb.length() - 1) == ',') {
  sb.deleteCharAt(sb.length() - 1);
}
return sb.toString();
  }

Apparently, this file has not always been writeable:

https://patchwork.kernel.org/patch/116146/
http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html

The RHEL version of the Linux kernel that I'm using has a CGroup module that 
has a non-writeable cgroup.procs file.

bq. $ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 
x86_64 x86_64 x86_64 GNU/Linux

As a result, when the container-executor tries to run, it fails with this error 
message:

bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",

This is because the executor is given a resource by the 
CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:

bq. $ pwd 
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks

I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
and this appears to have fixed the problem.

I can think of several potential resolutions to this ticket:

1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone
3. Check permissioning on /cgroup.procs prior to writing to it, and fall back 
to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.

[jira] [Updated] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs

2013-06-12 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated YARN-799:
-

Description: 
The implementation of

bq. 
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java

Tells the container-executor to write PIDs to cgroup.procs:

{code}
  public String getResourcesOption(ContainerId containerId) {
String containerName = containerId.toString();
StringBuilder sb = new StringBuilder("cgroups=");

if (isCpuWeightEnabled()) {
  sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
  sb.append(",");
}

if (sb.charAt(sb.length() - 1) == ',') {
  sb.deleteCharAt(sb.length() - 1);
}
return sb.toString();
  }
{code}

Apparently, this file has not always been writeable:

https://patchwork.kernel.org/patch/116146/
http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html

The RHEL version of the Linux kernel that I'm using has a CGroup module that 
has a non-writeable cgroup.procs file.

{quote}
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 
x86_64 x86_64 x86_64 GNU/Linux
{quote}

As a result, when the container-executor tries to run, it fails with this error 
message:

bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",

This is because the executor is given a resource by the 
CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:

{quote}
$ pwd 
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
{quote}

I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
and this appears to have fixed the problem.

I can think of several potential resolutions to this ticket:

1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone
3. Check permissioning on /cgroup.procs prior to writing to it, and fall back 
to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.

Thoughts?

  was:
The implementation of

bq. 
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java

Tells the container-executor to write PIDs to cgroup.procs:

{quote}
  public String getResourcesOption(ContainerId containerId) {
String containerName = containerId.toString();
StringBuilder sb = new StringBuilder("cgroups=");

if (isCpuWeightEnabled()) {
  sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
  sb.append(",");
}

if (sb.charAt(sb.length() - 1) == ',') {
  sb.deleteCharAt(sb.length() - 1);
}
return sb.toString();
  }
{quote}

Apparently, this file has not always been writeable:

https://patchwork.kernel.org/patch/116146/
http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html

The RHEL version of the Linux kernel that I'm using has a CGroup module that 
has a non-writeable cgroup.procs file.

{quote}
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 
x86_64 x86_64 x86_64 GNU/Linux
{quote}

As a result, when the container-executor tries to run, it fails with this error 
message:

bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",

This is because the executor is given a resource by the 
CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:

{quote}
$ pwd 
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
{quote}

I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
and this appears to have fixed the problem.

I can think of several potential resolutions to this ticket:

1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone
3. Check permissioning on /cgroup.procs prior to writing to it, and fall back 
to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.

[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-06-12 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681465#comment-13681465
 ] 

Jonathan Eagles commented on YARN-427:
--

Good catch, Sid. The intention was just for 2.3. I have corrected fix versions 
to reflect that.

 Coverage fix for org.apache.hadoop.yarn.server.api.*
 

 Key: YARN-427
 URL: https://issues.apache.org/jira/browse/YARN-427
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Fix For: 3.0.0, 0.23.9, 2.3.0

 Attachments: YARN-427-branch-0.23-b.patch, 
 YARN-427-branch-0.23-c.patch, YARN-427-branch-2-a.patch, 
 YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, 
 YARN-427-trunk-b.patch, YARN-427-trunk-c.patch, YARN-427-trunk.patch


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 patch YARN-427-trunk.patch for trunk
 patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs

2013-06-12 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681477#comment-13681477
 ] 

Timothy St. Clair commented on YARN-799:


+1 to append to tasks, check 
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-Moving_a_Process_to_a_Control_Group.html
 for ref. 

 CgroupsLCEResourcesHandler tries to write to cgroup.procs
 -

 Key: YARN-799
 URL: https://issues.apache.org/jira/browse/YARN-799
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.4-alpha, 2.0.5-alpha
Reporter: Chris Riccomini

 The implementation of
 bq. 
 ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
 Tells the container-executor to write PIDs to cgroup.procs:
 {code}
   public String getResourcesOption(ContainerId containerId) {
 String containerName = containerId.toString();
 StringBuilder sb = new StringBuilder("cgroups=");
 if (isCpuWeightEnabled()) {
   sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + 
 "/cgroup.procs");
   sb.append(",");
 }
 if (sb.charAt(sb.length() - 1) == ',') {
   sb.deleteCharAt(sb.length() - 1);
 }
 return sb.toString();
   }
 {code}
 Apparently, this file has not always been writeable:
 https://patchwork.kernel.org/patch/116146/
 http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
 https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html
 The RHEL version of the Linux kernel that I'm using has a CGroup module that 
 has a non-writeable cgroup.procs file.
 {quote}
 $ uname -a
 Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 
 2011 x86_64 x86_64 x86_64 GNU/Linux
 {quote}
 As a result, when the container-executor tries to run, it fails with this 
 error message:
 bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
 This is because the executor is given a resource by the 
 CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
 {quote}
 $ pwd 
 /cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
 $ ls -l
 total 0
 -r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
 {quote}
 I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
 and this appears to have fixed the problem.
 I can think of several potential resolutions to this ticket:
 1. Ignore the problem, and make people patch YARN when they hit this issue.
 2. Write to /tasks instead of /cgroup.procs for everyone
 3. Check permissioning on /cgroup.procs prior to writing to it, and fall back 
 to /tasks.
 4. Add a config to yarn-site that lets admins specify which file to write to.
 Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs

2013-06-12 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681478#comment-13681478
 ] 

Sandy Ryza commented on YARN-799:
-

Is there a reason that it would ever be beneficial to write to /cgroup.procs 
over /tasks?

 CgroupsLCEResourcesHandler tries to write to cgroup.procs
 -

 Key: YARN-799
 URL: https://issues.apache.org/jira/browse/YARN-799
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.4-alpha, 2.0.5-alpha
Reporter: Chris Riccomini

 The implementation of
 bq. 
 ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
 Tells the container-executor to write PIDs to cgroup.procs:
 {code}
   public String getResourcesOption(ContainerId containerId) {
 String containerName = containerId.toString();
 StringBuilder sb = new StringBuilder("cgroups=");
 if (isCpuWeightEnabled()) {
   sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + 
 "/cgroup.procs");
   sb.append(",");
 }
 if (sb.charAt(sb.length() - 1) == ',') {
   sb.deleteCharAt(sb.length() - 1);
 }
 return sb.toString();
   }
 {code}
 Apparently, this file has not always been writeable:
 https://patchwork.kernel.org/patch/116146/
 http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
 https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html
 The RHEL version of the Linux kernel that I'm using has a CGroup module that 
 has a non-writeable cgroup.procs file.
 {quote}
 $ uname -a
 Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 
 2011 x86_64 x86_64 x86_64 GNU/Linux
 {quote}
 As a result, when the container-executor tries to run, it fails with this 
 error message:
 bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
 This is because the executor is given a resource by the 
 CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
 {quote}
 $ pwd 
 /cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
 $ ls -l
 total 0
 -r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
 {quote}
 I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
 and this appears to have fixed the problem.
 I can think of several potential resolutions to this ticket:
 1. Ignore the problem, and make people patch YARN when they hit this issue.
 2. Write to /tasks instead of /cgroup.procs for everyone
 3. Check permissioning on /cgroup.procs prior to writing to it, and fall back 
 to /tasks.
 4. Add a config to yarn-site that lets admins specify which file to write to.
 Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated

2013-06-12 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-600:


Target Version/s: 2.1.0-beta

 Hook up cgroups CPU settings to the number of virtual cores allocated
 -

 Key: YARN-600
 URL: https://issues.apache.org/jira/browse/YARN-600
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-600.patch


 YARN-3 introduced CPU isolation and monitoring through cgroups.  YARN-2 
 introduced CPU scheduling in the capacity scheduler, and YARN-326 will 
 introduce it in the fair scheduler.  The number of virtual cores allocated to 
 a container should be used to weight the number of cgroups CPU shares given 
 to it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs

2013-06-12 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681481#comment-13681481
 ] 

Chris Riccomini commented on YARN-799:
--

[~sandyr] Not sure. I'm afraid my cgroup experience is limited to 24h :P


 CgroupsLCEResourcesHandler tries to write to cgroup.procs
 -

 Key: YARN-799
 URL: https://issues.apache.org/jira/browse/YARN-799
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.4-alpha, 2.0.5-alpha
Reporter: Chris Riccomini

 The implementation of
 bq. 
 ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
 Tells the container-executor to write PIDs to cgroup.procs:
 {code}
   public String getResourcesOption(ContainerId containerId) {
 String containerName = containerId.toString();
 StringBuilder sb = new StringBuilder("cgroups=");
 if (isCpuWeightEnabled()) {
   sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + 
 "/cgroup.procs");
   sb.append(",");
 }
 if (sb.charAt(sb.length() - 1) == ',') {
   sb.deleteCharAt(sb.length() - 1);
 }
 return sb.toString();
   }
 {code}
 Apparently, this file has not always been writeable:
 https://patchwork.kernel.org/patch/116146/
 http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
 https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html
 The RHEL version of the Linux kernel that I'm using has a CGroup module that 
 has a non-writeable cgroup.procs file.
 {quote}
 $ uname -a
 Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 
 2011 x86_64 x86_64 x86_64 GNU/Linux
 {quote}
 As a result, when the container-executor tries to run, it fails with this 
 error message:
 bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
 This is because the executor is given a resource by the 
 CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
 {quote}
 $ pwd 
 /cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
 $ ls -l
 total 0
 -r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
 {quote}
 I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
 and this appears to have fixed the problem.
 I can think of several potential resolutions to this ticket:
 1. Ignore the problem, and make people patch YARN when they hit this issue.
 2. Write to /tasks instead of /cgroup.procs for everyone
 3. Check permissioning on /cgroup.procs prior to writing to it, and fall back 
 to /tasks.
 4. Add a config to yarn-site that lets admins specify which file to write to.
 Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs

2013-06-12 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681485#comment-13681485
 ] 

Sandy Ryza commented on YARN-799:
-

Just found this comment in the original YARN-3 discussion:

bq. This is a small edit to the previous patch. It now writes the process ID to 
cgroup.procs instead of tasks so other kernel threads started by the same 
process stay in the cgroup.

So it seems like there is some reasoning behind it, and having other threads 
started by the same process stay in the cgroup is important.  I'll have to 
learn more about cgroups to have an opinion on the right course of action.

 CgroupsLCEResourcesHandler tries to write to cgroup.procs
 -

 Key: YARN-799
 URL: https://issues.apache.org/jira/browse/YARN-799
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.4-alpha, 2.0.5-alpha
Reporter: Chris Riccomini

 The implementation of
 bq. 
 ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
 Tells the container-executor to write PIDs to cgroup.procs:
 {code}
   public String getResourcesOption(ContainerId containerId) {
 String containerName = containerId.toString();
 StringBuilder sb = new StringBuilder("cgroups=");
 if (isCpuWeightEnabled()) {
   sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + 
 "/cgroup.procs");
   sb.append(",");
 }
 if (sb.charAt(sb.length() - 1) == ',') {
   sb.deleteCharAt(sb.length() - 1);
 }
 return sb.toString();
   }
 {code}
 Apparently, this file has not always been writeable:
 https://patchwork.kernel.org/patch/116146/
 http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
 https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html
 The RHEL version of the Linux kernel that I'm using has a CGroup module that 
 has a non-writeable cgroup.procs file.
 {quote}
 $ uname -a
 Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 
 2011 x86_64 x86_64 x86_64 GNU/Linux
 {quote}
 As a result, when the container-executor tries to run, it fails with this 
 error message:
 bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
 This is because the executor is given a resource by the 
 CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
 {quote}
 $ pwd 
 /cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
 $ ls -l
 total 0
 -r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
 {quote}
 I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
 and this appears to have fixed the problem.
 I can think of several potential resolutions to this ticket:
 1. Ignore the problem, and make people patch YARN when they hit this issue.
 2. Write to /tasks instead of /cgroup.procs for everyone
 3. Check permissioning on /cgroup.procs prior to writing to it, and fall back 
 to /tasks.
 4. Add a config to yarn-site that lets admins specify which file to write to.
 Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated

2013-06-12 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681486#comment-13681486
 ] 

Chris Riccomini commented on YARN-600:
--

Hey Sandy,

I just applied your patch to my local YARN, and can verify that it appears to 
be working.

{noformat}
$ cat container_1371061837111_0001_01_02/cpu.shares 
1024
$ cat container_1371061837111_0002_01_02/cpu.shares 
32768
{noformat}

I have 8 of the second type of container (32768 cpu shares) on an 8 core 
machine. When running 8 * 32768 and 1 * 1024, I get a top that looks like this:

{noformat}
 1404 criccomi  20   0 1022m 108m  13m S 98.1  0.2   2:33.03 
/export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps 
-Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0005/container_1371061837111_0005_01_02/gc.log
 -Dlog4j.configuration=file:/tmp/hadoop-cri
 3192 criccomi  20   0 1022m 109m  13m S 98.1  0.2   2:25.93 
/export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps 
-Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0009/container_1371061837111_0009_01_02/gc.log
 -Dlog4j.configuration=file:/tmp/hadoop-cri
  428 criccomi  20   0 1022m 109m  13m S 97.7  0.2   2:36.41 
/export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps 
-Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0004/container_1371061837111_0004_01_02/gc.log
 -Dlog4j.configuration=file:/tmp/hadoop-cri
 3022 criccomi  20   0 1022m 110m  13m S 97.2  0.2   2:29.74 
/export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps 
-Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0007/container_1371061837111_0007_01_02/gc.log
 -Dlog4j.configuration=file:/tmp/hadoop-cri
32443 criccomi  20   0 1022m 109m  13m S 95.1  0.2   2:40.17 
/export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps 
-Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0003/container_1371061837111_0003_01_02/gc.log
 -Dlog4j.configuration=file:/tmp/hadoop-cri
 2850 criccomi  20   0 1022m 107m  13m S 93.6  0.2   2:31.09 
/export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps 
-Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0006/container_1371061837111_0006_01_02/gc.log
 -Dlog4j.configuration=file:/tmp/hadoop-cri
 3112 criccomi  20   0 1022m 108m  13m S 93.2  0.2   2:25.54 
/export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps 
-Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0008/container_1371061837111_0008_01_02/gc.log
 -Dlog4j.configuration=file:/tmp/hadoop-cri
31038 criccomi  20   0 1022m 109m  13m S 84.5  0.2   3:07.39 
/export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps 
-Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0002/container_1371061837111_0002_01_02/gc.log
 -Dlog4j.configuration=file:/tmp/hadoop-cri
29451 criccomi  20   0 1925m 249m  13m S 16.3  0.4   0:33.29 
/export/apps/jdk/JDK-1_6_0_27/bin/java -Dproc_nodemanager -Xmx1000m -server 
-Dhadoop.log.dir=/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs 
-Dyarn.log.dir=/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs 
-Dhadoop.log.file=yarn.log -Dyarn.log.file=yarn.lo
30447 criccomi  20   0 1022m 109m  13m S  3.7  0.2   1:28.42 
/export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps 
-Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0001/container_1371061837111_0001_01_02/gc.log
 -Dlog4j.configuration=file:/tmp/hadoop-cri
{noformat}

The column that starts with 98 is the CPU column. As you can see, 
container_1371061837111_0001_01_02 is only taking 3% CPU, while the other 
processes are taking 100%. They're all doing the same thing to burn up CPU, so 
it appears the CGroups are throttling my 1024 container as expected.
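
For anyone following the numbers: the mapping being verified here is essentially 
virtual cores multiplied by a fixed per-core weight. A minimal sketch of that idea 
(the 1024 per-core weight is inferred from the 1024/32768 values above, not quoted 
from the patch; names are illustrative):

{code}
// Sketch only: derive cpu.shares from a container's virtual cores, assuming
// a per-core weight of 1024 (also the kernel's default cpu.shares value).
public class CpuSharesSketch {
  private static final int CPU_SHARES_PER_VCORE = 1024;

  public static int cpuSharesFor(int virtualCores) {
    // Never hand out fewer shares than the cgroup default of 1024.
    return Math.max(virtualCores * CPU_SHARES_PER_VCORE, CPU_SHARES_PER_VCORE);
  }
}
{code}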

Cheers,
Chris

 Hook up cgroups CPU settings to the number of virtual cores allocated
 -

 Key: YARN-600
 URL: https://issues.apache.org/jira/browse/YARN-600
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-600.patch


 YARN-3 introduced CPU isolation and monitoring through cgroups.  YARN-2 
 introduced CPU scheduling in the capacity scheduler, and YARN-326 will 
 introduce it in the fair scheduler.  The number of virtual cores allocated to 
 a container should be used to weight the number of cgroups CPU shares given 
 to it.

[jira] [Commented] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs

2013-06-12 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681489#comment-13681489
 ] 

Chris Riccomini commented on YARN-799:
--

Fair enough. One thing that I notice is, when I patch to write to /tasks, the 
tasks file has more than 1 PID in it, while the cgroup.procs file has only 1 (as 
expected). This suggests to me that the tasks file contains all child PIDs, so 
I'm a bit confused about the comment in YARN-3. Nevertheless, it'd be worth 
verifying.
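
One quick way to verify (a sketch, with the cgroup path taken from this issue's 
description): tasks lists thread IDs while cgroup.procs lists thread-group 
(process) IDs, so counting the entries in each shows whether tasks really covers 
all the child threads.

{code}
// Sketch only: compare membership of tasks vs cgroup.procs for one
// container cgroup. The path below is the example from the description.
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class CgroupMembershipCheck {
  public static void main(String[] args) throws Exception {
    Path cgroup = Paths.get("/cgroup/cpu/hadoop-yarn",
        "container_1370986842149_0001_01_01");
    List<String> tids =
        Files.readAllLines(cgroup.resolve("tasks"), StandardCharsets.UTF_8);
    List<String> pids =
        Files.readAllLines(cgroup.resolve("cgroup.procs"), StandardCharsets.UTF_8);
    System.out.println("tasks entries: " + tids.size()
        + ", cgroup.procs entries: " + pids.size());
  }
}
{code}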

 CgroupsLCEResourcesHandler tries to write to cgroup.procs
 -

 Key: YARN-799
 URL: https://issues.apache.org/jira/browse/YARN-799
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.4-alpha, 2.0.5-alpha
Reporter: Chris Riccomini

 The implementation of
 bq. 
 ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
 Tells the container-executor to write PIDs to cgroup.procs:
 {code}
   public String getResourcesOption(ContainerId containerId) {
 String containerName = containerId.toString();
 StringBuilder sb = new StringBuilder("cgroups=");
 if (isCpuWeightEnabled()) {
   sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + 
 "/cgroup.procs");
   sb.append(",");
 }
 if (sb.charAt(sb.length() - 1) == ',') {
   sb.deleteCharAt(sb.length() - 1);
 }
 return sb.toString();
   }
 {code}
 Apparently, this file has not always been writeable:
 https://patchwork.kernel.org/patch/116146/
 http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
 https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html
 The RHEL version of the Linux kernel that I'm using has a CGroup module that 
 has a non-writeable cgroup.procs file.
 {quote}
 $ uname -a
 Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 
 2011 x86_64 x86_64 x86_64 GNU/Linux
 {quote}
 As a result, when the container-executor tries to run, it fails with this 
 error message:
 bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
 This is because the executor is given a resource by the 
 CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
 {quote}
 $ pwd 
 /cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
 $ ls -l
 total 0
 -r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
 {quote}
 I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
 and this appears to have fixed the problem.
 I can think of several potential resolutions to this ticket:
 1. Ignore the problem, and make people patch YARN when they hit this issue.
 2. Write to /tasks instead of /cgroup.procs for everyone
 3. Check permissioning on /cgroup.procs prior to writing to it, and fall back 
 to /tasks.
 4. Add a config to yarn-site that lets admins specify which file to write to.
 Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-700) TestInfoBlock fails on Windows because of line ending missmatch

2013-06-12 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681490#comment-13681490
 ] 

Chris Nauroth commented on YARN-700:


I gave +1 for this a while ago. I'm planning on committing it shortly.

 TestInfoBlock fails on Windows because of line ending missmatch
 ---

 Key: YARN-700
 URL: https://issues.apache.org/jira/browse/YARN-700
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: YARN-700.patch


 Exception:
 {noformat}
 Running org.apache.hadoop.yarn.webapp.view.TestInfoBlock
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.962 sec  
 FAILURE!
 testMultilineInfoBlock(org.apache.hadoop.yarn.webapp.view.TestInfoBlock)  
 Time elapsed: 873 sec   FAILURE!
 java.lang.AssertionError: 
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at 
 org.apache.hadoop.yarn.webapp.view.TestInfoBlock.testMultilineInfoBlock(TestInfoBlock.java:79)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
 {noformat}
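
A common fix for this class of failure (a sketch, not the attached YARN-700.patch; 
names are illustrative) is to make the assertion line-ending agnostic, for example 
by normalizing \r\n to \n on both sides before comparing:

{code}
// Sketch only: strip carriage returns so the containment check passes on
// Windows (\r\n) as well as on Unix (\n).
import static org.junit.Assert.assertTrue;

public class LineEndingAgnosticAssert {
  public static void assertContainsIgnoringLineEndings(String actual,
      String expected) {
    assertTrue(actual.replace("\r\n", "\n")
        .contains(expected.replace("\r\n", "\n")));
  }
}
{code}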

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated

2013-06-12 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681492#comment-13681492
 ] 

Alejandro Abdelnur commented on YARN-600:
-

Chris, thanks for verifying the patch works in cgroup CPU controller 
environment.

+1

 Hook up cgroups CPU settings to the number of virtual cores allocated
 -

 Key: YARN-600
 URL: https://issues.apache.org/jira/browse/YARN-600
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-600.patch


 YARN-3 introduced CPU isolation and monitoring through cgroups.  YARN-2 
 introduced CPU scheduling in the capacity scheduler, and YARN-326 will 
 introduce it in the fair scheduler.  The number of virtual cores allocated to 
 a container should be used to weight the number of cgroups CPU shares given 
 to it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated

2013-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681514#comment-13681514
 ] 

Hudson commented on YARN-600:
-

Integrated in Hadoop-trunk-Commit #3907 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3907/])
YARN-600. Hook up cgroups CPU settings to the number of virtual cores 
allocated. (sandyr via tucu) (Revision 1492365)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492365
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java


 Hook up cgroups CPU settings to the number of virtual cores allocated
 -

 Key: YARN-600
 URL: https://issues.apache.org/jira/browse/YARN-600
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.1.0-beta

 Attachments: YARN-600.patch


 YARN-3 introduced CPU isolation and monitoring through cgroups.  YARN-2 
 introduced CPU scheduling in the capacity scheduler, and YARN-326 will 
 introduce it in the fair scheduler.  The number of virtual cores allocated to 
 a container should be used to weight the number of cgroups CPU shares given 
 to it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-648) FS: Add documentation for pluggable policy

2013-06-12 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681536#comment-13681536
 ] 

Alejandro Abdelnur commented on YARN-648:
-

+1

 FS: Add documentation for pluggable policy
 --

 Key: YARN-648
 URL: https://issues.apache.org/jira/browse/YARN-648
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: documentaion
 Attachments: yarn-648-1.patch, yarn-648-2.patch


 YARN-469 and YARN-482 make the scheduling policy in FS pluggable. Need to add 
 documentation on how to use this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-648) FS: Add documentation for pluggable policy

2013-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681549#comment-13681549
 ] 

Hudson commented on YARN-648:
-

Integrated in Hadoop-trunk-Commit #3909 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3909/])
YARN-648. FS: Add documentation for pluggable policy. (kkambatl via tucu) 
(Revision 1492388)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492388
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm


 FS: Add documentation for pluggable policy
 --

 Key: YARN-648
 URL: https://issues.apache.org/jira/browse/YARN-648
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: documentaion
 Fix For: 2.1.0-beta

 Attachments: yarn-648-1.patch, yarn-648-2.patch


 YARN-469 and YARN-482 make the scheduling policy in FS pluggable. Need to add 
 documentation on how to use this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-700) TestInfoBlock fails on Windows because of line ending missmatch

2013-06-12 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-700:
---

 Target Version/s: 3.0.0, 2.1.0-beta  (was: 3.0.0)
Affects Version/s: 2.1.0-beta
 Hadoop Flags: Reviewed

 TestInfoBlock fails on Windows because of line ending missmatch
 ---

 Key: YARN-700
 URL: https://issues.apache.org/jira/browse/YARN-700
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: YARN-700.patch


 Exception:
 {noformat}
 Running org.apache.hadoop.yarn.webapp.view.TestInfoBlock
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.962 sec  
 FAILURE!
 testMultilineInfoBlock(org.apache.hadoop.yarn.webapp.view.TestInfoBlock)  
 Time elapsed: 873 sec   FAILURE!
 java.lang.AssertionError: 
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at 
 org.apache.hadoop.yarn.webapp.view.TestInfoBlock.testMultilineInfoBlock(TestInfoBlock.java:79)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-752) In AMRMClient, automatically add corresponding rack requests for requested nodes

2013-06-12 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-752:


Attachment: YARN-752-6.patch

 In AMRMClient, automatically add corresponding rack requests for requested 
 nodes
 

 Key: YARN-752
 URL: https://issues.apache.org/jira/browse/YARN-752
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api, applications
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-752-1.patch, YARN-752-1.patch, YARN-752-2.patch, 
 YARN-752.3.patch, YARN-752.4.patch, YARN-752-5.patch, YARN-752-6.patch, 
 YARN-752.patch


 A ContainerRequest that includes node-level requests must also include 
 matching rack-level requests for the racks that those nodes are on.  When a 
 node is present without its rack, it makes sense for the client to 
 automatically add the node's rack.
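
 As a rough illustration of the idea (the types and the topology lookup below are 
 hypothetical, not the AMRMClient API): given the nodes named in a request, derive 
 the set of racks that must also be requested.

{code}
// Hypothetical sketch: compute the rack-level requests implied by a set of
// requested nodes. The host-to-rack map stands in for whatever topology
// resolution the client actually uses; it is not a real YARN API here.
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RackAugmenter {
  static Set<String> racksFor(List<String> nodes, Map<String, String> hostToRack) {
    Set<String> racks = new HashSet<String>();
    for (String node : nodes) {
      String rack = hostToRack.get(node);
      if (rack != null) {
        racks.add(rack);  // make sure the node's rack is requested as well
      }
    }
    return racks;
  }
}
{code}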

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-752) In AMRMClient, automatically add corresponding rack requests for requested nodes

2013-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681581#comment-13681581
 ] 

Hadoop QA commented on YARN-752:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587497/YARN-752-6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1204//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1204//console

This message is automatically generated.

 In AMRMClient, automatically add corresponding rack requests for requested 
 nodes
 

 Key: YARN-752
 URL: https://issues.apache.org/jira/browse/YARN-752
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api, applications
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-752-1.patch, YARN-752-1.patch, YARN-752-2.patch, 
 YARN-752.3.patch, YARN-752.4.patch, YARN-752-5.patch, YARN-752-6.patch, 
 YARN-752.patch


 A ContainerRequest that includes node-level requests must also include 
 matching rack-level requests for the racks that those nodes are on.  When a 
 node is present without its rack, it makes sense for the client to 
 automatically add the node's rack.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-797) DecodeIdentifier is broken. It was using KIND field for reflection and now we don't have class named as KIND.

2013-06-12 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-797:
---

Summary: DecodeIdentifier is broken. It was using KIND field for reflection 
and now we don't have class named as KIND.  (was: Remove KIND field from 
ContainerTokenIdentifier as it is not useful.)

 DecodeIdentifier is broken. It was using KIND field for reflection and now we 
 don't have class named as KIND.
 -

 Key: YARN-797
 URL: https://issues.apache.org/jira/browse/YARN-797
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi

 As we have already removed the ContainerToken, ClientToken, etc. classes, there 
 is no point in keeping the KIND field. It was used while decoding the identifier 
 (reflection based on KIND). We should probably either remove or update this 
 code as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-797) DecodeIdentifier is broken. It was using KIND field for reflection and now we don't have class named as KIND.

2013-06-12 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-797:
---

Description: 
We need to fix the reflection code in Token.decodeIdentifier


  was:
We need to fix the reflection code in Token.decodeIdentifier.



 DecodeIdentifier is broken. It was using KIND field for reflection and now we 
 don't have class named as KIND.
 -

 Key: YARN-797
 URL: https://issues.apache.org/jira/browse/YARN-797
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi

 We need to fix the reflection code in Token.decodeIdentifier

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-797) DecodeIdentifier is broken. It was using KIND field for reflection and now we don't have class named as KIND.

2013-06-12 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-797:
---

Description: 
We need to fix the reflection code in Token.decodeIdentifier.


  was:
As we have already removed the ContainerToken, ClientToken, etc. classes, there is no 
point in keeping the KIND field. It was used while decoding the identifier 
(reflection based on KIND). We should probably either remove or update this 
code as well.



 DecodeIdentifier is broken. It was using KIND field for reflection and now we 
 don't have class named as KIND.
 -

 Key: YARN-797
 URL: https://issues.apache.org/jira/browse/YARN-797
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi

 We need to fix the reflection code in Token.decodeIdentifier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM

2013-06-12 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681656#comment-13681656
 ] 

Xuan Gong commented on YARN-513:


+1 Looks good

 Create common proxy client for communicating with RM
 

 Key: YARN-513
 URL: https://issues.apache.org/jira/browse/YARN-513
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, 
 YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, 
 YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, 
 YARN-513.9.patch


 When the RM is restarting, the NM, AM and Clients should wait for some time 
 for the RM to come back up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-792) Move NodeHealthStatus from yarn.api.record to yarn.server.api.record

2013-06-12 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-792:
-

Attachment: YARN-792.1.patch

Rebased on latest trunk.

 Move NodeHealthStatus from yarn.api.record to yarn.server.api.record
 

 Key: YARN-792
 URL: https://issues.apache.org/jira/browse/YARN-792
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-792.1.patch, YARN-792.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-792) Move NodeHealthStatus from yarn.api.record to yarn.server.api.record

2013-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681705#comment-13681705
 ] 

Hadoop QA commented on YARN-792:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587516/YARN-792.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1205//console

This message is automatically generated.

 Move NodeHealthStatus from yarn.api.record to yarn.server.api.record
 

 Key: YARN-792
 URL: https://issues.apache.org/jira/browse/YARN-792
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-792.1.patch, YARN-792.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-792) Move NodeHealthStatus from yarn.api.record to yarn.server.api.record

2013-06-12 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-792:
-

Attachment: YARN-792.2.patch

More import fixes.

 Move NodeHealthStatus from yarn.api.record to yarn.server.api.record
 

 Key: YARN-792
 URL: https://issues.apache.org/jira/browse/YARN-792
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-792.1.patch, YARN-792.2.patch, YARN-792.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-800) Clicking on an AM link for a running app leads to a HTTP 500

2013-06-12 Thread Arpit Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681723#comment-13681723
 ] 

Arpit Gupta commented on YARN-800:
--

Here is the stack trace

{code}
HTTP ERROR 500

Problem accessing /proxy/application_1370886527995_0658/. Reason:

Connection refused

Caused by:

java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.init(Socket.java:375)
at java.net.Socket.init(Socket.java:249)
at 
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
at 
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
at 
org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346)
at 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:185)
at 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:334)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:66)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1077)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at 

[jira] [Commented] (YARN-800) Clicking on an AM link for a running app leads to a HTTP 500

2013-06-12 Thread Arpit Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681727#comment-13681727
 ] 

Arpit Gupta commented on YARN-800:
--

Looks like we have to set the property yarn.resourcemanager.webapp.address to 
RMAddress:8088, which should not be necessary. We should default to the 
appropriate value in the system.

 Clicking on an AM link for a running app leads to a HTTP 500
 

 Key: YARN-800
 URL: https://issues.apache.org/jira/browse/YARN-800
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Arpit Gupta
Priority: Critical

 Clicking the AM link tries to open up a page with url like
 http://hostname:8088/proxy/application_1370886527995_0645/
 and this leads to an HTTP 500

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-792) Move NodeHealthStatus from yarn.api.record to yarn.server.api.record

2013-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681735#comment-13681735
 ] 

Hadoop QA commented on YARN-792:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587518/YARN-792.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1206//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1206//console

This message is automatically generated.

 Move NodeHealthStatus from yarn.api.record to yarn.server.api.record
 

 Key: YARN-792
 URL: https://issues.apache.org/jira/browse/YARN-792
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-792.1.patch, YARN-792.2.patch, YARN-792.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging

2013-06-12 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681786#comment-13681786
 ] 

Sandy Ryza commented on YARN-366:
-

Rebased onto trunk

 Add a tracing async dispatcher to simplify debugging
 

 Key: YARN-366
 URL: https://issues.apache.org/jira/browse/YARN-366
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, 
 YARN-366-4.patch, YARN-366-5.patch, YARN-366.patch


 Exceptions thrown in YARN/MR code with asynchronous event handling do not 
 contain informative stack traces, as all handle() methods sit directly under 
 the dispatcher thread's loop.
 This makes errors very difficult to debug for those who are not intimately 
 familiar with the code, as it is difficult to see which chain of events 
 caused a particular outcome.
 I propose adding an AsyncDispatcher that instruments events with tracing 
 information.  Whenever an event is dispatched during the handling of another 
 event, the dispatcher would annotate that event with a pointer to its parent. 
  When the dispatcher catches an exception, it could reconstruct a stack 
 trace of the chain of events that led to it, and be able to log something 
 informative.
 This would be an experimental feature, off by default, unless extensive 
 testing showed that it did not have a significant performance impact.
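 A minimal sketch of the proposal (illustrative names, not the patch's classes): every 
 event carries a pointer to the event that was being handled when it was created, so 
 an exception handler can walk the chain.

{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative only: an event annotated with the event that caused it to be dispatched.
public class TracedEvent {
  final String type;
  final TracedEvent parent;   // null for externally triggered events

  TracedEvent(String type, TracedEvent parent) {
    this.type = type;
    this.parent = parent;
  }

  // Reconstruct the causal chain, most recent event first, for logging on an exception.
  List<String> traceBack() {
    List<String> chain = new ArrayList<String>();
    for (TracedEvent e = this; e != null; e = e.parent) {
      chain.add(e.type);
    }
    return chain;
  }
}
{code}

 On an exception, the dispatcher could log the chain joined with " <- ", giving 
 something readable in place of an opaque dispatcher-loop stack trace.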

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-366) Add a tracing async dispatcher to simplify debugging

2013-06-12 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-366:


Attachment: YARN-366-5.patch

 Add a tracing async dispatcher to simplify debugging
 

 Key: YARN-366
 URL: https://issues.apache.org/jira/browse/YARN-366
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, 
 YARN-366-4.patch, YARN-366-5.patch, YARN-366.patch


 Exceptions thrown in YARN/MR code with asynchronous event handling do not 
 contain informative stack traces, as all handle() methods sit directly under 
 the dispatcher thread's loop.
 This makes errors very difficult to debug for those who are not intimately 
 familiar with the code, as it is difficult to see which chain of events 
 caused a particular outcome.
 I propose adding an AsyncDispatcher that instruments events with tracing 
 information.  Whenever an event is dispatched during the handling of another 
 event, the dispatcher would annotate that event with a pointer to its parent. 
  When the dispatcher catches an exception, it could reconstruct a stack 
 trace of the chain of events that led to it, and be able to log something 
 informative.
 This would be an experimental feature, off by default, unless extensive 
 testing showed that it did not have a significant performance impact.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-789) Enable zero capabilities resource requests in fair scheduler

2013-06-12 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-789:


Summary: Enable zero capabilities resource requests in fair scheduler  
(was: Add flag to scheduler to allow zero capabilities in resources)

 Enable zero capabilities resource requests in fair scheduler
 

 Key: YARN-789
 URL: https://issues.apache.org/jira/browse/YARN-789
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-789.patch


 Per discussion in YARN-689, reposting updated use case:
 1. I have a set of services co-existing with a Yarn cluster.
 2. These services run out of band from Yarn. They are not started as yarn 
 containers and they don't use Yarn containers for processing.
 3. These services use, dynamically, different amounts of CPU and memory based 
 on their load. They manage their CPU and memory requirements independently. 
 In other words, depending on their load, they may require more CPU but not 
 memory or vice-versa.
 By using YARN as RM for these services I'm able to share and utilize the 
 resources of the cluster appropriately and in a dynamic way. Yarn keeps tab 
 of all the resources.
 These services run an AM that reserves resources on their behalf. When this 
 AM gets the requested resources, the services bump up their CPU/memory 
 utilization out of band from Yarn. If the Yarn allocations are 
 released/preempted, the services back off on their resource utilization. By 
 doing this, Yarn and these services correctly share the cluster resources, 
 with the Yarn RM being the only one that does the overall resource bookkeeping.
 The services' AM, so as not to break the lifecycle of containers, starts containers 
 in the corresponding NMs. These container processes basically do a sleep 
 forever (i.e. sleep 1d). They use almost no CPU or memory 
 (less than 1MB). Thus it is reasonable to assume their required CPU and 
 memory utilization is NIL (more on hard enforcement later). Because of this 
 almost NIL utilization of CPU and memory, it is possible to specify, when 
 doing a request, zero as one of the dimensions (CPU or memory).
 The current limitation is that the increment is also the minimum. 
 If we set the memory increment to 1MB, then when doing a pure CPU request we 
 would have to specify 1MB of memory. That would work. However, it would allow 
 discretionary memory requests without a desired normalization (increments of 
 256, 512, etc).
 If we set the CPU increment to 1CPU, then when doing a pure memory request we 
 would have to specify 1CPU. CPU amounts are much smaller than memory amounts, 
 and because we don't have fractional CPUs, it would mean that all my pure 
 memory requests will be wasting 1 CPU thus reducing the overall utilization 
 of the cluster.
 Finally, on hard enforcement. 
 * For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an 
 absolute minimum of a few CPU shares (i.e. 10) in the LinuxContainerExecutor we 
 ensure there are enough CPU cycles to run the sleep process. This absolute 
 minimum would only kick in if zero is allowed; otherwise it will never kick in 
 as the shares for 1 CPU are 1024.
 * For Memory. Hard enforcement is currently done by the 
 ProcfsBasedProcessTree.java, using a minimum absolute of 1 or 2 MBs would 
 take care of zero memory resources. And again, this absolute minimum would 
 only kick in if zero is allowed; otherwise it will never kick in, as the 
 memory increment is several MBs if not 1GB.
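 To make the zero-capability request concrete, a hedged sketch of what the pure-CPU 
 and pure-memory capabilities could look like from an AM, assuming the scheduler is 
 configured to allow zero as proposed here:

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.Records;

// Hypothetical helper; only valid if the scheduler allows zero capabilities.
public class ZeroCapability {
  // Pure-CPU ask: zero memory because the out-of-band service manages its own memory.
  static Resource pureCpu(int vcores) {
    Resource r = Records.newRecord(Resource.class);
    r.setMemory(0);
    r.setVirtualCores(vcores);
    return r;
  }

  // Pure-memory ask: zero vcores so a sleeper container does not waste a whole CPU.
  static Resource pureMemory(int memoryMB) {
    Resource r = Records.newRecord(Resource.class);
    r.setMemory(memoryMB);
    r.setVirtualCores(0);
    return r;
  }
}
{code}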

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-788) Rename scheduler resource minimum to increment

2013-06-12 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur resolved YARN-788.
-

Resolution: Won't Fix

 Rename scheduler resource minimum to increment
 --

 Key: YARN-788
 URL: https://issues.apache.org/jira/browse/YARN-788
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-788.patch


 Per discussions in YARN-689 the current name minimum is wrong, we should 
 rename it to increment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-06-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681814#comment-13681814
 ] 

Vinod Kumar Vavilapalli commented on YARN-530:
--

The latest patch looks good. Will try fixing the warnings.

We should stop adding more to these tickets. Any further issues should be 
taken up after committing this set of patches.

 Define Service model strictly, implement AbstractService for robust 
 subclassing, migrate yarn-common services
 -

 Key: YARN-530
 URL: https://issues.apache.org/jira/browse/YARN-530
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.4-alpha
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117-019.patch, YARN-117changes.pdf, 
 YARN-530-005.patch, YARN-530-008.patch, YARN-530-009.patch, 
 YARN-530-010.patch, YARN-530-011.patch, YARN-530-012.patch, 
 YARN-530-013.patch, YARN-530-014.patch, YARN-530-015.patch, 
 YARN-530-016.patch, YARN-530-017.patch, YARN-530-018.patch, 
 YARN-530-019.patch, YARN-530-020.patch, YARN-530-021.patch, 
 YARN-530-022.patch, YARN-530-2.patch, YARN-530-3.patch, YARN-530.4.patch, 
 YARN-530.patch


 # Extend the YARN {{Service}} interface as discussed in YARN-117
 # Implement the changes in {{AbstractService}} and {{FilterService}}.
 # Migrate all services in yarn-common to the more robust service model, test.
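 A condensed sketch of the lifecycle pattern being discussed in YARN-117 and here 
 (simplified, not the actual AbstractService code): the public methods are final, do 
 the state checks, and delegate to protected hooks, so duplicate or out-of-order 
 transitions cannot bypass the checks, and stop() is safe from any state.

{code}
// Simplified sketch only; not the real AbstractService.
public abstract class SketchService {
  public enum STATE { NOTINITED, INITED, STARTED, STOPPED }

  private volatile STATE state = STATE.NOTINITED;

  public final synchronized void init() {
    if (state != STATE.NOTINITED) return;   // ignore duplicate transitions up front
    serviceInit();
    state = STATE.INITED;
  }

  public final synchronized void start() {
    if (state != STATE.INITED) return;
    serviceStart();
    state = STATE.STARTED;
  }

  // stop() is valid from every state so partially started services can release resources.
  public final synchronized void stop() {
    if (state == STATE.STOPPED) return;
    serviceStop();                          // implementations must tolerate null fields
    state = STATE.STOPPED;
  }

  protected void serviceInit()  {}
  protected void serviceStart() {}
  protected void serviceStop()  {}
}
{code}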

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-800) Clicking on an AM link for a running app leads to a HTTP 500

2013-06-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681838#comment-13681838
 ] 

Zhijie Shen commented on YARN-800:
--

Did a quick local test, and found the link was not broken. It seems that the 
default value has already been set in yarn-default.xml:
{code}
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>0.0.0.0</value>
  </property>
{code}

{code}
  <property>
    <description>The address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
{code}

and YarnConfiguration

{code}
  public static final String RM_WEBAPP_ADDRESS =
      RM_PREFIX + "webapp.address";

  public static final int DEFAULT_RM_WEBAPP_PORT = 8088;
  public static final String DEFAULT_RM_WEBAPP_ADDRESS = "0.0.0.0:" +
      DEFAULT_RM_WEBAPP_PORT;
{code}

Looking into the code, it seems to be related to yarn.web-proxy.address. In 
WebAppProxyServlet,

{code}
  resp.setStatus(client.executeMethod(config, method));
{code}

tries to connect to the proxy host to show the application webpage. If 
yarn.web-proxy.address is not set, the RM becomes the proxy, and its address 
will be $\{yarn.resourcemanager.hostname\}:8088 as well.

It may be worth checking the configuration of yarn.web-proxy.address.
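Something along these lines (plain Configuration lookups using the property names 
above; illustrative, not the actual proxy code path) shows which address the AM proxy 
link ends up using:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative check only; not the WebAppProxyServlet code path.
public class ProxyAddressCheck {
  // The web proxy address if one is configured, otherwise the RM webapp address.
  static String effectiveProxyAddress(Configuration conf) {
    String proxy = conf.get("yarn.web-proxy.address");
    if (proxy != null && !proxy.isEmpty()) {
      return proxy;                 // standalone proxy server configured
    }
    // No standalone proxy: the RM embeds the proxy, so its webapp address is used.
    return conf.get("yarn.resourcemanager.webapp.address", "0.0.0.0:8088");
  }

  public static void main(String[] args) {
    System.out.println(effectiveProxyAddress(new YarnConfiguration()));
  }
}
{code}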

 Clicking on an AM link for a running app leads to a HTTP 500
 

 Key: YARN-800
 URL: https://issues.apache.org/jira/browse/YARN-800
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Arpit Gupta
Priority: Critical

 Clicking the AM link tries to open up a page with url like
 http://hostname:8088/proxy/application_1370886527995_0645/
 and this leads to an HTTP 500

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-117) Enhance YARN service model

2013-06-12 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-117:
-

Attachment: YARN-117-023.patch

Updated patch.
 - Drops common changes. They should be tracked separately. Didn't even review 
them; are they needed for the service stuff?
 - Drops spurious java comment changes to LocalCacheDirectoryManager.java and 
TestLocalCacheDirectoryManager.java
 - Minor improvement in TestNMWebServer.java
 - And including latest patch at YARN-530.

 Enhance YARN service model
 --

 Key: YARN-117
 URL: https://issues.apache.org/jira/browse/YARN-117
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117-007.patch, YARN-117-008.patch, 
 YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, 
 YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, 
 YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, 
 YARN-117-019.patch, YARN-117-020.patch, YARN-117-021.patch, 
 YARN-117-022.patch, YARN-117-023.patch, YARN-117-2.patch, YARN-117-3.patch, 
 YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch


 Having played with the YARN service model, there are some issues
 that I've identified based on past work and initial use.
 This JIRA issue is an overall one to cover the issues, with solutions pushed 
 out to separate JIRAs.
 h2. state model prevents stopped state being entered if you could not 
 successfully start the service.
 In the current lifecycle you cannot stop a service unless it was successfully 
 started, but
 * {{init()}} may acquire resources that need to be explicitly released
 * if the {{start()}} operation fails partway through, the {{stop()}} 
 operation may be needed to release resources.
 *Fix:* make {{stop()}} a valid state transition from all states and require 
 the implementations to be able to stop safely without requiring all fields to 
 be non null.
 Before anyone points out that the {{stop()}} operations assume that all 
 fields are valid; and if called before a {{start()}} they will NPE; 
 MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix 
 for this. It is independent of the rest of the issues in this doc but it will 
 aid making {{stop()}} execute from all states other than stopped.
 MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
 review and take up; this can be done with issues linked to this one.
 h2. AbstractService doesn't prevent duplicate state change requests.
 The {{ensureState()}} checks to verify whether or not a state transition is 
 allowed from the current state are performed in the base {{AbstractService}} 
 class -yet subclasses tend to call this *after* their own {{init()}}, 
 {{start()}} & {{stop()}} operations. This means that these operations can be 
 performed out of order, and even if the outcome of the call is an exception, 
 all actions performed by the subclasses will have taken place. MAPREDUCE-3877 
 demonstrates this.
 This is a tricky one to address. In HADOOP-3128 I used a base class instead 
 of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods 
 {{final}}. These methods would do the checks, and then invoke protected inner 
 methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to 
 retrofit the same behaviour to everything that extends {{AbstractService}} 
 -something that must be done before the class is considered stable (because 
 once the lifecycle methods are declared final, all subclasses that are out of 
 the source tree will need fixing by the respective developers).
 h2. AbstractService state change doesn't defend against race conditions.
 There's no concurrency locks on the state transitions. Whatever fix for wrong 
 state calls is added should correct this to prevent re-entrancy, such as 
 {{stop()}} being called from two threads.
 h2.  Static methods to choreograph of lifecycle operations
 Helper methods to move things through lifecycles. init-start is common, 
 stop-if-service!=null another. Some static methods can execute these, and 
 even call {{stop()}} if {{init()}} raises an exception. These could go into a 
 class {{ServiceOps}} in the same package. These can be used by those services 
 that wrap other services, and help manage more robust shutdowns.
 h2. state transition failures are something that registered service listeners 
 may wish to be informed of.
 When a state transition fails a {{RuntimeException}} can be thrown -and the 
 service listeners are not informed as the notification point isn't reached. 
 They may wish to know this, especially for management and diagnostics.
 *Fix:* extend {{ServiceStateChangeListener}} with a callback such as 
 

[jira] [Updated] (YARN-693) Sending NMToken to AM on allocate call

2013-06-12 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-693:
---

Description: 
This is part of YARN-613.
As per the updated design, AM will receive per NM, NMToken in following 
scenarios
* AM is receiving first container on underlying NM.
* AM is receiving container on underlying NM after either NM or RM rebooted.
** After RM reboot, as RM doesn't remember (persist) the information about keys 
issued per AM per NM, it will reissue tokens in case AM gets new container on 
underlying NM. However on NM side NM will still retain older token until it 
receives new token to support long running jobs (in work preserving 
environment).
** After NM reboot, RM will delete the token information corresponding to that 
AM for all AMs.
* AM is receiving container on underlying NM after NMToken master key is rolled 
over on RM side.
In all cases, if the AM receives a new NMToken, it is supposed to store it for 
future NM communication until it receives a new one.

  was:
This is part of YARN-613.
As per the updated design, AM will receive per NM, NMToken in following 
scenarios
* AM is receiving first container on underlying NM.
* AM is receiving container on underlying NM after either NM or RM rebooted.
** After RM reboot, as RM doesn't remember (persist) the information about keys 
issued per AM per NM, it will reissue tokens in case AM gets new container on 
underlying NM. However on NM side NM will still retain older token until it 
receives new token to support long running jobs (in work preserving 
environment).
** After NM reboot, RM will delete the token information corresponding to all 
AMs.
* AM is receiving container on underlying NM after NMToken master key is rolled 
over on RM side.
In all cases, if the AM receives a new NMToken, it is supposed to store it for 
future NM communication until it receives a new one.


 Sending NMToken to AM on allocate call
 --

 Key: YARN-693
 URL: https://issues.apache.org/jira/browse/YARN-693
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi

 This is part of YARN-613.
 As per the updated design, AM will receive per NM, NMToken in following 
 scenarios
 * AM is receiving first container on underlying NM.
 * AM is receiving container on underlying NM after either NM or RM rebooted.
 ** After RM reboot, as RM doesn't remember (persist) the information about 
 keys issued per AM per NM, it will reissue tokens in case AM gets new 
 container on underlying NM. However on NM side NM will still retain older 
 token until it receives new token to support long running jobs (in work 
 preserving environment).
 ** After NM reboot, RM will delete the token information corresponding to 
 that AM for all AMs.
 * AM is receiving container on underlying NM after NMToken master key is 
 rolled over on RM side.
 In all cases, if the AM receives a new NMToken, it is supposed to store it 
 for future NM communication until it receives a new one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681847#comment-13681847
 ] 

Hadoop QA commented on YARN-530:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587542/YARN-530-023.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1207//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1207//console

This message is automatically generated.

 Define Service model strictly, implement AbstractService for robust 
 subclassing, migrate yarn-common services
 -

 Key: YARN-530
 URL: https://issues.apache.org/jira/browse/YARN-530
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.4-alpha
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117-019.patch, YARN-117changes.pdf, 
 YARN-530-005.patch, YARN-530-008.patch, YARN-530-009.patch, 
 YARN-530-010.patch, YARN-530-011.patch, YARN-530-012.patch, 
 YARN-530-013.patch, YARN-530-014.patch, YARN-530-015.patch, 
 YARN-530-016.patch, YARN-530-017.patch, YARN-530-018.patch, 
 YARN-530-019.patch, YARN-530-020.patch, YARN-530-021.patch, 
 YARN-530-022.patch, YARN-530-023.patch, YARN-530-2.patch, YARN-530-3.patch, 
 YARN-530.4.patch, YARN-530.patch


 # Extend the YARN {{Service}} interface as discussed in YARN-117
 # Implement the changes in {{AbstractService}} and {{FilterService}}.
 # Migrate all services in yarn-common to the more robust service model, test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-693) Sending NMToken to AM on allocate call

2013-06-12 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-693:
---

Attachment: YARN-693-20130610.patch

 Sending NMToken to AM on allocate call
 --

 Key: YARN-693
 URL: https://issues.apache.org/jira/browse/YARN-693
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-693-20130610.patch


 This is part of YARN-613.
 As per the updated design, AM will receive per NM, NMToken in following 
 scenarios
 * AM is receiving first container on underlying NM.
 * AM is receiving container on underlying NM after either NM or RM rebooted.
 ** After RM reboot, as RM doesn't remember (persist) the information about 
 keys issued per AM per NM, it will reissue tokens in case AM gets new 
 container on underlying NM. However on NM side NM will still retain older 
 token until it receives new token to support long running jobs (in work 
 preserving environment).
 ** After NM reboot, RM will delete the token information corresponding to 
 that AM for all AMs.
 * AM is receiving container on underlying NM after NMToken master key is 
 rolled over on RM side.
 In all cases, if the AM receives a new NMToken, it is supposed to store it 
 for future NM communication until it receives a new one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-693) Sending NMToken to AM on allocate call

2013-06-12 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-693:
---

Description: 
This is part of YARN-613.
As per the updated design, AM will receive per NM, NMToken in following 
scenarios
* AM is receiving first container on underlying NM.
* AM is receiving container on underlying NM after either NM or RM rebooted.
** After RM reboot, as RM doesn't remember (persist) the information about keys 
issued per AM per NM, it will reissue tokens in case AM gets new container on 
underlying NM. However on NM side NM will still retain older token until it 
receives new token to support long running jobs (in work preserving 
environment).
** After NM reboot, RM will delete the token information corresponding to that 
AM for all AMs.
* AM is receiving container on underlying NM after NMToken master key is rolled 
over on RM side.
In all cases, if the AM receives a new NMToken, it is supposed to store it for 
future NM communication until it receives a new one.

AMRMClient should expose these NMTokens to the client. 

  was:
This is part of YARN-613.
As per the updated design, AM will receive per NM, NMToken in following 
scenarios
* AM is receiving first container on underlying NM.
* AM is receiving container on underlying NM after either NM or RM rebooted.
** After RM reboot, as RM doesn't remember (persist) the information about keys 
issued per AM per NM, it will reissue tokens in case AM gets new container on 
underlying NM. However on NM side NM will still retain older token until it 
receives new token to support long running jobs (in work preserving 
environment).
** After NM reboot, RM will delete the token information corresponding to that 
AM for all AMs.
* AM is receiving container on underlying NM after NMToken master key is rolled 
over on RM side.
In all cases, if the AM receives a new NMToken, it is supposed to store it for 
future NM communication until it receives a new one.


 Sending NMToken to AM on allocate call
 --

 Key: YARN-693
 URL: https://issues.apache.org/jira/browse/YARN-693
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-693-20130610.patch


 This is part of YARN-613.
 As per the updated design, AM will receive per NM, NMToken in following 
 scenarios
 * AM is receiving first container on underlying NM.
 * AM is receiving container on underlying NM after either NM or RM rebooted.
 ** After RM reboot, as RM doesn't remember (persist) the information about 
 keys issued per AM per NM, it will reissue tokens in case AM gets new 
 container on underlying NM. However on NM side NM will still retain older 
 token until it receives new token to support long running jobs (in work 
 preserving environment).
 ** After NM reboot, RM will delete the token information corresponding to 
 that AM for all AMs.
 * AM is receiving container on underlying NM after NMToken master key is 
 rolled over on RM side.
 In all cases, if the AM receives a new NMToken, it is supposed to store it 
 for future NM communication until it receives a new one.
 AMRMClient should expose these NMTokens to the client. 
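 A hedged sketch of the client-side bookkeeping this implies (illustrative names, not 
 the patch itself): keep one token per NM address and overwrite it whenever the RM 
 returns a newer one on allocate.

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.hadoop.yarn.api.records.Token;

// Illustrative only: per-NM token bookkeeping on the AM side.
public class NMTokenBookkeeping {
  private final ConcurrentMap<String, Token> tokens =
      new ConcurrentHashMap<String, Token>();

  // Called for every NMToken returned on an allocate() response; newer tokens replace older ones.
  public void store(String nodeAddress, Token nmToken) {
    tokens.put(nodeAddress, nmToken);
  }

  // Used when starting or stopping containers on that NM; null if no token has been received yet.
  public Token get(String nodeAddress) {
    return tokens.get(nodeAddress);
  }
}
{code}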

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging

2013-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681854#comment-13681854
 ] 

Hadoop QA commented on YARN-366:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587528/YARN-366-5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1208//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1208//console

This message is automatically generated.

 Add a tracing async dispatcher to simplify debugging
 

 Key: YARN-366
 URL: https://issues.apache.org/jira/browse/YARN-366
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, 
 YARN-366-4.patch, YARN-366-5.patch, YARN-366.patch


 Exceptions thrown in YARN/MR code with asynchronous event handling do not 
 contain informative stack traces, as all handle() methods sit directly under 
 the dispatcher thread's loop.
 This makes errors very difficult to debug for those who are not intimately 
 familiar with the code, as it is difficult to see which chain of events 
 caused a particular outcome.
 I propose adding an AsyncDispatcher that instruments events with tracing 
 information.  Whenever an event is dispatched during the handling of another 
 event, the dispatcher would annotate that event with a pointer to its parent. 
  When the dispatcher catches an exception, it could reconstruct a stack 
 trace of the chain of events that led to it, and be able to log something 
 informative.
 This would be an experimental feature, off by default, unless extensive 
 testing showed that it did not have a significant performance impact.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-117) Enhance YARN service model

2013-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681875#comment-13681875
 ] 

Hadoop QA commented on YARN-117:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587543/YARN-117-023.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 38 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 6 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1209//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/1209//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/1209//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1209//console

This message is automatically generated.

 Enhance YARN service model
 --

 Key: YARN-117
 URL: https://issues.apache.org/jira/browse/YARN-117
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117-007.patch, YARN-117-008.patch, 
 YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, 
 YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, 
 YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, 
 YARN-117-019.patch, YARN-117-020.patch, YARN-117-021.patch, 
 YARN-117-022.patch, YARN-117-023.patch, YARN-117-2.patch, YARN-117-3.patch, 
 YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch


 Having played with the YARN service model, there are some issues
 that I've identified based on past work and initial use.
 This JIRA issue is an overall one to cover the issues, with solutions pushed 
 out to separate JIRAs.
 h2. state model prevents stopped state being entered if you could not 
 successfully start the service.
 In the current lifecycle you cannot stop a service unless it was successfully 
 started, but
 * {{init()}} may acquire resources that need to be explicitly released
 * if the {{start()}} operation fails partway through, the {{stop()}} 
 operation may be needed to release resources.
 *Fix:* make {{stop()}} a valid state transition from all states and require 
 the implementations to be able to stop safely without requiring all fields to 
 be non null.
 Before anyone points out that the {{stop()}} operations assume that all 
 fields are valid; and if called before a {{start()}} they will NPE; 
 MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix 
 for this. It is independent of the rest of the issues in this doc but it will 
 aid making {{stop()}} execute from all states other than stopped.
 MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
 review and take up; this can be done with issues linked to this one.
 h2. AbstractService doesn't prevent duplicate state change requests.
 The {{ensureState()}} 

[jira] [Commented] (YARN-801) Expose container locations and capabilities in the RM REST APIs

2013-06-12 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681878#comment-13681878
 ] 

Junping Du commented on YARN-801:
-

Sandy, shall we include ContainerState and running task info as well?

 Expose container locations and capabilities in the RM REST APIs
 ---

 Key: YARN-801
 URL: https://issues.apache.org/jira/browse/YARN-801
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api, resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza

 It would be useful to be able to query container allocation info via the RM 
 REST APIs.  We should be able to query per application, and for each 
 container we should provide (at least):
 * location
 * resource capability

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)

2013-06-12 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-569:
---

Attachment: YARN-569.6.patch

 CapacityScheduler: support for preemption (using a capacity monitor)
 

 Key: YARN-569
 URL: https://issues.apache.org/jira/browse/YARN-569
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf, 
 preemption.2.patch, YARN-569.1.patch, YARN-569.2.patch, YARN-569.3.patch, 
 YARN-569.4.patch, YARN-569.5.patch, YARN-569.6.patch, YARN-569.patch, 
 YARN-569.patch


 There is a tension between the fast-paced reactive role of the 
 CapacityScheduler, which needs to respond quickly to 
 applications' resource requests and node updates, and the more introspective, 
 time-based considerations 
 needed to observe and correct for capacity balance. To this purpose, instead 
 of hacking the delicate 
 mechanisms of the CapacityScheduler directly, we opted to add support for preemption by 
 means of a Capacity Monitor,
 which can be run optionally as a separate service (much like the 
 NMLivelinessMonitor).
 The capacity monitor (similar to the equivalent functionality in the fair 
 scheduler) runs on an interval 
 (e.g., every 3 seconds), observes the state of the assignment of resources to 
 queues from the capacity scheduler, 
 performs off-line computation to determine if preemption is needed, and how 
 best to edit the current schedule to 
 improve capacity, and generates events that produce four possible actions:
 # Container de-reservations
 # Resource-based preemptions
 # Container-based preemptions
 # Container killing
 The actions listed above are progressively more costly, and it is up to the 
 policy to use them as desired to achieve the rebalancing goals. 
 Note that due to the lag in the effect of these actions the policy should 
 operate at the macroscopic level (e.g., preempt tens of containers
 from a queue) and not try to tightly and consistently micromanage 
 container allocations. 
 - Preemption policy  (ProportionalCapacityPreemptionPolicy): 
 - 
 Preemption policies are by design pluggable, in the following we present an 
 initial policy (ProportionalCapacityPreemptionPolicy) we have been 
 experimenting with.  The ProportionalCapacityPreemptionPolicy behaves as 
 follows:
 # it gathers from the scheduler the state of the queues, in particular, their 
 current capacity, guaranteed capacity and pending requests (*)
 # if there are pending requests from queues that are under capacity it 
 computes a new ideal balanced state (**)
 # it computes the set of preemptions needed to repair the current schedule 
 and achieve capacity balance (accounting for natural completion rates, and 
 respecting bounds on the amount of preemption we allow for each round)
 # it selects which applications to preempt from each over-capacity queue (the 
 last one in the FIFO order)
 # it removes reservations from the most recently assigned app until the amount 
 of resource to reclaim is obtained, or until no more reservations exist
 # (if not enough) it issues preemptions for containers from the same 
 applications (reverse chronological order, last assigned container first) 
 again until necessary or until no containers except the AM container are left,
 # (if not enough) it moves onto unreserve and preempt from the next 
 application. 
 # containers that have been asked to be preempted are tracked across executions. 
 If a container is among the ones to be preempted for more than a certain 
 time, it is moved into the list of containers to be forcibly 
 killed. 
 Notes:
 (*) at the moment, in order to avoid double-counting of the requests, we only 
 look at the ANY part of pending resource requests, which means we might not 
 preempt on behalf of AMs that ask only for specific locations but not ANY. 
 (**) The ideal balanced state is one in which each queue has at least its 
 guaranteed capacity, and the spare capacity is distributed among the queues 
 that want some as a weighted fair share, where the weighting is based on the 
 guaranteed capacity of a queue; the computation runs to a fixed point.
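 To make (**) concrete, here is a hedged sketch of the ideal-balance 
 computation (illustrative names and simplifications, not the code in the 
 attached patch): every queue first gets min(guaranteed, demand), and the 
 leftover capacity is repeatedly handed out in proportion to the guarantees of 
 the still-unsatisfied queues until a fixed point is reached.
{code:java}
import java.util.List;

// Per-queue snapshot used by the sketch; field names are illustrative.
final class QueueState {
  final String name;
  final double guaranteed; // absolute guaranteed capacity
  final double demand;     // current usage + pending requests
  double ideal;            // computed ideal assignment

  QueueState(String name, double guaranteed, double demand) {
    this.name = name;
    this.guaranteed = guaranteed;
    this.demand = demand;
  }
}

final class IdealBalance {
  static void compute(List<QueueState> queues, double totalCapacity) {
    double unassigned = totalCapacity;
    // Every queue first receives min(guaranteed, demand).
    for (QueueState q : queues) {
      q.ideal = Math.min(q.guaranteed, q.demand);
      unassigned -= q.ideal;
    }
    // Distribute the spare capacity as a weighted fair share (weights are the
    // guarantees of the queues that still want more) until a fixed point.
    while (unassigned > 1e-6) {
      double totalWeight = 0;
      for (QueueState q : queues) {
        if (q.demand > q.ideal) {
          totalWeight += q.guaranteed;
        }
      }
      if (totalWeight == 0) {
        break; // nobody wants more capacity
      }
      double distributed = 0;
      for (QueueState q : queues) {
        if (q.demand > q.ideal) {
          double share = unassigned * (q.guaranteed / totalWeight);
          double grant = Math.min(share, q.demand - q.ideal);
          q.ideal += grant;
          distributed += grant;
        }
      }
      if (distributed < 1e-6) {
        break; // fixed point reached
      }
      unassigned -= distributed;
    }
  }
}
{code}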
 Tunables of the ProportionalCapacityPreemptionPolicy:
 # observe-only mode (i.e., log the actions it would take, but behave as 
 read-only)
 # how frequently to run the policy
 # how long to wait between preemption and kill of a container
 # which fraction of the containers I would like to obtain should I preempt 
 (this accounts for the natural rate at which containers are returned)
 # deadzone size, i.e., what % of over-capacity should I ignore (if we are off 
 perfect 

[jira] [Updated] (YARN-117) Enhance YARN service model

2013-06-12 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-117:
-

Attachment: YARN-117-024.patch

Uber patch suppressing FindBugs warnings. All the warnings are about fields 
accessed in the service* methods; those accesses are not synchronized on the 
objects, but that should be fine as the fields are only read and not modified 
in any of the cases.

 Enhance YARN service model
 --

 Key: YARN-117
 URL: https://issues.apache.org/jira/browse/YARN-117
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117-007.patch, YARN-117-008.patch, 
 YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, 
 YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, 
 YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, 
 YARN-117-019.patch, YARN-117-020.patch, YARN-117-021.patch, 
 YARN-117-022.patch, YARN-117-023.patch, YARN-117-024.patch, YARN-117-2.patch, 
 YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, 
 YARN-117.patch


 Having played with the YARN service model, I have identified some issues 
 based on past work and initial use.
 This JIRA issue is an overall one to cover the issues, with solutions pushed 
 out to separate JIRAs.
 h2. state model prevents the stopped state from being entered if you could 
 not successfully start the service.
 In the current lifecycle you cannot stop a service unless it was successfully 
 started, but
 * {{init()}} may acquire resources that need to be explicitly released
 * if the {{start()}} operation fails partway through, the {{stop()}} 
 operation may be needed to release resources.
 *Fix:* make {{stop()}} a valid state transition from all states and require 
 the implementations to be able to stop safely without requiring all fields to 
 be non-null.
 Before anyone points out that the {{stop()}} operations assume that all 
 fields are valid and will NPE if called before {{start()}}: MAPREDUCE-3431 
 shows that this problem arises today, and MAPREDUCE-3502 is a fix for it. 
 That fix is independent of the rest of the issues in this doc, but it will 
 help make {{stop()}} executable from all states other than stopped.
 MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
 review and uptake; this can be done with issues linked to this one.
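 As a minimal sketch of what the fix implies (an illustrative class only, not 
 part of the attached patch), a service whose {{stop()}} may be called from 
 any state simply null-checks everything it might not have created yet:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of a service whose stop() is safe to call from any state, including
// before init()/start() completed or after they failed partway through.
class NullSafeService {
  private ExecutorService pool; // acquired in init()
  private Thread heartbeat;     // started in start()

  void init() {
    pool = Executors.newSingleThreadExecutor();
  }

  void start() {
    heartbeat = new Thread(() -> { /* periodic work */ });
    heartbeat.start();
  }

  // Safe even if init() or start() never ran: every field is null-checked.
  void stop() {
    if (heartbeat != null) {
      heartbeat.interrupt();
    }
    if (pool != null) {
      pool.shutdownNow();
    }
  }
}
{code}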
 h2. AbstractService doesn't prevent duplicate state change requests.
 The {{ensureState()}} checks that verify whether a state transition is 
 allowed from the current state are performed in the base {{AbstractService}} 
 class, yet subclasses tend to call this *after* their own {{init()}}, 
 {{start()}} and {{stop()}} operations. This means that these operations can 
 be performed out of order, and even if the outcome of the call is an 
 exception, all actions performed by the subclasses will have taken place. 
 MAPREDUCE-3877 demonstrates this.
 This is a tricky one to address. In HADOOP-3128 I used a base class instead 
 of an interface and made the {{init()}}, {{start()}} and {{stop()}} methods 
 {{final}}. These methods would do the checks, and then invoke protected inner 
 methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to 
 retrofit the same behaviour to everything that extends {{AbstractService}} - 
 something that must be done before the class is considered stable (because 
 once the lifecycle methods are declared final, all subclasses that are out of 
 the source tree will need fixing by the respective developers).
 h2. AbstractService state change doesn't defend against race conditions.
 There are no concurrency locks on the state transitions. Whatever fix is 
 added for wrong-state calls should also correct this to prevent re-entrancy, 
 such as {{stop()}} being called from two threads.
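 A hedged sketch of the proposed shape (the {{innerStart()}}-style hook names 
 come from the HADOOP-3128 proposal; everything else is illustrative, not the 
 patch): the final lifecycle methods own the state check and the lock, 
 subclasses only override the protected hooks, and the {{synchronized}} 
 keyword also covers the re-entrancy concern above.
{code:java}
// Template-method sketch: checks and locking live in the final methods,
// subclass work happens only in the protected inner hooks.
abstract class AbstractSketchService {
  enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  public final synchronized void init() {
    if (state != State.NOTINITED) {
      throw new IllegalStateException("Cannot init from " + state);
    }
    innerInit();            // runs only after the check passes
    state = State.INITED;
  }

  public final synchronized void start() {
    if (state != State.INITED) {
      throw new IllegalStateException("Cannot start from " + state);
    }
    innerStart();
    state = State.STARTED;
  }

  public final synchronized void stop() {
    if (state == State.STOPPED) {
      return;                // a duplicate stop() is a no-op, not an error
    }
    innerStop();
    state = State.STOPPED;
  }

  protected abstract void innerInit();
  protected abstract void innerStart();
  protected abstract void innerStop();
}
{code}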
 h2. Static methods to choreograph lifecycle operations
 Helper methods to move things through lifecycles: init-start is a common 
 sequence, stop-if-service!=null another. Some static methods can execute 
 these, and even call {{stop()}} if {{init()}} raises an exception. These 
 could go into a class {{ServiceOps}} in the same package. They can be used by 
 services that wrap other services, and help manage more robust shutdowns; a 
 sketch follows below.
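 As an illustration (the {{ServiceOps}} class and the {{deploy}} and 
 {{stopQuietly}} method names are assumptions, not anything already in the 
 codebase):
{code:java}
// Static choreography helpers for the common lifecycle sequences.
final class ServiceOps {
  interface Service {
    void init();
    void start();
    void stop();
  }

  private ServiceOps() {
  }

  // init + start in sequence, calling stop() to release anything already
  // acquired if either step throws.
  static void deploy(Service service) {
    try {
      service.init();
      service.start();
    } catch (RuntimeException e) {
      service.stop();
      throw e;
    }
  }

  // The stop-if-service!=null pattern mentioned above.
  static void stopQuietly(Service service) {
    if (service != null) {
      service.stop();
    }
  }
}
{code}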
 h2. state transition failures are something that registered service listeners 
 may wish to be informed of.
 When a state transition fails, a {{RuntimeException}} can be thrown, and the 
 service listeners are not informed because the notification point isn't 
 reached. They may wish to know about this, especially for management and 
 diagnostics.
 *Fix:* extend {{ServiceStateChangeListener}} with a callback such as 
 {{stateChangeFailed(Service service, Service.State targetedState, 
 RuntimeException e)}} that is invoked 

[jira] [Commented] (YARN-789) Enable zero capabilities resource requests in fair scheduler

2013-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681907#comment-13681907
 ] 

Hadoop QA commented on YARN-789:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587533/YARN-789.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1211//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1211//console

This message is automatically generated.

 Enable zero capabilities resource requests in fair scheduler
 

 Key: YARN-789
 URL: https://issues.apache.org/jira/browse/YARN-789
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-789.patch, YARN-789.patch


 Per discussion in YARN-689, reposting updated use case:
 1. I have a set of services co-existing with a Yarn cluster.
 2. These services run out of band from Yarn. They are not started as yarn 
 containers and they don't use Yarn containers for processing.
 3. These services use, dynamically, different amounts of CPU and memory based 
 on their load. They manage their CPU and memory requirements independently. 
 In other words, depending on their load, they may require more CPU but not 
 memory or vice-versa.
 By using YARN as the RM for these services I'm able to share and utilize the 
 resources of the cluster appropriately and in a dynamic way. Yarn keeps tabs 
 on all the resources.
 These services run an AM that reserves resources on their behalf. When this 
 AM gets the requested resources, the services bump up their CPU/memory 
 utilization out of band from Yarn. If the Yarn allocations are 
 released/preempted, the services back off on their resource utilization. By 
 doing this, Yarn and these services correctly share the cluster resources, 
 with the Yarn RM being the only one that does the overall resource 
 bookkeeping.
 So as not to break the lifecycle of containers, the services' AM starts 
 containers in the corresponding NMs. These container processes basically 
 sleep forever (i.e. sleep 1d). They use almost no CPU or memory 
 (less than 1MB). Thus it is reasonable to assume their required CPU and 
 memory utilization is NIL (more on hard enforcement later). Because of this 
 almost-NIL utilization of CPU and memory, it is possible to specify, when 
 doing a request, zero as one of the dimensions (CPU or memory).
 The current limitation is that the increment is also the minimum. 
 If we set the memory increment to 1MB, then when doing a pure-CPU request we 
 would have to specify 1MB of memory. That would work, but it would allow 
 discretionary memory requests without the desired normalization (increments 
 of 256, 512, etc).
 If we set the CPU increment to 1 CPU, then when doing a pure-memory request 
 we would have to specify 1 CPU. CPU amounts are much smaller than memory 
 amounts, and because we don't have fractional CPUs, this would mean that all 
 my pure-memory requests end up wasting 1 CPU, thus reducing the overall 
 utilization of the cluster.
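 To make the normalization point concrete, here is a small hedged sketch 
 (illustrative code, not the scheduler's actual logic): the increment keeps 
 doing its normalization job, while an explicit zero is allowed to stay zero.
{code:java}
// Sketch of increment-based normalization that permits a zero-valued
// dimension; names and behaviour are illustrative only.
final class NormalizeSketch {
  // Round a request up to the nearest multiple of the increment,
  // but let an explicit zero stay zero (managed out of band).
  static int normalize(int requested, int increment) {
    if (requested == 0) {
      return 0;
    }
    return ((requested + increment - 1) / increment) * increment;
  }

  public static void main(String[] args) {
    // Pure-CPU request: 0 MB memory with a 512 MB memory increment.
    System.out.println(normalize(0, 512));   // 0    -> no memory forced on the request
    System.out.println(normalize(700, 512)); // 1024 -> ordinary requests still normalized
  }
}
{code}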
 Finally, on hard enforcement. 
 * For CPU: hard enforcement can be done via a cgroup cpu controller. Using an 
 absolute minimum of a few CPU shares (i.e. 10) in the LinuxContainerExecutor, 
 we ensure there are enough CPU cycles to run the sleep process. This absolute 
 minimum would only kick in if zero is allowed; otherwise it will never kick 
 in, as the shares for 1 CPU are 1024.
 * For memory: hard enforcement is currently done by 
 ProcfsBasedProcessTree.java; using an absolute minimum of 1 or 2 MB would 
 take care of zero memory resources. And again, this absolute minimum 

[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)

2013-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681911#comment-13681911
 ] 

Hadoop QA commented on YARN-569:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587557/YARN-569.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1212//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1212//console

This message is automatically generated.

 CapacityScheduler: support for preemption (using a capacity monitor)
 

 Key: YARN-569
 URL: https://issues.apache.org/jira/browse/YARN-569
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf, 
 preemption.2.patch, YARN-569.1.patch, YARN-569.2.patch, YARN-569.3.patch, 
 YARN-569.4.patch, YARN-569.5.patch, YARN-569.6.patch, YARN-569.patch, 
 YARN-569.patch


 There is a tension between the fast-paced, reactive role of the 
 CapacityScheduler, which needs to respond quickly to applications' resource 
 requests and node updates, and the more introspective, time-based 
 considerations needed to observe and correct for capacity balance. To this 
 end, rather than hacking the delicate mechanisms of the CapacityScheduler 
 directly, we opted to add support for preemption by means of a Capacity 
 Monitor, which can optionally be run as a separate service (much like the 
 NMLivelinessMonitor).
 The capacity monitor (similar to equivalent functionality in the fair 
 scheduler) runs at intervals (e.g., every 3 seconds), observes the state of 
 the assignment of resources to queues by the capacity scheduler, performs an 
 off-line computation to determine whether preemption is needed and how best 
 to edit the current schedule to improve capacity balance, and generates 
 events that produce four possible actions:
 # Container de-reservations
 # Resource-based preemptions
 # Container-based preemptions
 # Container killing
 The actions listed above are progressively more costly, and it is up to the 
 policy to use them as desired to achieve the rebalancing goals. 
 Note that, due to the lag in the effect of these actions, the policy should 
 operate at the macroscopic level (e.g., preempt tens of containers from a 
 queue) and not try to tightly and consistently micromanage container 
 allocations. 
 - Preemption policy (ProportionalCapacityPreemptionPolicy) - 
 Preemption policies are pluggable by design; in the following we present an 
 initial policy (ProportionalCapacityPreemptionPolicy) we have been 
 experimenting with. The ProportionalCapacityPreemptionPolicy behaves as 
 follows:
 # it gathers from the scheduler the state of the queues, in particular their 
 current capacity, guaranteed capacity and pending requests (*)
 # if there are pending requests from queues that are under capacity, it 
 computes a new ideal balanced state (**)
 # it computes the set of preemptions needed to repair the current schedule 
 and achieve capacity balance (accounting for natural completion rates, and 
 respecting bounds on the amount of preemption we allow for each round)
 # it selects which applications to preempt from each over-capacity queue (the 
 last one in the FIFO order)
 # it removes reservations from the most recently assigned app until the 
 amount of resource to reclaim is obtained, or until no more reservations 
 exist
 # (if not enough) it issues preemptions for containers from the same 
 applications (reverse chronological order, last assigned container 

[jira] [Commented] (YARN-117) Enhance YARN service model

2013-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681932#comment-13681932
 ] 

Hadoop QA commented on YARN-117:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587559/YARN-117-024.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 38 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1213//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/1213//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1213//console

This message is automatically generated.

 Enhance YARN service model
 --

 Key: YARN-117
 URL: https://issues.apache.org/jira/browse/YARN-117
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117-007.patch, YARN-117-008.patch, 
 YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, 
 YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, 
 YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, 
 YARN-117-019.patch, YARN-117-020.patch, YARN-117-021.patch, 
 YARN-117-022.patch, YARN-117-023.patch, YARN-117-024.patch, YARN-117-2.patch, 
 YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, 
 YARN-117.patch


 Having played with the YARN service model, I have identified some issues 
 based on past work and initial use.
 This JIRA issue is an overall one to cover the issues, with solutions pushed 
 out to separate JIRAs.
 h2. state model prevents the stopped state from being entered if you could 
 not successfully start the service.
 In the current lifecycle you cannot stop a service unless it was successfully 
 started, but
 * {{init()}} may acquire resources that need to be explicitly released
 * if the {{start()}} operation fails partway through, the {{stop()}} 
 operation may be needed to release resources.
 *Fix:* make {{stop()}} a valid state transition from all states and require 
 the implementations to be able to stop safely without requiring all fields to 
 be non-null.
 Before anyone points out that the {{stop()}} operations assume that all 
 fields are valid and will NPE if called before {{start()}}: MAPREDUCE-3431 
 shows that this problem arises today, and MAPREDUCE-3502 is a fix for it. 
 That fix is independent of the rest of the issues in this doc, but it will 
 help make {{stop()}} executable from all states other than stopped.
 MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
 review and uptake; this can be done with issues linked to this one.
 h2. AbstractService doesn't prevent duplicate state change requests.
 The {{ensureState()}} checks that verify whether a state transition is 
 allowed from the current state are performed in the base {{AbstractService}} 
 class