[jira] [Updated] (YARN-641) Make AMLauncher in RM Use NMClient
[ https://issues.apache.org/jira/browse/YARN-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-641: - Attachment: YARN-641.2.patch Update the patch to make using NMClient configurable. Make AMLauncher in RM Use NMClient -- Key: YARN-641 URL: https://issues.apache.org/jira/browse/YARN-641 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-641.1.patch, YARN-641.2.patch YARN-422 adds NMClient. RM's AMLauncher is responsible for the interactions with an application's AM container. AMLauncher should also replace the raw ContainerManager proxy with NMClient. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
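For context, a minimal sketch of the direction the description proposes, assuming the NMClient API added by YARN-422 (createNMClient/init/start/startContainer); the class and parameter names here are illustrative, not the actual AMLauncher code:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;

// Illustrative sketch: launch an AM container through NMClient instead of
// a hand-built ContainerManager proxy.
public class AMContainerLauncherSketch {
  public static void launch(Configuration conf, Container amContainer,
      ContainerLaunchContext launchContext) throws Exception {
    NMClient nmClient = NMClient.createNMClient();
    nmClient.init(conf); // NMClient is a YARN service: init, then start
    nmClient.start();
    try {
      // A single call replaces the proxy creation and startContainer RPC
      // that AMLauncher currently wires up by hand.
      nmClient.startContainer(amContainer, launchContext);
    } finally {
      nmClient.stop();
    }
  }
}
{code}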
[jira] [Assigned] (YARN-603) Create a testcase to validate Environment.MALLOC_ARENA_MAX
[ https://issues.apache.org/jira/browse/YARN-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima reassigned YARN-603: Assignee: Kenji Kikushima Create a testcase to validate Environment.MALLOC_ARENA_MAX -- Key: YARN-603 URL: https://issues.apache.org/jira/browse/YARN-603 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Kenji Kikushima Attachments: YARN-603.patch The current test to validate Environment.MALLOC_ARENA_MAX isn't sufficient. We need to validate YarnConfiguration.NM_ADMIN_USER_ENV, too. And since YARN-561 removed the testing of Environment.MALLOC_ARENA_MAX, we need to create a new test case for it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-603) Create a testcase to validate Environment.MALLOC_ARENA_MAX
[ https://issues.apache.org/jira/browse/YARN-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-603: - Attachment: YARN-603.patch Added validation for YarnConfiguration.NM_ADMIN_USER_ENV. Is this sufficient? Create a testcase to validate Environment.MALLOC_ARENA_MAX -- Key: YARN-603 URL: https://issues.apache.org/jira/browse/YARN-603 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Kenji Kikushima Attachments: YARN-603.patch The current test to validate Environment.MALLOC_ARENA_MAX isn't sufficient. We need to validate YarnConfiguration.NM_ADMIN_USER_ENV, too. And since YARN-561 removed the testing of Environment.MALLOC_ARENA_MAX, we need to create a new test case for it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
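A minimal sketch (not the attached YARN-603.patch) of how such a test could cover both settings, assuming the Apps.setEnvFromInputString helper that the NodeManager launch path uses to parse admin environment strings:
{code}
import static org.junit.Assert.assertEquals;

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Apps;
import org.junit.Test;

public class TestAdminUserEnvSketch {
  @Test
  public void testMallocArenaMaxViaAdminUserEnv() {
    // Configure MALLOC_ARENA_MAX through the admin user env setting.
    YarnConfiguration conf = new YarnConfiguration();
    conf.set(YarnConfiguration.NM_ADMIN_USER_ENV, "MALLOC_ARENA_MAX=4");

    // Parse the configured string the way a container launch would.
    Map<String, String> env = new HashMap<String, String>();
    Apps.setEnvFromInputString(env,
        conf.get(YarnConfiguration.NM_ADMIN_USER_ENV));

    assertEquals("4", env.get("MALLOC_ARENA_MAX"));
  }
}
{code}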
[jira] [Commented] (YARN-641) Make AMLauncher in RM Use NMClient
[ https://issues.apache.org/jira/browse/YARN-641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681000#comment-13681000 ] Hadoop QA commented on YARN-641: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587383/YARN-641.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 13 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1199//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1199//console This message is automatically generated. Make AMLauncher in RM Use NMClient -- Key: YARN-641 URL: https://issues.apache.org/jira/browse/YARN-641 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-641.1.patch, YARN-641.2.patch YARN-422 adds NMClient. RM's AMLauncher is responsible for the interactions with an application's AM container. AMLauncher should also replace the raw ContainerManager proxy with NMClient. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-641) Make AMLauncher in RM Use NMClient
[ https://issues.apache.org/jira/browse/YARN-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-641: - Attachment: YARN-641.3.patch Fix the test failure. Make AMLauncher in RM Use NMClient -- Key: YARN-641 URL: https://issues.apache.org/jira/browse/YARN-641 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-641.1.patch, YARN-641.2.patch, YARN-641.3.patch YARN-422 adds NMClient. RM's AMLauncher is responsible for the interactions with an application's AM container. AMLauncher should also replace the raw ContainerManager proxy with NMClient. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-795) Fair scheduler queue metrics should subtract allocated vCores from available vCores
[ https://issues.apache.org/jira/browse/YARN-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681103#comment-13681103 ] Hudson commented on YARN-795: - Integrated in Hadoop-Yarn-trunk #238 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/238/]) YARN-795. Fair scheduler queue metrics should subtract allocated vCores from available vCores. (ywskycn via tucu) (Revision 1492021) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1492021 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Fair scheduler queue metrics should subtract allocated vCores from available vCores --- Key: YARN-795 URL: https://issues.apache.org/jira/browse/YARN-795 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.1.0-beta Attachments: YARN-795-2.patch, YARN-795.patch The queue metrics of the fair scheduler don't subtract allocated vCores from available vCores, so the available vCores returned are incorrect. This happens because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
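For readers following along, an illustrative sketch of the bookkeeping this fix restores: available vCores must shrink and grow alongside allocated vCores, mirroring what the queue metrics already do for memory. The field names are illustrative, not the actual QueueMetrics members:
{code}
// Illustrative vCore accounting; not the committed QueueMetrics change.
public class VCoreMetricsSketch {
  private int availableVCores;
  private int allocatedVCores;

  public synchronized void setAvailableVCores(int vCores) {
    availableVCores = vCores;
  }

  public synchronized void allocate(int vCores) {
    allocatedVCores += vCores;
    availableVCores -= vCores; // the subtraction the bug omitted
  }

  public synchronized void release(int vCores) {
    allocatedVCores -= vCores;
    availableVCores += vCores;
  }
}
{code}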
[jira] [Commented] (YARN-737) Some Exceptions no longer need to be wrapped by YarnException and can be directly thrown out after YARN-142
[ https://issues.apache.org/jira/browse/YARN-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681104#comment-13681104 ] Hudson commented on YARN-737: - Integrated in Hadoop-Yarn-trunk #238 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/238/]) YARN-737. Throw some specific exceptions directly instead of wrapping them in YarnException. Contributed by Jian He. (Revision 1491896) Result = SUCCESS sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1491896 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/InvalidContainerException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/NMNotYetReadyException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java Some Exceptions no longer need to be wrapped by YarnException and can be directly thrown out after YARN-142 Key: YARN-737 URL: https://issues.apache.org/jira/browse/YARN-737 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Fix For: 2.1.0-beta Attachments: YARN-737.1.patch, YARN-737.2.patch, YARN-737.3.patch, YARN-737.4.patch, YARN-737.5.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
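A hedged sketch of the pattern this change moves to, assuming the post-YARN-142 org.apache.hadoop.yarn.exceptions.YarnException base class; the class name below is illustrative rather than one of the committed exception types:
{code}
import org.apache.hadoop.yarn.exceptions.YarnException;

// A specific, directly-throwable exception type instead of a generic
// wrapper built via RPCUtil.
public class InvalidContainerExceptionSketch extends YarnException {
  private static final long serialVersionUID = 1L;

  public InvalidContainerExceptionSketch(String msg) {
    super(msg);
  }
}

// At the call site, instead of:
//   throw RPCUtil.getRemoteException(msg);
// the server code can now do:
//   throw new InvalidContainerExceptionSketch("Container " + containerId
//       + " was not started by this NodeManager");
{code}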
[jira] [Commented] (YARN-731) RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions
[ https://issues.apache.org/jira/browse/YARN-731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681189#comment-13681189 ] Hudson commented on YARN-731: - Integrated in Hadoop-Hdfs-trunk #1428 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1428/]) YARN-731. RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions. Contributed by Zhijie Shen. (Revision 1492000) Result = FAILURE sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1492000 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/ipc/TestRPCUtil.java RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions -- Key: YARN-731 URL: https://issues.apache.org/jira/browse/YARN-731 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Zhijie Shen Fix For: 2.1.0-beta Attachments: YARN-731.1.patch, YARN-731.2.patch Will be required for YARN-662. Also, remote NPEs show up incorrectly for some unit tests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
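A minimal sketch, not the committed RPCUtil code, of what the unwrapping looks like: pull the cause out of the protobuf ServiceException and rethrow RuntimeExceptions as themselves, so remote NPEs no longer surface wrapped:
{code}
import java.io.IOException;

import org.apache.hadoop.yarn.exceptions.YarnException;

import com.google.protobuf.ServiceException;

public final class RpcUnwrapSketch {
  public static void unwrapAndThrow(ServiceException se)
      throws IOException, YarnException {
    Throwable cause = se.getCause();
    if (cause instanceof RuntimeException) {
      throw (RuntimeException) cause; // remote NPEs surface as themselves
    } else if (cause instanceof YarnException) {
      throw (YarnException) cause;
    } else if (cause instanceof IOException) {
      throw (IOException) cause;
    }
    throw new IOException(se); // unknown cause: wrap exactly once
  }
}
{code}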
[jira] [Commented] (YARN-795) Fair scheduler queue metrics should subtract allocated vCores from available vCores
[ https://issues.apache.org/jira/browse/YARN-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681190#comment-13681190 ] Hudson commented on YARN-795: - Integrated in Hadoop-Hdfs-trunk #1428 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1428/]) YARN-795. Fair scheduler queue metrics should subtract allocated vCores from available vCores. (ywskycn via tucu) (Revision 1492021) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1492021 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Fair scheduler queue metrics should subtract allocated vCores from available vCores --- Key: YARN-795 URL: https://issues.apache.org/jira/browse/YARN-795 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.1.0-beta Attachments: YARN-795-2.patch, YARN-795.patch The queue metrics of the fair scheduler don't subtract allocated vCores from available vCores, so the available vCores returned are incorrect. This happens because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-737) Some Exceptions no longer need to be wrapped by YarnException and can be directly thrown out after YARN-142
[ https://issues.apache.org/jira/browse/YARN-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681191#comment-13681191 ] Hudson commented on YARN-737: - Integrated in Hadoop-Hdfs-trunk #1428 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1428/]) YARN-737. Throw some specific exceptions directly instead of wrapping them in YarnException. Contributed by Jian He. (Revision 1491896) Result = FAILURE sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1491896 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/InvalidContainerException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/NMNotYetReadyException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java Some Exceptions no longer need to be wrapped by YarnException and can be directly thrown out after YARN-142 Key: YARN-737 URL: https://issues.apache.org/jira/browse/YARN-737 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Fix For: 2.1.0-beta Attachments: YARN-737.1.patch, YARN-737.2.patch, YARN-737.3.patch, YARN-737.4.patch, YARN-737.5.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-798) compiling program in hadoop 2
JOB M THOMAS created YARN-798: - Summary: compiling program in hadoop 2 Key: YARN-798 URL: https://issues.apache.org/jira/browse/YARN-798 Project: Hadoop YARN Issue Type: Test Reporter: JOB M THOMAS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-798) compiling program in hadoop 2
[ https://issues.apache.org/jira/browse/YARN-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JOB M THOMAS updated YARN-798: -- Description: Please help me to compile a normal wordcount.java program for Hadoop 2. I am using hadoop-2.0.5-alpha. The contents inside the package are (bin, etc, include, lib, libexec, logs, sbin, share); there is no conf directory, and all configuration files are in the /etc/hadoop/ directory. We completed a 3-node cluster setup. We have to specify lots of jar file paths when compiling the wordcount program, but I cannot find the proper jar files as in the Hadoop 1.x releases. Please help me to compile. compiling program in hadoop 2 - Key: YARN-798 URL: https://issues.apache.org/jira/browse/YARN-798 Project: Hadoop YARN Issue Type: Test Reporter: JOB M THOMAS Please help me to compile a normal wordcount.java program for Hadoop 2. I am using hadoop-2.0.5-alpha. The contents inside the package are (bin, etc, include, lib, libexec, logs, sbin, share); there is no conf directory, and all configuration files are in the /etc/hadoop/ directory. We completed a 3-node cluster setup. We have to specify lots of jar file paths when compiling the wordcount program, but I cannot find the proper jar files as in the Hadoop 1.x releases. Please help me to compile. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-731) RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions
[ https://issues.apache.org/jira/browse/YARN-731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681243#comment-13681243 ] Hudson commented on YARN-731: - Integrated in Hadoop-Mapreduce-trunk #1455 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1455/]) YARN-731. RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions. Contributed by Zhijie Shen. (Revision 1492000) Result = SUCCESS sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1492000 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/ipc/TestRPCUtil.java RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions -- Key: YARN-731 URL: https://issues.apache.org/jira/browse/YARN-731 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Zhijie Shen Fix For: 2.1.0-beta Attachments: YARN-731.1.patch, YARN-731.2.patch Will be required for YARN-662. Also, remote NPEs show up incorrectly for some unit tests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-767) Initialize Application status metrics when QueueMetrics is initialized
[ https://issues.apache.org/jira/browse/YARN-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681246#comment-13681246 ] Hudson commented on YARN-767: - Integrated in Hadoop-Mapreduce-trunk #1455 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1455/]) YARN-767. Initialize application metrics at RM bootup. Contributed by Jian He. (Revision 1491989) Result = SUCCESS acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1491989 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Initialize Application status metrics when QueueMetrics is initialized --- Key: YARN-767 URL: https://issues.apache.org/jira/browse/YARN-767 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.1.0-beta Attachments: YARN-767.1.patch, YARN-767.2.patch, YARN-767.3.patch, YARN-767.4.patch, YARN-767.5.patch Applications: ResourceManager.QueueMetrics.AppsSubmitted, ResourceManager.QueueMetrics.AppsRunning, ResourceManager.QueueMetrics.AppsPending, ResourceManager.QueueMetrics.AppsCompleted, ResourceManager.QueueMetrics.AppsKilled, ResourceManager.QueueMetrics.AppsFailed. For now these metrics are created only when they are needed; we want them to be visible as soon as QueueMetrics is initialized. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
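An illustrative sketch of the eager registration described above: create every Apps* counter when the metrics object is constructed, so all of them are visible (with value 0) at RM bootup instead of appearing lazily on first use. The names are illustrative, not the real QueueMetrics fields:
{code}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class AppMetricsSketch {
  private final Map<String, AtomicLong> counters =
      new LinkedHashMap<String, AtomicLong>();

  public AppMetricsSketch() {
    // Register every app-status metric up front rather than on demand.
    for (String name : new String[] {"AppsSubmitted", "AppsRunning",
        "AppsPending", "AppsCompleted", "AppsKilled", "AppsFailed"}) {
      counters.put(name, new AtomicLong(0));
    }
  }

  public void incr(String name) {
    counters.get(name).incrementAndGet();
  }
}
{code}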
[jira] [Commented] (YARN-737) Some Exceptions no longer need to be wrapped by YarnException and can be directly thrown out after YARN-142
[ https://issues.apache.org/jira/browse/YARN-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681245#comment-13681245 ] Hudson commented on YARN-737: - Integrated in Hadoop-Mapreduce-trunk #1455 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1455/]) YARN-737. Throw some specific exceptions directly instead of wrapping them in YarnException. Contributed by Jian He. (Revision 1491896) Result = SUCCESS sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1491896 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/InvalidContainerException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/NMNotYetReadyException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java Some Exceptions no longer need to be wrapped by YarnException and can be directly thrown out after YARN-142 Key: YARN-737 URL: https://issues.apache.org/jira/browse/YARN-737 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Fix For: 2.1.0-beta Attachments: YARN-737.1.patch, YARN-737.2.patch, YARN-737.3.patch, YARN-737.4.patch, YARN-737.5.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-795) Fair scheduler queue metrics should subtract allocated vCores from available vCores
[ https://issues.apache.org/jira/browse/YARN-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681244#comment-13681244 ] Hudson commented on YARN-795: - Integrated in Hadoop-Mapreduce-trunk #1455 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1455/]) YARN-795. Fair scheduler queue metrics should subtract allocated vCores from available vCores. (ywskycn via tucu) (Revision 1492021) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1492021 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Fair scheduler queue metrics should subtract allocated vCores from available vCores --- Key: YARN-795 URL: https://issues.apache.org/jira/browse/YARN-795 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.1.0-beta Attachments: YARN-795-2.patch, YARN-795.patch The queue metrics of the fair scheduler don't subtract allocated vCores from available vCores, so the available vCores returned are incorrect. This happens because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-798) compiling program in hadoop 2
[ https://issues.apache.org/jira/browse/YARN-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash resolved YARN-798. --- Resolution: Not A Problem Please email the u...@hadoop.apache.org list. JIRA is for reporting problems in the software. compiling program in hadoop 2 - Key: YARN-798 URL: https://issues.apache.org/jira/browse/YARN-798 Project: Hadoop YARN Issue Type: Test Reporter: JOB M THOMAS Please help me to compile a normal wordcount.java program for Hadoop 2. I am using hadoop-2.0.5-alpha. The contents inside the package are (bin, etc, include, lib, libexec, logs, sbin, share); there is no conf directory, and all configuration files are in the /etc/hadoop/ directory. We completed a 3-node cluster setup. We have to specify lots of jar file paths when compiling the wordcount program, but I cannot find the proper jar files as in the Hadoop 1.x releases. Please help me to compile. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681374#comment-13681374 ] Jonathan Eagles commented on YARN-427: -- +1. Thanks, Aleksey. Looks really good now. Coverage fix for org.apache.hadoop.yarn.server.api.* Key: YARN-427 URL: https://issues.apache.org/jira/browse/YARN-427 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-0.23-c.patch, YARN-427-branch-2-a.patch, YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, YARN-427-trunk-b.patch, YARN-427-trunk-c.patch, YARN-427-trunk.patch Coverage fix for org.apache.hadoop.yarn.server.api.* patch YARN-427-trunk.patch for trunk patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-369) Handle ( or throw a proper error when receiving) status updates from application masters that have not registered
[ https://issues.apache.org/jira/browse/YARN-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681385#comment-13681385 ] Bikas Saha commented on YARN-369: - Typo {code} + // increment the response id to dennote that application is master is + // register for the respective attempid {code} We are getting rid of the RPCUtil.throwException pattern and throwing specific exceptions that derive from YarnException. How about creating a new InvalidApplicationMasterRequest? {code} + RMAuditLogger.logFailure( +this.rmContext.getRMApps().get(appAttemptId.getApplicationId()) + .getUser(), AuditConstants.REGISTER_AM, , +ApplicationMasterService, message, appAttemptId.getApplicationId(), +appAttemptId); + throw RPCUtil.getRemoteException(message); {code} What if registerApplicationMaster is called 2 times? Is that legal? We should probably check that the responseId is -1 and set it to 0 in registerApplicationMaster. This will reject duplicate calls to register. Test for that too. Handle ( or throw a proper error when receiving) status updates from application masters that have not registered - Key: YARN-369 URL: https://issues.apache.org/jira/browse/YARN-369 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha, trunk-win Reporter: Hitesh Shah Assignee: Mayank Bansal Attachments: YARN-369.patch, YARN-369-trunk-1.patch Currently, an allocate call from an unregistered application is allowed and the status update for it throws a statemachine error that is silently dropped. org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:680) ApplicationMasterService should likely throw an appropriate error for applications' requests that should not be handled in such cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
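A hedged sketch of the registration guard suggested above: treat a stored responseId of -1 as "never registered", flip it to 0 on register, and reject both duplicate registrations and allocate calls from unregistered AMs. The exception name follows the InvalidApplicationMasterRequest suggestion and is hypothetical:
{code}
public class AMRegistrationGuardSketch {
  /** Hypothetical exception type, per the suggestion in the comment above. */
  public static class InvalidApplicationMasterRequestException
      extends Exception {
    private static final long serialVersionUID = 1L;
    public InvalidApplicationMasterRequestException(String msg) {
      super(msg);
    }
  }

  private int lastResponseId = -1; // -1 means "not yet registered"

  public synchronized void registerApplicationMaster()
      throws InvalidApplicationMasterRequestException {
    if (lastResponseId != -1) {
      throw new InvalidApplicationMasterRequestException(
          "Application Master is already registered");
    }
    lastResponseId = 0;
  }

  public synchronized void allocate(int responseId)
      throws InvalidApplicationMasterRequestException {
    if (lastResponseId == -1) {
      throw new InvalidApplicationMasterRequestException(
          "Application Master must register before calling allocate");
    }
    lastResponseId = responseId;
  }
}
{code}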
[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681388#comment-13681388 ] Hudson commented on YARN-427: - Integrated in Hadoop-trunk-Commit #3904 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3904/]) YARN-427. Coverage fix for org.apache.hadoop.yarn.server.api.* (Aleksey Gorshkov via jeagles) (Revision 1492282) Result = SUCCESS jeagles : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1492282 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java Coverage fix for org.apache.hadoop.yarn.server.api.* Key: YARN-427 URL: https://issues.apache.org/jira/browse/YARN-427 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Fix For: 3.0.0, 2.1.0-beta, 0.23.9 Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-0.23-c.patch, YARN-427-branch-2-a.patch, YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, YARN-427-trunk-b.patch, YARN-427-trunk-c.patch, YARN-427-trunk.patch Coverage fix for org.apache.hadoop.yarn.server.api.* patch YARN-427-trunk.patch for trunk patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-379) yarn [node,application] command print logger info messages
[ https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated YARN-379: -- Attachment: YARN-379.patch This patch sets the log level to WARN for console. I tested by building cleanly after the patch and bringing up the daemons. The log and out files for the daemons are still at info level and the yarn commands are at WARN level. Could someone please review and check it in? yarn [node,application] command print logger info messages -- Key: YARN-379 URL: https://issues.apache.org/jira/browse/YARN-379 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Abhishek Kapoor Labels: usability Attachments: YARN-379.patch, YARN-379.patch Running the yarn node and yarn applications command results in annoying log info messages being printed: $ yarn node -list 13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited. 13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started. Total Nodes:1 Node-IdNode-State Node-Http-Address Health-Status(isNodeHealthy)Running-Containers foo:8041RUNNING foo:8042 true 0 13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped. $ yarn application 13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited. 13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started. Invalid Command Usage : usage: application -kill arg Kills the application. -list Lists all the Applications from RM. -status arg Prints the status of the application. 13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
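A sketch of the effect the patch describes, assuming the log4j 1.2 API that Hadoop 2.x shipped with: raise the console appender threshold to WARN so client commands stop printing INFO service messages, while the daemons' file appenders keep their own levels. The class and pattern below are illustrative; the committed change edits the log configuration rather than code:
{code}
import org.apache.log4j.ConsoleAppender;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

public class ConsoleLogLevelSketch {
  public static void quietConsole() {
    ConsoleAppender console = new ConsoleAppender(
        new PatternLayout("%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n"));
    console.setThreshold(Level.WARN); // drop INFO/DEBUG on the console only
    console.activateOptions();
    Logger.getRootLogger().addAppender(console);
  }
}
{code}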
[jira] [Commented] (YARN-379) yarn [node,application] command print logger info messages
[ https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681410#comment-13681410 ] Hadoop QA commented on YARN-379: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587458/YARN-379.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1202//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1202//console This message is automatically generated. yarn [node,application] command print logger info messages -- Key: YARN-379 URL: https://issues.apache.org/jira/browse/YARN-379 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Abhishek Kapoor Labels: usability Attachments: YARN-379.patch, YARN-379.patch Running the yarn node and yarn applications command results in annoying log info messages being printed: $ yarn node -list 13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited. 13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started. Total Nodes:1 Node-IdNode-State Node-Http-Address Health-Status(isNodeHealthy)Running-Containers foo:8041RUNNING foo:8042 true 0 13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped. $ yarn application 13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited. 13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started. Invalid Command Usage : usage: application -kill arg Kills the application. -list Lists all the Applications from RM. -status arg Prints the status of the application. 13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated
[ https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-600: Attachment: YARN-600.patch Hook up cgroups CPU settings to the number of virtual cores allocated - Key: YARN-600 URL: https://issues.apache.org/jira/browse/YARN-600 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-600.patch YARN-3 introduced CPU isolation and monitoring through cgroups. YARN-2 introduced CPU scheduling in the capacity scheduler, and YARN-326 will introduce it in the fair scheduler. The number of virtual cores allocated to a container should be used to weight the number of cgroups CPU shares given to it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated
[ https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681437#comment-13681437 ] Sandy Ryza commented on YARN-600: - Submitted a simple patch. Haven't had a chance to verify it manually yet. Hook up cgroups CPU settings to the number of virtual cores allocated - Key: YARN-600 URL: https://issues.apache.org/jira/browse/YARN-600 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-600.patch YARN-3 introduced CPU isolation and monitoring through cgroups. YARN-2 introduced CPU scheduling in the capacity scheduler, and YARN-326 will introduce it in the fair scheduler. The number of virtual cores allocated to a container should be used to weight the number of cgroups CPU shares given to it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
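A minimal sketch, not the attached YARN-600.patch, of weighting a container's cgroup CPU shares by its virtual cores; 1024 is the kernel's default cpu.shares value for one task group, and the directory parameter is illustrative:
{code}
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

public class CpuSharesSketch {
  private static final int CPU_DEFAULT_WEIGHT = 1024; // one vcore's worth

  public static void writeCpuShares(String containerCgroupDir, int vCores)
      throws IOException {
    int shares = CPU_DEFAULT_WEIGHT * vCores;
    Writer w = new FileWriter(containerCgroupDir + "/cpu.shares");
    try {
      w.write(String.valueOf(shares)); // e.g. 2 vcores -> 2048 shares
    } finally {
      w.close();
    }
  }
}
{code}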
[jira] [Updated] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs
[ https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated YARN-799: - Description:
The implementation of
bq. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
tells the container-executor to write PIDs to cgroup.procs:
{code}
public String getResourcesOption(ContainerId containerId) {
  String containerName = containerId.toString();
  StringBuilder sb = new StringBuilder("cgroups=");
  if (isCpuWeightEnabled()) {
    sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
    sb.append(",");
  }
  if (sb.charAt(sb.length() - 1) == ',') {
    sb.deleteCharAt(sb.length() - 1);
  }
  return sb.toString();
}
{code}
Apparently, this file has not always been writeable:
https://patchwork.kernel.org/patch/116146/
http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html
The RHEL version of the Linux kernel that I'm using has a CGroup module that has a non-writeable cgroup.procs file.
{quote}
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
{quote}
As a result, when the container-executor tries to run, it fails with this error message:
bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
{quote}
$ pwd
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
{quote}
I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem. I can think of several potential resolutions to this ticket:
1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone.
3. Check permissioning on /cgroup.procs prior to writing to it, and fall back to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.
Thoughts?
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs
Chris Riccomini created YARN-799: Summary: CgroupsLCEResourcesHandler tries to write to cgroup.procs Key: YARN-799 URL: https://issues.apache.org/jira/browse/YARN-799 Project: Hadoop YARN Issue Type: Bug Reporter: Chris Riccomini
The implementation of
bq. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
tells the container-executor to write PIDs to cgroup.procs:
{code}
public String getResourcesOption(ContainerId containerId) {
  String containerName = containerId.toString();
  StringBuilder sb = new StringBuilder("cgroups=");
  if (isCpuWeightEnabled()) {
    sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
    sb.append(",");
  }
  if (sb.charAt(sb.length() - 1) == ',') {
    sb.deleteCharAt(sb.length() - 1);
  }
  return sb.toString();
}
{code}
Apparently, this file has not always been writeable:
https://patchwork.kernel.org/patch/116146/
http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html
The RHEL version of the Linux kernel that I'm using has a CGroup module that has a non-writeable cgroup.procs file.
{quote}
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
{quote}
As a result, when the container-executor tries to run, it fails with this error message:
bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
{quote}
$ pwd
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
{quote}
I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem. I can think of several potential resolutions to this ticket:
1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone.
3. Check permissioning on /cgroup.procs prior to writing to it, and fall back to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.
Thoughts?
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs
[ https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated YARN-799: - Component/s: nodemanager Affects Version/s: 2.0.5-alpha, 2.0.4-alpha CgroupsLCEResourcesHandler tries to write to cgroup.procs - Key: YARN-799 URL: https://issues.apache.org/jira/browse/YARN-799 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.4-alpha, 2.0.5-alpha Reporter: Chris Riccomini
The implementation of
bq. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
tells the container-executor to write PIDs to cgroup.procs:
{code}
public String getResourcesOption(ContainerId containerId) {
  String containerName = containerId.toString();
  StringBuilder sb = new StringBuilder("cgroups=");
  if (isCpuWeightEnabled()) {
    sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
    sb.append(",");
  }
  if (sb.charAt(sb.length() - 1) == ',') {
    sb.deleteCharAt(sb.length() - 1);
  }
  return sb.toString();
}
{code}
Apparently, this file has not always been writeable:
https://patchwork.kernel.org/patch/116146/
http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html
The RHEL version of the Linux kernel that I'm using has a CGroup module that has a non-writeable cgroup.procs file.
{quote}
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
{quote}
As a result, when the container-executor tries to run, it fails with this error message:
bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
{quote}
$ pwd
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
{quote}
I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem. I can think of several potential resolutions to this ticket:
1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone.
3. Check permissioning on /cgroup.procs prior to writing to it, and fall back to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.
Thoughts?
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
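A hedged sketch of resolution 3 from the description: probe whether cgroup.procs is writeable and fall back to the older tasks file when it is not. The paths and method name are illustrative, not the actual CgroupsLCEResourcesHandler code:
{code}
import java.io.File;

public class CgroupPidFileSketch {
  public static String pidFileForCgroup(String cgroupDir) {
    File procs = new File(cgroupDir, "cgroup.procs");
    if (procs.exists() && procs.canWrite()) {
      return procs.getAbsolutePath();
    }
    // Older kernels (e.g. RHEL 6's 2.6.32) expose a read-only cgroup.procs;
    // the tasks file has been writeable by the cgroup owner for much longer.
    return new File(cgroupDir, "tasks").getAbsolutePath();
  }
}
{code}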
[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated
[ https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681460#comment-13681460 ] Hadoop QA commented on YARN-600: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587466/YARN-600.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1203//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1203//console This message is automatically generated. Hook up cgroups CPU settings to the number of virtual cores allocated - Key: YARN-600 URL: https://issues.apache.org/jira/browse/YARN-600 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-600.patch YARN-3 introduced CPU isolation and monitoring through cgroups. YARN-2 introduced CPU scheduling in the capacity scheduler, and YARN-326 will introduce it in the fair scheduler. The number of virtual cores allocated to a container should be used to weight the number of cgroups CPU shares given to it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
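The weighting rule itself is conceptually tiny. A minimal sketch, inferred from the description and from the cpu.shares values verified later in this thread (the constant and method names below are illustrative, not the patch's):
{code}
// Sketch of the YARN-600 idea: weight the cgroup's cpu.shares by the
// container's virtual cores. 1024 is the kernel's default share for a
// single cgroup.
public class CpuShares {
  static final int CPU_DEFAULT_WEIGHT = 1024;

  static int sharesFor(int virtualCores) {
    return CPU_DEFAULT_WEIGHT * virtualCores;
  }
}
{code}
This is consistent with the numbers reported below: a 1-vcore container gets 1024 shares and a 32-vcore container gets 32768.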
[jira] [Updated] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs
[ https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated YARN-799: - Description: The implementation of ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java tells the container-executor to write PIDs to cgroup.procs:
{quote}
public String getResourcesOption(ContainerId containerId) {
  String containerName = containerId.toString();
  StringBuilder sb = new StringBuilder("cgroups=");
  if (isCpuWeightEnabled()) {
    sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
    sb.append(",");
  }
  if (sb.charAt(sb.length() - 1) == ',') {
    sb.deleteCharAt(sb.length() - 1);
  }
  return sb.toString();
}
{quote}
Apparently, this file has not always been writeable: https://patchwork.kernel.org/patch/116146/ http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html The RHEL version of the Linux kernel that I'm using has a CGroup module whose cgroup.procs file is not writeable.
{quote}
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
{quote}
As a result, when the container-executor tries to run, it fails with this error message:
bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
{quote}
$ pwd
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
{quote}
I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem. I can think of several potential resolutions to this ticket:
1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone.
3. Check permissions on /cgroup.procs prior to writing to it, and fall back to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.
Thoughts?
was: The implementation of ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java tells the container-executor to write PIDs to cgroup.procs: bq. public String getResourcesOption(ContainerId containerId) { String containerName = containerId.toString(); StringBuilder sb = new StringBuilder("cgroups="); if (isCpuWeightEnabled()) { sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs"); sb.append(","); } if (sb.charAt(sb.length() - 1) == ',') { sb.deleteCharAt(sb.length() - 1); } return sb.toString(); } Apparently, this file has not always been writeable: https://patchwork.kernel.org/patch/116146/ http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html The RHEL version of the Linux kernel that I'm using has a CGroup module whose cgroup.procs file is not writeable. bq. $ uname -a Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux As a result, when the container-executor tries to run, it fails with this error message: bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n", This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable: bq. $ pwd /cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01 $ ls -l total 0 -r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem. I can think of several potential resolutions to this ticket: 1. Ignore the problem, and make people patch YARN when they hit this issue. 2. Write to /tasks instead of /cgroup.procs for everyone. 3. Check permissions on /cgroup.procs prior to writing to it, and fall back to /tasks. 4. Add a config to yarn-site that lets admins specify which file to write to. Thoughts?
[jira] [Updated] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs
[ https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated YARN-799: - Description: The implementation of ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java tells the container-executor to write PIDs to cgroup.procs:
{code}
public String getResourcesOption(ContainerId containerId) {
  String containerName = containerId.toString();
  StringBuilder sb = new StringBuilder("cgroups=");
  if (isCpuWeightEnabled()) {
    sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
    sb.append(",");
  }
  if (sb.charAt(sb.length() - 1) == ',') {
    sb.deleteCharAt(sb.length() - 1);
  }
  return sb.toString();
}
{code}
Apparently, this file has not always been writeable: https://patchwork.kernel.org/patch/116146/ http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html The RHEL version of the Linux kernel that I'm using has a CGroup module whose cgroup.procs file is not writeable.
{quote}
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
{quote}
As a result, when the container-executor tries to run, it fails with this error message:
bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
{quote}
$ pwd
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
{quote}
I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem. I can think of several potential resolutions to this ticket:
1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone.
3. Check permissions on /cgroup.procs prior to writing to it, and fall back to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.
Thoughts?
was: The implementation of ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java tells the container-executor to write PIDs to cgroup.procs: {quote} public String getResourcesOption(ContainerId containerId) { String containerName = containerId.toString(); StringBuilder sb = new StringBuilder("cgroups="); if (isCpuWeightEnabled()) { sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs"); sb.append(","); } if (sb.charAt(sb.length() - 1) == ',') { sb.deleteCharAt(sb.length() - 1); } return sb.toString(); } {quote} Apparently, this file has not always been writeable: https://patchwork.kernel.org/patch/116146/ http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html The RHEL version of the Linux kernel that I'm using has a CGroup module whose cgroup.procs file is not writeable. {quote} $ uname -a Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux {quote} As a result, when the container-executor tries to run, it fails with this error message: bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n", This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable: {quote} $ pwd /cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01 $ ls -l total 0 -r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks {quote} I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem. I can think of several potential resolutions to this ticket: 1. Ignore the problem, and make people patch YARN when they hit this issue. 2. Write to /tasks instead of /cgroup.procs for everyone. 3. Check permissions on /cgroup.procs prior to writing to it, and fall back to /tasks. 4. Add a config to yarn-site that lets admins specify which file to write to. Thoughts?
[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681465#comment-13681465 ] Jonathan Eagles commented on YARN-427: -- Good catch, Sid. The intention was just for 2.3. I have corrected fix versions to reflect that. Coverage fix for org.apache.hadoop.yarn.server.api.* Key: YARN-427 URL: https://issues.apache.org/jira/browse/YARN-427 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Fix For: 3.0.0, 0.23.9, 2.3.0 Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-0.23-c.patch, YARN-427-branch-2-a.patch, YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, YARN-427-trunk-b.patch, YARN-427-trunk-c.patch, YARN-427-trunk.patch Coverage fix for org.apache.hadoop.yarn.server.api.* patch YARN-427-trunk.patch for trunk patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs
[ https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681477#comment-13681477 ] Timothy St. Clair commented on YARN-799: +1 to appending to tasks; see https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-Moving_a_Process_to_a_Control_Group.html for reference. CgroupsLCEResourcesHandler tries to write to cgroup.procs - Key: YARN-799 URL: https://issues.apache.org/jira/browse/YARN-799 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.4-alpha, 2.0.5-alpha Reporter: Chris Riccomini The implementation of ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java tells the container-executor to write PIDs to cgroup.procs:
{code}
public String getResourcesOption(ContainerId containerId) {
  String containerName = containerId.toString();
  StringBuilder sb = new StringBuilder("cgroups=");
  if (isCpuWeightEnabled()) {
    sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
    sb.append(",");
  }
  if (sb.charAt(sb.length() - 1) == ',') {
    sb.deleteCharAt(sb.length() - 1);
  }
  return sb.toString();
}
{code}
Apparently, this file has not always been writeable: https://patchwork.kernel.org/patch/116146/ http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html The RHEL version of the Linux kernel that I'm using has a CGroup module whose cgroup.procs file is not writeable.
{quote}
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
{quote}
As a result, when the container-executor tries to run, it fails with this error message:
bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
{quote}
$ pwd
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
{quote}
I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem. I can think of several potential resolutions to this ticket:
1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone.
3. Check permissions on /cgroup.procs prior to writing to it, and fall back to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.
Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
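For reference, "appending to tasks" amounts to writing the PID as a line into the cgroup's tasks file, which the container-executor does in C; a rough Java equivalent is below. The path and class name are illustrative only, not part of the YARN code.
{code}
import java.io.FileWriter;
import java.io.IOException;

// Illustrative only: move a process into a cgroup by writing its PID
// into the tasks file, as described in the RHEL guide linked above.
public class CgroupAttach {
  static void attach(String cgroupPath, int pid) throws IOException {
    FileWriter w = new FileWriter(cgroupPath + "/tasks", true);
    try {
      w.write(pid + "\n");
    } finally {
      w.close();
    }
  }
}
{code}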
[jira] [Commented] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs
[ https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681478#comment-13681478 ] Sandy Ryza commented on YARN-799: - Is there a reason that it would ever be beneficial to write to /cgroup.procs over /tasks? CgroupsLCEResourcesHandler tries to write to cgroup.procs - Key: YARN-799 URL: https://issues.apache.org/jira/browse/YARN-799 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.4-alpha, 2.0.5-alpha Reporter: Chris Riccomini The implementation of ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java tells the container-executor to write PIDs to cgroup.procs:
{code}
public String getResourcesOption(ContainerId containerId) {
  String containerName = containerId.toString();
  StringBuilder sb = new StringBuilder("cgroups=");
  if (isCpuWeightEnabled()) {
    sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
    sb.append(",");
  }
  if (sb.charAt(sb.length() - 1) == ',') {
    sb.deleteCharAt(sb.length() - 1);
  }
  return sb.toString();
}
{code}
Apparently, this file has not always been writeable: https://patchwork.kernel.org/patch/116146/ http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html The RHEL version of the Linux kernel that I'm using has a CGroup module whose cgroup.procs file is not writeable.
{quote}
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
{quote}
As a result, when the container-executor tries to run, it fails with this error message:
bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
{quote}
$ pwd
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
{quote}
I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem. I can think of several potential resolutions to this ticket:
1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone.
3. Check permissions on /cgroup.procs prior to writing to it, and fall back to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.
Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated
[ https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-600: Target Version/s: 2.1.0-beta Hook up cgroups CPU settings to the number of virtual cores allocated - Key: YARN-600 URL: https://issues.apache.org/jira/browse/YARN-600 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-600.patch YARN-3 introduced CPU isolation and monitoring through cgroups. YARN-2 introduced CPU scheduling in the capacity scheduler, and YARN-326 will introduce it in the fair scheduler. The number of virtual cores allocated to a container should be used to weight the number of cgroups CPU shares given to it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs
[ https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681481#comment-13681481 ] Chris Riccomini commented on YARN-799: -- [~sandyr] Not sure. I'm afraid my cgroup experience is limited to 24h :P CgroupsLCEResourcesHandler tries to write to cgroup.procs - Key: YARN-799 URL: https://issues.apache.org/jira/browse/YARN-799 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.4-alpha, 2.0.5-alpha Reporter: Chris Riccomini The implementation of ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java tells the container-executor to write PIDs to cgroup.procs:
{code}
public String getResourcesOption(ContainerId containerId) {
  String containerName = containerId.toString();
  StringBuilder sb = new StringBuilder("cgroups=");
  if (isCpuWeightEnabled()) {
    sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
    sb.append(",");
  }
  if (sb.charAt(sb.length() - 1) == ',') {
    sb.deleteCharAt(sb.length() - 1);
  }
  return sb.toString();
}
{code}
Apparently, this file has not always been writeable: https://patchwork.kernel.org/patch/116146/ http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html The RHEL version of the Linux kernel that I'm using has a CGroup module whose cgroup.procs file is not writeable.
{quote}
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
{quote}
As a result, when the container-executor tries to run, it fails with this error message:
bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
{quote}
$ pwd
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
{quote}
I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem. I can think of several potential resolutions to this ticket:
1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone.
3. Check permissions on /cgroup.procs prior to writing to it, and fall back to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.
Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs
[ https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681485#comment-13681485 ] Sandy Ryza commented on YARN-799: - Just found this comment in the original YARN-3 discussion: bq. This is a small edit to the previous patch. It now writes the process ID to cgroup.procs instead of tasks so other kernel threads started by the same process stay in the cgroup. So it seems like there is some reasoning behind it, and having other threads started by the same process stay in the cgroup is important. I'll have to learn more about cgroups to have an opinion on the right course of action. CgroupsLCEResourcesHandler tries to write to cgroup.procs - Key: YARN-799 URL: https://issues.apache.org/jira/browse/YARN-799 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.4-alpha, 2.0.5-alpha Reporter: Chris Riccomini The implementation of ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java tells the container-executor to write PIDs to cgroup.procs:
{code}
public String getResourcesOption(ContainerId containerId) {
  String containerName = containerId.toString();
  StringBuilder sb = new StringBuilder("cgroups=");
  if (isCpuWeightEnabled()) {
    sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
    sb.append(",");
  }
  if (sb.charAt(sb.length() - 1) == ',') {
    sb.deleteCharAt(sb.length() - 1);
  }
  return sb.toString();
}
{code}
Apparently, this file has not always been writeable: https://patchwork.kernel.org/patch/116146/ http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html The RHEL version of the Linux kernel that I'm using has a CGroup module whose cgroup.procs file is not writeable.
{quote}
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
{quote}
As a result, when the container-executor tries to run, it fails with this error message:
bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
{quote}
$ pwd
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
{quote}
I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem. I can think of several potential resolutions to this ticket:
1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone.
3. Check permissions on /cgroup.procs prior to writing to it, and fall back to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.
Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated
[ https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681486#comment-13681486 ] Chris Riccomini commented on YARN-600: -- Hey Sandy, I just applied your patch to my local YARN, and can verify that it appears to be working.
{noformat}
$ cat container_1371061837111_0001_01_02/cpu.shares
1024
$ cat container_1371061837111_0002_01_02/cpu.shares
32768
{noformat}
I have 8 of the second type of container (32768 cpu shares) on an 8 core machine. When running 8 * 32768 and 1 * 1024, I get a top that looks like this:
{noformat}
1404 criccomi 20 0 1022m 108m 13m S 98.1 0.2 2:33.03 /export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps -Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0005/container_1371061837111_0005_01_02/gc.log -Dlog4j.configuration=file:/tmp/hadoop-cri
3192 criccomi 20 0 1022m 109m 13m S 98.1 0.2 2:25.93 /export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps -Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0009/container_1371061837111_0009_01_02/gc.log -Dlog4j.configuration=file:/tmp/hadoop-cri
428 criccomi 20 0 1022m 109m 13m S 97.7 0.2 2:36.41 /export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps -Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0004/container_1371061837111_0004_01_02/gc.log -Dlog4j.configuration=file:/tmp/hadoop-cri
3022 criccomi 20 0 1022m 110m 13m S 97.2 0.2 2:29.74 /export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps -Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0007/container_1371061837111_0007_01_02/gc.log -Dlog4j.configuration=file:/tmp/hadoop-cri
32443 criccomi 20 0 1022m 109m 13m S 95.1 0.2 2:40.17 /export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps -Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0003/container_1371061837111_0003_01_02/gc.log -Dlog4j.configuration=file:/tmp/hadoop-cri
2850 criccomi 20 0 1022m 107m 13m S 93.6 0.2 2:31.09 /export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps -Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0006/container_1371061837111_0006_01_02/gc.log -Dlog4j.configuration=file:/tmp/hadoop-cri
3112 criccomi 20 0 1022m 108m 13m S 93.2 0.2 2:25.54 /export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps -Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0008/container_1371061837111_0008_01_02/gc.log -Dlog4j.configuration=file:/tmp/hadoop-cri
31038 criccomi 20 0 1022m 109m 13m S 84.5 0.2 3:07.39 /export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps -Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0002/container_1371061837111_0002_01_02/gc.log -Dlog4j.configuration=file:/tmp/hadoop-cri
29451 criccomi 20 0 1925m 249m 13m S 16.3 0.4 0:33.29 /export/apps/jdk/JDK-1_6_0_27/bin/java -Dproc_nodemanager -Xmx1000m -server -Dhadoop.log.dir=/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs -Dyarn.log.dir=/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs -Dhadoop.log.file=yarn.log -Dyarn.log.file=yarn.lo
30447 criccomi 20 0 1022m 109m 13m S 3.7 0.2 1:28.42 /export/apps/jdk/JDK-1_6_0_27/bin/java -Xmx160M -XX:+PrintGCDateStamps -Xloggc:/home/criccomi/Downloads/hadoop-2.0.5-alpha/logs/userlogs/application_1371061837111_0001/container_1371061837111_0001_01_02/gc.log -Dlog4j.configuration=file:/tmp/hadoop-cri
{noformat}
The column that starts with 98 is the CPU column. As you can see, container_1371061837111_0001_01_02 is only taking 3% CPU, while the other processes are taking roughly 100%. They're all doing the same thing to burn up CPU, so it appears the CGroups are throttling my 1024 container as expected. Cheers, Chris Hook up cgroups CPU settings to the number of virtual cores allocated - Key: YARN-600 URL: https://issues.apache.org/jira/browse/YARN-600 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-600.patch YARN-3 introduced CPU isolation and monitoring through cgroups. YARN-2 introduced CPU scheduling in the capacity scheduler, and YARN-326 will introduce it in the fair scheduler. The number of virtual cores allocated to a container should be used to weight the number of cgroups CPU shares given to it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs
[ https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681489#comment-13681489 ] Chris Riccomini commented on YARN-799: -- Fair enough. One thing that I notice is, when I patch to write to /tasks, the tasks file has more than 1 PID in it, while the cgroup.procs file has only 1 (as expected). This suggests to me that the tasks file contains all child PIDs, so I'm a bit confused about the comment in YARN-3. Nevertheless, it'd be worth verifying. CgroupsLCEResourcesHandler tries to write to cgroup.procs - Key: YARN-799 URL: https://issues.apache.org/jira/browse/YARN-799 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.4-alpha, 2.0.5-alpha Reporter: Chris Riccomini The implementation of ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java tells the container-executor to write PIDs to cgroup.procs:
{code}
public String getResourcesOption(ContainerId containerId) {
  String containerName = containerId.toString();
  StringBuilder sb = new StringBuilder("cgroups=");
  if (isCpuWeightEnabled()) {
    sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
    sb.append(",");
  }
  if (sb.charAt(sb.length() - 1) == ',') {
    sb.deleteCharAt(sb.length() - 1);
  }
  return sb.toString();
}
{code}
Apparently, this file has not always been writeable: https://patchwork.kernel.org/patch/116146/ http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html The RHEL version of the Linux kernel that I'm using has a CGroup module whose cgroup.procs file is not writeable.
{quote}
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
{quote}
As a result, when the container-executor tries to run, it fails with this error message:
bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
{quote}
$ pwd
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
{quote}
I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem. I can think of several potential resolutions to this ticket:
1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone.
3. Check permissions on /cgroup.procs prior to writing to it, and fall back to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.
Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
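One way to check the distinction being debated here is to compare the two files directly on a live container cgroup: in cgroup v1, tasks lists thread IDs while cgroup.procs lists process (thread-group) IDs, so tasks should be a superset. A rough verification sketch follows; the container path is hypothetical.
{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Rough sketch: every thread of a member process appears in tasks, so its
// line count should be >= that of cgroup.procs.
public class CgroupInspect {
  public static void main(String[] args) throws IOException {
    String dir = "/cgroup/cpu/hadoop-yarn/container_XXXX"; // hypothetical path
    List<String> tids =
        Files.readAllLines(Paths.get(dir, "tasks"), StandardCharsets.UTF_8);
    List<String> pids =
        Files.readAllLines(Paths.get(dir, "cgroup.procs"), StandardCharsets.UTF_8);
    System.out.println(tids.size() + " thread ids vs " + pids.size() + " process ids");
  }
}
{code}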
[jira] [Commented] (YARN-700) TestInfoBlock fails on Windows because of line ending missmatch
[ https://issues.apache.org/jira/browse/YARN-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681490#comment-13681490 ] Chris Nauroth commented on YARN-700: I gave +1 for this a while ago. I'm planning on committing it shortly. TestInfoBlock fails on Windows because of line ending missmatch --- Key: YARN-700 URL: https://issues.apache.org/jira/browse/YARN-700 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: YARN-700.patch Exception: {noformat} Running org.apache.hadoop.yarn.webapp.view.TestInfoBlock Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.962 sec FAILURE! testMultilineInfoBlock(org.apache.hadoop.yarn.webapp.view.TestInfoBlock) Time elapsed: 873 sec FAILURE! java.lang.AssertionError: at org.junit.Assert.fail(Assert.java:91) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.webapp.view.TestInfoBlock.testMultilineInfoBlock(TestInfoBlock.java:79) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
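A common shape for this kind of Windows line-ending fix, assuming the failing assertion compares rendered output against an expected string, is to normalize line endings on both sides before comparing. This is a sketch of the pattern only, not necessarily what YARN-700.patch does; the class and method names are illustrative.
{code}
import static org.junit.Assert.assertTrue;

// Sketch of a line-ending-agnostic assertion helper.
public class EolTolerantAsserts {
  static String normalize(String s) {
    return s.replaceAll("\r\n", "\n"); // fold CRLF into LF
  }

  static void assertContainsIgnoringEol(String actual, String expected) {
    assertTrue(normalize(actual).contains(normalize(expected)));
  }
}
{code}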
[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated
[ https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681492#comment-13681492 ] Alejandro Abdelnur commented on YARN-600: - Chris, thanks for verifying that the patch works in a cgroup CPU controller environment. +1 Hook up cgroups CPU settings to the number of virtual cores allocated - Key: YARN-600 URL: https://issues.apache.org/jira/browse/YARN-600 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-600.patch YARN-3 introduced CPU isolation and monitoring through cgroups. YARN-2 introduced CPU scheduling in the capacity scheduler, and YARN-326 will introduce it in the fair scheduler. The number of virtual cores allocated to a container should be used to weight the number of cgroups CPU shares given to it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated
[ https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681514#comment-13681514 ] Hudson commented on YARN-600: - Integrated in Hadoop-trunk-Commit #3907 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3907/]) YARN-600. Hook up cgroups CPU settings to the number of virtual cores allocated. (sandyr via tucu) (Revision 1492365) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1492365 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java Hook up cgroups CPU settings to the number of virtual cores allocated - Key: YARN-600 URL: https://issues.apache.org/jira/browse/YARN-600 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.0-beta Attachments: YARN-600.patch YARN-3 introduced CPU isolation and monitoring through cgroups. YARN-2 introduced CPU scheduling in the capacity scheduler, and YARN-326 will introduce it in the fair scheduler. The number of virtual cores allocated to a container should be used to weight the number of cgroups CPU shares given to it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-648) FS: Add documentation for pluggable policy
[ https://issues.apache.org/jira/browse/YARN-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681536#comment-13681536 ] Alejandro Abdelnur commented on YARN-648: - +1 FS: Add documentation for pluggable policy -- Key: YARN-648 URL: https://issues.apache.org/jira/browse/YARN-648 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.4-alpha Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: documentaion Attachments: yarn-648-1.patch, yarn-648-2.patch YARN-469 and YARN-482 make the scheduling policy in FS pluggable. Need to add documentation on how to use this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-648) FS: Add documentation for pluggable policy
[ https://issues.apache.org/jira/browse/YARN-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681549#comment-13681549 ] Hudson commented on YARN-648: - Integrated in Hadoop-trunk-Commit #3909 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3909/]) YARN-648. FS: Add documentation for pluggable policy. (kkambatl via tucu) (Revision 1492388) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1492388 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm FS: Add documentation for pluggable policy -- Key: YARN-648 URL: https://issues.apache.org/jira/browse/YARN-648 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.4-alpha Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: documentaion Fix For: 2.1.0-beta Attachments: yarn-648-1.patch, yarn-648-2.patch YARN-469 and YARN-482 make the scheduling policy in FS pluggable. Need to add documentation on how to use this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-700) TestInfoBlock fails on Windows because of line ending missmatch
[ https://issues.apache.org/jira/browse/YARN-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-700: --- Target Version/s: 3.0.0, 2.1.0-beta (was: 3.0.0) Affects Version/s: 2.1.0-beta Hadoop Flags: Reviewed TestInfoBlock fails on Windows because of line ending missmatch --- Key: YARN-700 URL: https://issues.apache.org/jira/browse/YARN-700 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: YARN-700.patch Exception: {noformat} Running org.apache.hadoop.yarn.webapp.view.TestInfoBlock Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.962 sec FAILURE! testMultilineInfoBlock(org.apache.hadoop.yarn.webapp.view.TestInfoBlock) Time elapsed: 873 sec FAILURE! java.lang.AssertionError: at org.junit.Assert.fail(Assert.java:91) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.webapp.view.TestInfoBlock.testMultilineInfoBlock(TestInfoBlock.java:79) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-752) In AMRMClient, automatically add corresponding rack requests for requested nodes
[ https://issues.apache.org/jira/browse/YARN-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-752: Attachment: YARN-752-6.patch In AMRMClient, automatically add corresponding rack requests for requested nodes Key: YARN-752 URL: https://issues.apache.org/jira/browse/YARN-752 Project: Hadoop YARN Issue Type: Improvement Components: api, applications Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-752-1.patch, YARN-752-1.patch, YARN-752-2.patch, YARN-752.3.patch, YARN-752.4.patch, YARN-752-5.patch, YARN-752-6.patch, YARN-752.patch A ContainerRequest that includes node-level requests must also include matching rack-level requests for the racks that those nodes are on. When a node is present without its rack, it makes sense for the client to automatically add the node's rack. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
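The description implies a resolve-then-union step: map each requested node to its rack, then add requests for those racks. A hedged sketch of how that could be derived with Hadoop's topology utilities is below; whether AMRMClient uses RackResolver in exactly this way is an assumption, and it presupposes RackResolver.init(conf) has been called.
{code}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.net.Node;
import org.apache.hadoop.yarn.util.RackResolver;

// Hedged sketch: derive the racks covering the requested nodes, so matching
// rack-level requests can be added automatically.
public class RackUnion {
  static Set<String> racksFor(List<String> hosts) {
    Set<String> racks = new HashSet<String>();
    for (String host : hosts) {
      Node node = RackResolver.resolve(host); // topology-plugin lookup
      racks.add(node.getNetworkLocation());   // e.g. "/rack1"
    }
    return racks;
  }
}
{code}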
[jira] [Commented] (YARN-752) In AMRMClient, automatically add corresponding rack requests for requested nodes
[ https://issues.apache.org/jira/browse/YARN-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681581#comment-13681581 ] Hadoop QA commented on YARN-752: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587497/YARN-752-6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1204//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1204//console This message is automatically generated. In AMRMClient, automatically add corresponding rack requests for requested nodes Key: YARN-752 URL: https://issues.apache.org/jira/browse/YARN-752 Project: Hadoop YARN Issue Type: Improvement Components: api, applications Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-752-1.patch, YARN-752-1.patch, YARN-752-2.patch, YARN-752.3.patch, YARN-752.4.patch, YARN-752-5.patch, YARN-752-6.patch, YARN-752.patch A ContainerRequest that includes node-level requests must also include matching rack-level requests for the racks that those nodes are on. When a node is present without its rack, it makes sense for the client to automatically add the node's rack. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-797) DecodeIdentifier is broken. It was using KIND field for reflection and now we don't have class named as KIND.
[ https://issues.apache.org/jira/browse/YARN-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-797: --- Summary: DecodeIdentifier is broken. It was using KIND field for reflection and now we don't have class named as KIND. (was: Remove KIND field from ContainerTokenIdentifier as it is not useful.) DecodeIdentifier is broken. It was using KIND field for reflection and now we don't have class named as KIND. - Key: YARN-797 URL: https://issues.apache.org/jira/browse/YARN-797 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi As we already have removed ContainerToken, ClientToken etc. classes there is no point in keeping KIND field. This was used while decodingIdentifier. (Reflection based on KIND). probably either we should remove or update this code as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-797) DecodeIdentifier is broken. It was using KIND field for reflection and now we don't have class named as KIND.
[ https://issues.apache.org/jira/browse/YARN-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-797: --- Description: We need to fix the reflection code in Token.decodeIdentifier was: We need to fix the reflection code in Token.decodeIdentifier. DecodeIdentifier is broken. It was using KIND field for reflection and now we don't have class named as KIND. - Key: YARN-797 URL: https://issues.apache.org/jira/browse/YARN-797 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi We need to fix the reflection code in Token.decodeIdentifier -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-797) DecodeIdentifier is broken. It was using KIND field for reflection and now we don't have class named as KIND.
[ https://issues.apache.org/jira/browse/YARN-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-797: --- Description: We need to fix the reflection code in Token.decodeIdentifier. was: As we already have removed ContainerToken, ClientToken etc. classes there is no point in keeping KIND field. This was used while decodingIdentifier. (Reflection based on KIND). probably either we should remove or update this code as well. DecodeIdentifier is broken. It was using KIND field for reflection and now we don't have class named as KIND. - Key: YARN-797 URL: https://issues.apache.org/jira/browse/YARN-797 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi We need to fix the reflection code in Token.decodeIdentifier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
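For context, the "reflection based on KIND" pattern the summary refers to follows a familiar registry shape: a token carries a kind string, and decodeIdentifier() maps that kind to an identifier class to instantiate reflectively; if nothing is registered under the kind, decoding breaks, which is the failure this issue describes. The sketch below is illustrative only (the real lookup in hadoop-common is ServiceLoader based), and all names here are hypothetical.
{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative kind-to-class registry consulted by a decodeIdentifier().
public class KindRegistry {
  private static final Map<String, Class<?>> KINDS = new HashMap<String, Class<?>>();

  static void register(String kind, Class<?> identifierClass) {
    KINDS.put(kind, identifierClass);
  }

  static Object decode(String kind) throws Exception {
    Class<?> c = KINDS.get(kind);
    if (c == null) {
      throw new IllegalStateException("no identifier class for kind " + kind);
    }
    return c.newInstance(); // real code would then deserialize the identifier bytes
  }
}
{code}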
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681656#comment-13681656 ] Xuan Gong commented on YARN-513: +1 Looks good Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
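The behavior under review boils down to retry-until-deadline around proxy creation. A minimal sketch of that shape follows, assuming a generic connect() that throws while the RM is down; the actual patch wires this through Hadoop's RetryPolicy machinery rather than a hand-rolled loop, so treat this as the idea only.
{code}
// Minimal sketch of "wait for the RM to come back": retry a failing
// connection with a fixed pause until a deadline expires.
public class RetryUntilUp {
  interface Connector {
    void connect() throws Exception;
  }

  static void waitForRM(Connector c, long pauseMs, long timeoutMs) throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (true) {
      try {
        c.connect(); // succeeds once the RM is back up
        return;
      } catch (Exception e) {
        if (System.currentTimeMillis() >= deadline) {
          throw e; // give up after the configured wait
        }
        Thread.sleep(pauseMs);
      }
    }
  }
}
{code}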
[jira] [Updated] (YARN-792) Move NodeHealthStatus from yarn.api.record to yarn.server.api.record
[ https://issues.apache.org/jira/browse/YARN-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-792: - Attachment: YARN-792.1.patch rebased on latest trunk Move NodeHealthStatus from yarn.api.record to yarn.server.api.record Key: YARN-792 URL: https://issues.apache.org/jira/browse/YARN-792 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-792.1.patch, YARN-792.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-792) Move NodeHealthStatus from yarn.api.record to yarn.server.api.record
[ https://issues.apache.org/jira/browse/YARN-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681705#comment-13681705 ] Hadoop QA commented on YARN-792: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587516/YARN-792.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1205//console This message is automatically generated. Move NodeHealthStatus from yarn.api.record to yarn.server.api.record Key: YARN-792 URL: https://issues.apache.org/jira/browse/YARN-792 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-792.1.patch, YARN-792.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-792) Move NodeHealthStatus from yarn.api.record to yarn.server.api.record
[ https://issues.apache.org/jira/browse/YARN-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-792: - Attachment: YARN-792.2.patch more import fix Move NodeHealthStatus from yarn.api.record to yarn.server.api.record Key: YARN-792 URL: https://issues.apache.org/jira/browse/YARN-792 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-792.1.patch, YARN-792.2.patch, YARN-792.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-800) Clicking on an AM link for a running app leads to a HTTP 500
[ https://issues.apache.org/jira/browse/YARN-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681723#comment-13681723 ] Arpit Gupta commented on YARN-800: -- Here is the stack trace
{code}
HTTP ERROR 500
Problem accessing /proxy/application_1370886527995_0658/. Reason: Connection refused
Caused by: java.net.ConnectException: Connection refused
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
  at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
  at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
  at java.net.Socket.connect(Socket.java:529)
  at java.net.Socket.connect(Socket.java:478)
  at java.net.Socket.<init>(Socket.java:375)
  at java.net.Socket.<init>(Socket.java:249)
  at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
  at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
  at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
  at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
  at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
  at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
  at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346)
  at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:185)
  at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:334)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
  at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
  at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:66)
  at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
  at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
  at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
  at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
  at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
  at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
  at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1077)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
  at org.mortbay.jetty.Server.handle(Server.java:326)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
  at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at
[jira] [Commented] (YARN-800) Clicking on an AM link for a running app leads to a HTTP 500
[ https://issues.apache.org/jira/browse/YARN-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681727#comment-13681727 ] Arpit Gupta commented on YARN-800: -- Looks like we have to set the property yarn.resourcemanager.webapp.address to RMAddress:8088, which should not be necessary. We should be defaulting to the appropriate value in the system. Clicking on an AM link for a running app leads to a HTTP 500 Key: YARN-800 URL: https://issues.apache.org/jira/browse/YARN-800 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Priority: Critical Clicking the AM link tries to open up a page with url like http://hostname:8088/proxy/application_1370886527995_0645/ and this leads to an HTTP 500 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
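For illustration, a minimal sketch of the defaulting being asked for here, assuming only the stock YarnConfiguration constants (RmWebAddressCheck is a made-up class name, not anything in an attached patch): resolving the address through conf.get with the shipped default rather than requiring an explicit yarn-site.xml entry.
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmWebAddressCheck {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Falls back to 0.0.0.0:8088 when yarn.resourcemanager.webapp.address
    // is not set explicitly in yarn-site.xml.
    String webAddr = conf.get(YarnConfiguration.RM_WEBAPP_ADDRESS,
        YarnConfiguration.DEFAULT_RM_WEBAPP_ADDRESS);
    System.out.println("RM web app address resolves to " + webAddr);
  }
}
{code}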
[jira] [Commented] (YARN-792) Move NodeHealthStatus from yarn.api.record to yarn.server.api.record
[ https://issues.apache.org/jira/browse/YARN-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681735#comment-13681735 ] Hadoop QA commented on YARN-792: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587518/YARN-792.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1206//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1206//console This message is automatically generated. Move NodeHealthStatus from yarn.api.record to yarn.server.api.record Key: YARN-792 URL: https://issues.apache.org/jira/browse/YARN-792 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-792.1.patch, YARN-792.2.patch, YARN-792.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging
[ https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681786#comment-13681786 ] Sandy Ryza commented on YARN-366: - Rebased onto trunk Add a tracing async dispatcher to simplify debugging Key: YARN-366 URL: https://issues.apache.org/jira/browse/YARN-366 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, YARN-366-4.patch, YARN-366-5.patch, YARN-366.patch Exceptions thrown in YARN/MR code with asynchronous event handling do not contain informative stack traces, as all handle() methods sit directly under the dispatcher thread's loop. This makes errors very difficult to debug for those who are not intimately familiar with the code, as it is difficult to see which chain of events caused a particular outcome. I propose adding an AsyncDispatcher that instruments events with tracing information. Whenever an event is dispatched during the handling of another event, the dispatcher would annotate that event with a pointer to its parent. When the dispatcher catches an exception, it could reconstruct a stack trace of the chain of events that led to it, and be able to log something informative. This would be an experimental feature, off by default, unless extensive testing showed that it did not have a significant performance impact. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
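A minimal sketch of the tracing idea proposed above, using hypothetical names (TracedEvent, causalChain) rather than anything from the attached patches: each event keeps a pointer to the event whose handling dispatched it, so the dispatcher can replay the causal chain when it catches an exception.
{code}
import java.util.ArrayDeque;
import java.util.Deque;

class TracedEvent {
  final String type;
  final TracedEvent parent; // event being handled when this one was dispatched

  TracedEvent(String type, TracedEvent parent) {
    this.type = type;
    this.parent = parent;
  }

  /** Reconstructs the chain of events that led to this one, oldest first. */
  String causalChain() {
    Deque<String> chain = new ArrayDeque<>();
    for (TracedEvent e = this; e != null; e = e.parent) {
      chain.addFirst(e.type);
    }
    return String.join(" -> ", chain);
  }
}

public class TracingDispatcherSketch {
  public static void main(String[] args) {
    TracedEvent appSubmitted = new TracedEvent("APP_SUBMITTED", null);
    TracedEvent attemptStart = new TracedEvent("ATTEMPT_START", appSubmitted);
    TracedEvent containerReq = new TracedEvent("CONTAINER_REQUEST", attemptStart);
    // On an exception, the dispatcher could log something informative:
    System.out.println("Event chain: " + containerReq.causalChain());
  }
}
{code}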
[jira] [Updated] (YARN-366) Add a tracing async dispatcher to simplify debugging
[ https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-366: Attachment: YARN-366-5.patch Add a tracing async dispatcher to simplify debugging Key: YARN-366 URL: https://issues.apache.org/jira/browse/YARN-366 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, YARN-366-4.patch, YARN-366-5.patch, YARN-366.patch Exceptions thrown in YARN/MR code with asynchronous event handling do not contain informative stack traces, as all handle() methods sit directly under the dispatcher thread's loop. This makes errors very difficult to debug for those who are not intimately familiar with the code, as it is difficult to see which chain of events caused a particular outcome. I propose adding an AsyncDispatcher that instruments events with tracing information. Whenever an event is dispatched during the handling of another event, the dispatcher would annotate that event with a pointer to its parent. When the dispatcher catches an exception, it could reconstruct a stack trace of the chain of events that led to it, and be able to log something informative. This would be an experimental feature, off by default, unless extensive testing showed that it did not have a significant performance impact. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-789) Enable zero capabilities resource requests in fair scheduler
[ https://issues.apache.org/jira/browse/YARN-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-789: Summary: Enable zero capabilities resource requests in fair scheduler (was: Add flag to scheduler to allow zero capabilities in resources) Enable zero capabilities resource requests in fair scheduler Key: YARN-789 URL: https://issues.apache.org/jira/browse/YARN-789 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-789.patch Per discussion in YARN-689, reposting updated use case: 1. I have a set of services co-existing with a Yarn cluster. 2. These services run out of band from Yarn. They are not started as yarn containers and they don't use Yarn containers for processing. 3. These services use, dynamically, different amounts of CPU and memory based on their load. They manage their CPU and memory requirements independently. In other words, depending on their load, they may require more CPU but not memory or vice-versa. By using YARN as RM for these services I'm able to share and utilize the resources of the cluster appropriately and in a dynamic way. Yarn keeps tabs on all the resources. These services run an AM that reserves resources on their behalf. When this AM gets the requested resources, the services bump up their CPU/memory utilization out of band from Yarn. If the Yarn allocations are released/preempted, the services back off on their resource utilization. By doing this, Yarn and these services correctly share the cluster resources, with the Yarn RM being the only one that does the overall resource bookkeeping. The services' AMs, so as not to break the lifecycle of containers, start containers in the corresponding NMs. These container processes basically sleep forever (i.e. sleep 1d). They use almost no CPU or memory (less than 1MB). Thus it is reasonable to assume their required CPU and memory utilization is NIL (more on hard enforcement later). Because of this almost NIL utilization of CPU and memory, it is possible to specify, when doing a request, zero as one of the dimensions (CPU or memory). The current limitation is that the increment is also the minimum. If we set the memory increment to 1MB, then when doing a pure CPU request we would have to specify 1MB of memory. That would work, however it would allow discretionary memory requests without a desired normalization (increments of 256, 512, etc). If we set the CPU increment to 1 CPU, then when doing a pure memory request we would have to specify 1 CPU. CPU amounts are much smaller than memory amounts, and because we don't have fractional CPUs, it would mean that all my pure memory requests would be wasting 1 CPU, thus reducing the overall utilization of the cluster. Finally, on hard enforcement. * For CPU: hard enforcement can be done via a cgroup cpu controller. Using an absolute minimum of a few CPU shares (i.e. 10) in the LinuxContainerExecutor, we ensure there are enough CPU cycles to run the sleep process. This absolute minimum would only kick in if zero is allowed; otherwise it will never kick in, as the shares for 1 CPU are 1024. * For Memory: hard enforcement is currently done by ProcfsBasedProcessTree.java; using an absolute minimum of 1 or 2 MBs would take care of zero memory resources. And again, this absolute minimum would only kick in if zero is allowed; otherwise it will never kick in, as the increment memory is several MBs if not 1GB. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
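A toy sketch of the normalization the use case above argues about (normalize is a made-up helper, not fair-scheduler code): with a zero minimum allowed, a pure-CPU request can carry 0 MB of memory while non-zero values still snap up to the increment.
{code}
public class ZeroCapabilityNormalize {
  /** Rounds 'requested' up to a multiple of 'increment'; 0 stays 0. */
  static int normalize(int requested, int increment) {
    if (requested <= 0) {
      return 0; // zero capability allowed; any enforcement floor lives elsewhere
    }
    return ((requested + increment - 1) / increment) * increment;
  }

  public static void main(String[] args) {
    // Pure CPU request: zero memory survives normalization.
    System.out.println(normalize(0, 256));   // 0 MB
    // Ordinary request still normalized to the 256 MB increment.
    System.out.println(normalize(300, 256)); // 512 MB
  }
}
{code}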
[jira] [Resolved] (YARN-788) Rename scheduler resource minimum to increment
[ https://issues.apache.org/jira/browse/YARN-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved YARN-788. - Resolution: Won't Fix Rename scheduler resource minimum to increment -- Key: YARN-788 URL: https://issues.apache.org/jira/browse/YARN-788 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-788.patch Per discussions in YARN-689, the current name minimum is wrong; we should rename it to increment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681814#comment-13681814 ] Vinod Kumar Vavilapalli commented on YARN-530: -- The latest patch looks good. Will try fixing the warnings. We should stop adding more to these tickets. Any more issues, we should try after committing the set of patches. Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.4-alpha Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117-019.patch, YARN-117changes.pdf, YARN-530-005.patch, YARN-530-008.patch, YARN-530-009.patch, YARN-530-010.patch, YARN-530-011.patch, YARN-530-012.patch, YARN-530-013.patch, YARN-530-014.patch, YARN-530-015.patch, YARN-530-016.patch, YARN-530-017.patch, YARN-530-018.patch, YARN-530-019.patch, YARN-530-020.patch, YARN-530-021.patch, YARN-530-022.patch, YARN-530-2.patch, YARN-530-3.patch, YARN-530.4.patch, YARN-530.patch # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-800) Clicking on an AM link for a running app leads to a HTTP 500
[ https://issues.apache.org/jira/browse/YARN-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681838#comment-13681838 ] Zhijie Shen commented on YARN-800: -- Did a quick local test, and found the link was not broken. It seems that the default value has already been in yarn-default.xml
{code}
<property>
  <description>The hostname of the RM.</description>
  <name>yarn.resourcemanager.hostname</name>
  <value>0.0.0.0</value>
</property>
{code}
{code}
<property>
  <description>The address of the RM web application.</description>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>${yarn.resourcemanager.hostname}:8088</value>
</property>
{code}
and YarnConfiguration
{code}
public static final String RM_WEBAPP_ADDRESS = RM_PREFIX + "webapp.address";
public static final int DEFAULT_RM_WEBAPP_PORT = 8088;
public static final String DEFAULT_RM_WEBAPP_ADDRESS = "0.0.0.0:" + DEFAULT_RM_WEBAPP_PORT;
{code}
Looked into the code, it seems to be related to yarn.web-proxy.address. In WebAppProxyServlet,
{code}
resp.setStatus(client.executeMethod(config, method));
{code}
tries to connect the proxy host to show the application webpage. If yarn.web-proxy.address is not set, RM will become the proxy, and its address will be ${yarn.resourcemanager.hostname}:8088 as well. Maybe it is good to check the configuration of yarn.web-proxy.address Clicking on an AM link for a running app leads to a HTTP 500 Key: YARN-800 URL: https://issues.apache.org/jira/browse/YARN-800 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Priority: Critical Clicking the AM link tries to open up a page with url like http://hostname:8088/proxy/application_1370886527995_0645/ and this leads to an HTTP 500 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
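A sketch of the fallback Zhijie describes, under the assumption that YarnConfiguration.PROXY_ADDRESS is the constant backing yarn.web-proxy.address (ProxyAddressCheck is a made-up name, not proxy-server code):
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ProxyAddressCheck {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    String proxy = conf.get(YarnConfiguration.PROXY_ADDRESS);
    if (proxy == null || proxy.isEmpty()) {
      // RM becomes the proxy; resolves to ${yarn.resourcemanager.hostname}:8088
      proxy = conf.get(YarnConfiguration.RM_WEBAPP_ADDRESS,
          YarnConfiguration.DEFAULT_RM_WEBAPP_ADDRESS);
    }
    System.out.println("Requests to /proxy/* are served at " + proxy);
  }
}
{code}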
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-117: - Attachment: YARN-117-023.patch Updated patch. - Drops common changes. They should be tracked separately. Didn't even review them, are they needed for the service stuff? - Drops spurious java comment changes to LocalCacheDirectoryManager.java and TestLocalCacheDirectoryManager.java - Minor improvement in TestNMWebServer.java - And including latest patch at YARN-530. Enhance YARN service model -- Key: YARN-117 URL: https://issues.apache.org/jira/browse/YARN-117 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117-007.patch, YARN-117-008.patch, YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, YARN-117-019.patch, YARN-117-020.patch, YARN-117-021.patch, YARN-117-022.patch, YARN-117-023.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch Having played with the YARN service model, there are some issues that I've identified based on past work and initial use. This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs. h2. state model prevents stopped state being entered if you could not successfully start the service. In the current lifecycle you cannot stop a service unless it was successfully started, but * {{init()}} may acquire resources that need to be explicitly released * if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. *Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non-null. Before anyone points out that the {{stop()}} operations assume that all fields are valid, and if called before a {{start()}} they will NPE: MAPREDUCE-3431 shows that this problem arises today; MAPREDUCE-3502 is a fix for this. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than stopped. MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take-up; this can be done with issues linked to this one. h2. AbstractService doesn't prevent duplicate state change requests. The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} and {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this. This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} and {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. 
It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}} -something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers). h2. AbstractService state change doesn't defend against race conditions. There are no concurrency locks on the state transitions. Whatever fix for wrong state calls is added should correct this to prevent re-entrancy, such as {{stop()}} being called from two threads. h2. Static methods to choreograph lifecycle operations Helper methods to move things through lifecycles. init-start is common, stop-if-service!=null another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. These could go into a class {{ServiceOps}} in the same package. These can be used by those services that wrap other services, and help manage more robust shutdowns. h2. state transition failures are something that registered service listeners may wish to be informed of. When a state transition fails a {{RuntimeException}} can be thrown -and the service listeners are not informed as the notification point isn't reached. They may wish to know this, especially for management and diagnostics. *Fix:* extend {{ServiceStateChangeListener}} with a callback such as
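A compressed sketch of the HADOOP-3128 pattern referenced in the writeup above (SketchService and the inner hooks are illustrative names, not the actual AbstractService code): the public lifecycle methods are final and own both the state checks and the locking, so subclass work can no longer run before the checks do, duplicate requests become no-ops, and stop() is valid from any state.
{code}
public abstract class SketchService {
  public enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  public final synchronized void start() {
    if (state == State.STARTED) {
      return; // duplicate start requests are no-ops rather than re-running work
    }
    ensureState(State.INITED);
    innerStart();          // subclass work happens only after the check passes
    state = State.STARTED;
  }

  public final synchronized void stop() {
    if (state == State.STOPPED) {
      return;
    }
    innerStop();           // must tolerate null fields: stop() is valid from any state
    state = State.STOPPED;
  }

  protected abstract void innerStart();
  protected abstract void innerStop();

  private void ensureState(State expected) {
    if (state != expected) {
      throw new IllegalStateException("cannot transition from " + state);
    }
  }
}
{code}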
[jira] [Updated] (YARN-693) Sending NMToken to AM on allocate call
[ https://issues.apache.org/jira/browse/YARN-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-693: --- Description: This is part of YARN-613. As per the updated design, AM will receive, per NM, an NMToken in the following scenarios * AM is receiving first container on underlying NM. * AM is receiving container on underlying NM after either NM or RM rebooted. ** After RM reboot, as RM doesn't remember (persist) the information about keys issued per AM per NM, it will reissue tokens in case AM gets new container on underlying NM. However on NM side NM will still retain older token until it receives new token to support long running jobs (in work preserving environment). ** After NM reboot, RM will delete the token information corresponding to that NM for all AMs. * AM is receiving container on underlying NM after NMToken master key is rolled over on RM side. In all the cases if AM receives new NMToken then it is supposed to store it for future NM communication until it receives a new one. was: This is part of YARN-613. As per the updated design, AM will receive, per NM, an NMToken in the following scenarios * AM is receiving first container on underlying NM. * AM is receiving container on underlying NM after either NM or RM rebooted. ** After RM reboot, as RM doesn't remember (persist) the information about keys issued per AM per NM, it will reissue tokens in case AM gets new container on underlying NM. However on NM side NM will still retain older token until it receives new token to support long running jobs (in work preserving environment). ** After NM reboot, RM will delete the token information corresponding to all AMs. * AM is receiving container on underlying NM after NMToken master key is rolled over on RM side. In all the cases if AM receives new NMToken then it is supposed to store it for future NM communication until it receives a new one. Sending NMToken to AM on allocate call -- Key: YARN-693 URL: https://issues.apache.org/jira/browse/YARN-693 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi This is part of YARN-613. As per the updated design, AM will receive, per NM, an NMToken in the following scenarios * AM is receiving first container on underlying NM. * AM is receiving container on underlying NM after either NM or RM rebooted. ** After RM reboot, as RM doesn't remember (persist) the information about keys issued per AM per NM, it will reissue tokens in case AM gets new container on underlying NM. However on NM side NM will still retain older token until it receives new token to support long running jobs (in work preserving environment). ** After NM reboot, RM will delete the token information corresponding to that NM for all AMs. * AM is receiving container on underlying NM after NMToken master key is rolled over on RM side. In all the cases if AM receives new NMToken then it is supposed to store it for future NM communication until it receives a new one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681847#comment-13681847 ] Hadoop QA commented on YARN-530: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587542/YARN-530-023.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1207//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1207//console This message is automatically generated. Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.4-alpha Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117-019.patch, YARN-117changes.pdf, YARN-530-005.patch, YARN-530-008.patch, YARN-530-009.patch, YARN-530-010.patch, YARN-530-011.patch, YARN-530-012.patch, YARN-530-013.patch, YARN-530-014.patch, YARN-530-015.patch, YARN-530-016.patch, YARN-530-017.patch, YARN-530-018.patch, YARN-530-019.patch, YARN-530-020.patch, YARN-530-021.patch, YARN-530-022.patch, YARN-530-023.patch, YARN-530-2.patch, YARN-530-3.patch, YARN-530.4.patch, YARN-530.patch # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-693) Sending NMToken to AM on allocate call
[ https://issues.apache.org/jira/browse/YARN-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-693: --- Attachment: YARN-693-20130610.patch Sending NMToken to AM on allocate call -- Key: YARN-693 URL: https://issues.apache.org/jira/browse/YARN-693 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: YARN-693-20130610.patch This is part of YARN-613. As per the updated design, AM will receive, per NM, an NMToken in the following scenarios * AM is receiving first container on underlying NM. * AM is receiving container on underlying NM after either NM or RM rebooted. ** After RM reboot, as RM doesn't remember (persist) the information about keys issued per AM per NM, it will reissue tokens in case AM gets new container on underlying NM. However on NM side NM will still retain older token until it receives new token to support long running jobs (in work preserving environment). ** After NM reboot, RM will delete the token information corresponding to that NM for all AMs. * AM is receiving container on underlying NM after NMToken master key is rolled over on RM side. In all the cases if AM receives new NMToken then it is supposed to store it for future NM communication until it receives a new one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-693) Sending NMToken to AM on allocate call
[ https://issues.apache.org/jira/browse/YARN-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-693: --- Description: This is part of YARN-613. As per the updated design, AM will receive, per NM, an NMToken in the following scenarios * AM is receiving first container on underlying NM. * AM is receiving container on underlying NM after either NM or RM rebooted. ** After RM reboot, as RM doesn't remember (persist) the information about keys issued per AM per NM, it will reissue tokens in case AM gets new container on underlying NM. However on NM side NM will still retain older token until it receives new token to support long running jobs (in work preserving environment). ** After NM reboot, RM will delete the token information corresponding to that NM for all AMs. * AM is receiving container on underlying NM after NMToken master key is rolled over on RM side. In all the cases if AM receives new NMToken then it is supposed to store it for future NM communication until it receives a new one. AMRMClient should expose these NMTokens to the client. was: This is part of YARN-613. As per the updated design, AM will receive, per NM, an NMToken in the following scenarios * AM is receiving first container on underlying NM. * AM is receiving container on underlying NM after either NM or RM rebooted. ** After RM reboot, as RM doesn't remember (persist) the information about keys issued per AM per NM, it will reissue tokens in case AM gets new container on underlying NM. However on NM side NM will still retain older token until it receives new token to support long running jobs (in work preserving environment). ** After NM reboot, RM will delete the token information corresponding to that NM for all AMs. * AM is receiving container on underlying NM after NMToken master key is rolled over on RM side. In all the cases if AM receives new NMToken then it is supposed to store it for future NM communication until it receives a new one. Sending NMToken to AM on allocate call -- Key: YARN-693 URL: https://issues.apache.org/jira/browse/YARN-693 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: YARN-693-20130610.patch This is part of YARN-613. As per the updated design, AM will receive, per NM, an NMToken in the following scenarios * AM is receiving first container on underlying NM. * AM is receiving container on underlying NM after either NM or RM rebooted. ** After RM reboot, as RM doesn't remember (persist) the information about keys issued per AM per NM, it will reissue tokens in case AM gets new container on underlying NM. However on NM side NM will still retain older token until it receives new token to support long running jobs (in work preserving environment). ** After NM reboot, RM will delete the token information corresponding to that NM for all AMs. * AM is receiving container on underlying NM after NMToken master key is rolled over on RM side. In all the cases if AM receives new NMToken then it is supposed to store it for future NM communication until it receives a new one. AMRMClient should expose these NMTokens to the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
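A minimal sketch of the bookkeeping this design implies on the AM side (NMTokenStoreSketch is a made-up name; a real AM would hold org.apache.hadoop.yarn.api.records.Token values rather than strings): keep the newest NMToken per NM and overwrite it whenever an allocate response carries a fresh one for that node.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NMTokenStoreSketch {
  // node address (host:port) -> latest token for that NM
  private final Map<String, String> tokensByNode = new ConcurrentHashMap<>();

  /** Called for every NMToken returned on an allocate call. */
  public void onNMTokenReceived(String nodeAddress, String token) {
    // Always keep the newest token: the RM reissues after reboots/key rollover.
    tokensByNode.put(nodeAddress, token);
  }

  /** Used when opening a connection to the NM to start a container. */
  public String tokenFor(String nodeAddress) {
    return tokensByNode.get(nodeAddress);
  }

  public static void main(String[] args) {
    NMTokenStoreSketch store = new NMTokenStoreSketch();
    store.onNMTokenReceived("nm-host:45454", "token-v1");
    store.onNMTokenReceived("nm-host:45454", "token-v2"); // rollover: overwrite
    System.out.println(store.tokenFor("nm-host:45454"));  // token-v2
  }
}
{code}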
[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging
[ https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681854#comment-13681854 ] Hadoop QA commented on YARN-366: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587528/YARN-366-5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1208//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1208//console This message is automatically generated. Add a tracing async dispatcher to simplify debugging Key: YARN-366 URL: https://issues.apache.org/jira/browse/YARN-366 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, YARN-366-4.patch, YARN-366-5.patch, YARN-366.patch Exceptions thrown in YARN/MR code with asynchronous event handling do not contain informative stack traces, as all handle() methods sit directly under the dispatcher thread's loop. This makes errors very difficult to debug for those who are not intimately familiar with the code, as it is difficult to see which chain of events caused a particular outcome. I propose adding an AsyncDispatcher that instruments events with tracing information. Whenever an event is dispatched during the handling of another event, the dispatcher would annotate that event with a pointer to its parent. When the dispatcher catches an exception, it could reconstruct a stack trace of the chain of events that led to it, and be able to log something informative. This would be an experimental feature, off by default, unless extensive testing showed that it did not have a significant performance impact. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681875#comment-13681875 ] Hadoop QA commented on YARN-117: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587543/YARN-117-023.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 38 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1209//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1209//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1209//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1209//console This message is automatically generated. Enhance YARN service model -- Key: YARN-117 URL: https://issues.apache.org/jira/browse/YARN-117 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117-007.patch, YARN-117-008.patch, YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, YARN-117-019.patch, YARN-117-020.patch, YARN-117-021.patch, YARN-117-022.patch, YARN-117-023.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch Having played the YARN service model, there are some issues that I've identified based on past work and initial use. This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs. h2. 
state model prevents stopped state being entered if you could not successfully start the service. In the current lifecycle you cannot stop a service unless it was successfully started, but * {{init()}} may acquire resources that need to be explicitly released * if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. *Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non null. Before anyone points out that the {{stop()}} operations assume that all fields are valid; and if called before a {{start()}} they will NPE; MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix for this. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than stopped. MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take up; this can be done with issues linked to this one. h2. AbstractService doesn't prevent duplicate state change requests. The {{ensureState()}}
[jira] [Commented] (YARN-801) Expose container locations and capabilities in the RM REST APIs
[ https://issues.apache.org/jira/browse/YARN-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681878#comment-13681878 ] Junping Du commented on YARN-801: - Sandy, shall we include ContainerState and running task info as well? Expose container locations and capabilities in the RM REST APIs --- Key: YARN-801 URL: https://issues.apache.org/jira/browse/YARN-801 Project: Hadoop YARN Issue Type: Improvement Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza It would be useful to be able to query container allocation info via the RM REST APIs. We should be able to query per application, and for each container we should provide (at least): * location * resource capabilty -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
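A sketch of what querying this might look like from a client, assuming a hypothetical /containers sub-resource under the existing /ws/v1/cluster/apps/{appid} RM REST resource; the suffix is exactly what this JIRA proposes, so it does not exist yet, and rmhost is a placeholder.
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RmRestSketch {
  public static void main(String[] args) throws Exception {
    String rm = "http://rmhost:8088"; // assumed RM web app address
    // Hypothetical endpoint: per-application container allocation info.
    URL url = new URL(rm + "/ws/v1/cluster/apps/application_1370886527995_0645/containers");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // expected fields: node location, resource capability
      }
    }
  }
}
{code}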
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-569: --- Attachment: YARN-569.6.patch CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf, preemption.2.patch, YARN-569.1.patch, YARN-569.2.patch, YARN-569.3.patch, YARN-569.4.patch, YARN-569.5.patch, YARN-569.6.patch, YARN-569.patch, YARN-569.patch There is a tension between the fast-paced reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a Capacity Monitor, which can be run optionally as a separate service (much like the NMLivelinessMonitor). The capacity monitor (similarly to equivalent functionalities in the fairness scheduler) operates on intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine if preemption is needed, and how best to edit the current schedule to improve capacity, and generates events that produce four possible actions: # Container de-reservations # Resource-based preemptions # Container-based preemptions # Container killing The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. Note that due to the lag in the effect of these actions the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations. - Preemption policy (ProportionalCapacityPreemptionPolicy): - Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: # it gathers from the scheduler the state of the queues, in particular, their current capacity, guaranteed capacity and pending requests (*) # if there are pending requests from queues that are under capacity it computes a new ideal balanced state (**) # it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round) # it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order) # it removes reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exist # (if not enough) it issues preemptions for containers from the same applications (reverse chronological order, last assigned container first) again until necessary or until no containers except the AM container are left, # (if not enough) it moves on to unreserve and preempt from the next application. # containers that have been asked to be preempted are tracked across executions. 
If a container is among the ones to be preempted for more than a certain time, it is moved into the list of containers to be forcibly killed. Notes: (*) at the moment, in order to avoid double-counting of the requests, we only look at the ANY part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY. (**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among queues (that want some) as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue, and the function runs to a fixed point. Tunables of the ProportionalCapacityPreemptionPolicy: # observe-only mode (i.e., log the actions it would take, but behave as read-only) # how frequently to run the policy # how long to wait between preemption and kill of a container # which fraction of the containers I would like to obtain should I preempt (has to do with the natural rate at which containers are returned) # deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect
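A toy sketch of the policy's core arithmetic as described above (illustrative names and numbers, not the actual ProportionalCapacityPreemptionPolicy): apply the dead zone, then reclaim only a bounded fraction of the gap each round, which matches the gradual, macroscopic behaviour the writeup calls for.
{code}
public class PreemptionMathSketch {
  /**
   * @param used       resources currently used by the queue
   * @param ideal      ideal balanced assignment for the queue
   * @param deadZone   fraction of over-capacity to ignore (e.g. 0.05)
   * @param roundBound max fraction of the gap to reclaim per round (e.g. 0.1)
   * @return resources to preempt from this queue in this round
   */
  static double toPreempt(double used, double ideal,
                          double deadZone, double roundBound) {
    double over = used - ideal;
    if (over <= ideal * deadZone) {
      return 0; // within the dead zone: do nothing
    }
    return over * roundBound; // reclaim gradually, not all at once
  }

  public static void main(String[] args) {
    // Queue using 60 units against an ideal of 40: reclaim 2 units this round.
    System.out.println(toPreempt(60, 40, 0.05, 0.1));
  }
}
{code}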
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-117: - Attachment: YARN-117-024.patch Uber patch suppressing findBugs warnings. All the warnings are about fields accessed in the service* methods, which are not synchronized on the objects but should be fine as they are just read and not modified in any of the cases. Enhance YARN service model -- Key: YARN-117 URL: https://issues.apache.org/jira/browse/YARN-117 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117-007.patch, YARN-117-008.patch, YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, YARN-117-019.patch, YARN-117-020.patch, YARN-117-021.patch, YARN-117-022.patch, YARN-117-023.patch, YARN-117-024.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch Having played the YARN service model, there are some issues that I've identified based on past work and initial use. This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs. h2. state model prevents stopped state being entered if you could not successfully start the service. In the current lifecycle you cannot stop a service unless it was successfully started, but * {{init()}} may acquire resources that need to be explicitly released * if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. *Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non null. Before anyone points out that the {{stop()}} operations assume that all fields are valid; and if called before a {{start()}} they will NPE; MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix for this. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than stopped. MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take up; this can be done with issues linked to this one. h2. AbstractService doesn't prevent duplicate state change requests. The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this. This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}} -something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers. h2. 
AbstractService state change doesn't defend against race conditions. There's no concurrency locks on the state transitions. Whatever fix for wrong state calls is added should correct this to prevent re-entrancy, such as {{stop()}} being called from two threads. h2. Static methods to choreograph of lifecycle operations Helper methods to move things through lifecycles. init-start is common, stop-if-service!=null another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. These could go into a class {{ServiceOps}} in the same package. These can be used by those services that wrap other services, and help manage more robust shutdowns. h2. state transition failures are something that registered service listeners may wish to be informed of. When a state transition fails a {{RuntimeException}} can be thrown -and the service listeners are not informed as the notification point isn't reached. They may wish to know this, especially for management and diagnostics. *Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service,Service.State targeted-state, RuntimeException e)}} that is invoked
[jira] [Commented] (YARN-789) Enable zero capabilities resource requests in fair scheduler
[ https://issues.apache.org/jira/browse/YARN-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681907#comment-13681907 ] Hadoop QA commented on YARN-789: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587533/YARN-789.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1211//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1211//console This message is automatically generated. Enable zero capabilities resource requests in fair scheduler Key: YARN-789 URL: https://issues.apache.org/jira/browse/YARN-789 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-789.patch, YARN-789.patch Per discussion in YARN-689, reposting updated use case: 1. I have a set of services co-existing with a Yarn cluster. 2. These services run out of band from Yarn. They are not started as yarn containers and they don't use Yarn containers for processing. 3. These services use, dynamically, different amounts of CPU and memory based on their load. They manage their CPU and memory requirements independently. In other words, depending on their load, they may require more CPU but not memory or vice-versa. By using YARN as RM for these services I'm able share and utilize the resources of the cluster appropriately and in a dynamic way. Yarn keeps tab of all the resources. These services run an AM that reserves resources on their behalf. When this AM gets the requested resources, the services bump up their CPU/memory utilization out of band from Yarn. If the Yarn allocations are released/preempted, the services back off on their resources utilization. By doing this, Yarn and these service correctly share the cluster resources, being Yarn RM the only one that does the overall resource bookkeeping. The services AM, not to break the lifecycle of containers, start containers in the corresponding NMs. These container processes do basically a sleep forever (i.e. sleep 1d). They are almost not using any CPU nor memory (less than 1MB). Thus it is reasonable to assume their required CPU and memory utilization is NIL (more on hard enforcement later). Because of this almost NIL utilization of CPU and memory, it is possible to specify, when doing a request, zero as one of the dimensions (CPU or memory). The current limitation is that the increment is also the minimum. If we set the memory increment to 1MB. 
When doing a pure CPU request, we would have to specify 1MB of memory. That would work. However it would allow discretionary memory requests without a desired normalization (increments of 256, 512, etc). If we set the CPU increment to 1CPU. When doing a pure memory request, we would have to specify 1CPU. CPU amounts a much smaller than memory amounts, and because we don't have fractional CPUs, it would mean that all my pure memory requests will be wasting 1 CPU thus reducing the overall utilization of the cluster. Finally, on hard enforcement. * For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an absolute minimum of a few CPU shares (ie 10) in the LinuxContainerExecutor we ensure there is enough CPU cycles to run the sleep process. This absolute minimum would only kick-in if zero is allowed, otherwise will never kick in as the shares for 1 CPU are 1024. * For Memory. Hard enforcement is currently done by the ProcfsBasedProcessTree.java, using a minimum absolute of 1 or 2 MBs would take care of zero memory resources. And again, this absolute minimum
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681911#comment-13681911 ] Hadoop QA commented on YARN-569: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587557/YARN-569.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1212//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1212//console This message is automatically generated. CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf, preemption.2.patch, YARN-569.1.patch, YARN-569.2.patch, YARN-569.3.patch, YARN-569.4.patch, YARN-569.5.patch, YARN-569.6.patch, YARN-569.patch, YARN-569.patch There is a tension between the fast-pace reactive role of the CapacityScheduler, which needs to respond quickly to applications resource requests, and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. To this purpose we opted instead of hacking the delicate mechanisms of the CapacityScheduler directly to add support for preemption by means of a Capacity Monitor, which can be run optionally as a separate service (much like the NMLivelinessMonitor). The capacity monitor (similarly to equivalent functionalities in the fairness scheduler) operates running on intervals (e.g., every 3 seconds), observe the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine if preemption is needed, and how best to edit the current schedule to improve capacity, and generates events that produce four possible actions: # Container de-reservations # Resource-based preemptions # Container-based preemptions # Container killing The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. Note that due to the lag in the effect of these actions the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not trying to tightly and consistently micromanage container allocations. 
- Preemption policy (ProportionalCapacityPreemptionPolicy): - Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: # it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*) # if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**) # it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round) # it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order) # it removes reservations from the most recently assigned app until the amount of resources to reclaim is obtained, or until no more reservations exist # (if not enough) it issues preemptions for containers from the same applications (reverse chronological order, last assigned container
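A rough sketch of the proportional rebalancing in steps 1-3 above; QueueState, the field names, and the per-round bound are illustrative assumptions, not the actual ProportionalCapacityPreemptionPolicy implementation:
{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProportionalPreemptionSketch {

  static class QueueState {
    final String name;
    final double guaranteed; // guaranteed capacity, in absolute resources
    final double used;       // resources currently assigned to the queue
    final double pending;    // outstanding, unsatisfied requests
    QueueState(String name, double guaranteed, double used, double pending) {
      this.name = name; this.guaranteed = guaranteed;
      this.used = used; this.pending = pending;
    }
  }

  // Bound on how much of the needed correction is applied per round, so
  // the policy nudges the schedule rather than micromanaging it.
  static final double MAX_PREEMPT_PER_ROUND = 0.1;

  static Map<String, Double> computePreemption(List<QueueState> queues) {
    // Unmet demand of under-capacity queues (capped at their guarantee)
    // and surplus held by over-capacity queues (usage above guarantee).
    double unmet = 0, surplus = 0;
    for (QueueState q : queues) {
      if (q.used < q.guaranteed) {
        unmet += Math.min(q.pending, q.guaranteed - q.used);
      } else {
        surplus += q.used - q.guaranteed;
      }
    }
    // Reclaim a bounded correction from over-capacity queues,
    // proportionally to each queue's surplus.
    Map<String, Double> toPreempt = new HashMap<>();
    if (surplus > 0 && unmet > 0) {
      double reclaim = Math.min(unmet, surplus) * MAX_PREEMPT_PER_ROUND;
      for (QueueState q : queues) {
        double s = q.used - q.guaranteed;
        if (s > 0) {
          toPreempt.put(q.name, reclaim * s / surplus);
        }
      }
    }
    return toPreempt;
  }

  public static void main(String[] args) {
    List<QueueState> queues = new ArrayList<>();
    queues.add(new QueueState("A", 50, 80, 0));  // 30 over guarantee
    queues.add(new QueueState("B", 50, 20, 40)); // 30 under, 40 pending
    System.out.println(computePreemption(queues)); // {A=3.0}
  }
}
{code}
The per-round bound is what keeps the policy macroscopic: it reclaims only a fraction of the imbalance each interval and lets natural container completions absorb the rest.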
[jira] [Commented] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681932#comment-13681932 ] Hadoop QA commented on YARN-117: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587559/YARN-117-024.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 38 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1213//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1213//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1213//console This message is automatically generated. Enhance YARN service model -- Key: YARN-117 URL: https://issues.apache.org/jira/browse/YARN-117 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117-007.patch, YARN-117-008.patch, YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, YARN-117-019.patch, YARN-117-020.patch, YARN-117-021.patch, YARN-117-022.patch, YARN-117-023.patch, YARN-117-024.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch Having played with the YARN service model, I've identified some issues based on past work and initial use. This JIRA is an umbrella issue to cover them, with solutions pushed out to separate JIRAs. h2. state model prevents stopped state being entered if you could not successfully start the service. 
In the current lifecycle you cannot stop a service unless it was successfully started, but * {{init()}} may acquire resources that need to be explicitly released * if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. *Fix:* make {{stop()}} a valid state transition from all states and require implementations to be able to stop safely without requiring all fields to be non-null. Before anyone points out that the {{stop()}} operations assume all fields are valid and will NPE if called before {{start()}}: MAPREDUCE-3431 shows that this problem arises today, and MAPREDUCE-3502 is a fix for it. It is independent of the rest of the issues in this doc, but it will aid making {{stop()}} execute from all states other than stopped. MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and uptake; this can be done with issues linked to this one. h2. AbstractService doesn't prevent duplicate state change requests. The {{ensureState()}} checks that verify whether a state transition is allowed from the current state are performed in the base {{AbstractService}} class
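A minimal sketch of the null-safe, any-state {{stop()}} pattern proposed above, which also makes duplicate stop requests no-ops; it is loosely modelled on the YARN Service lifecycle, and the class and field names are illustrative, not the actual AbstractService code:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch: stop() is a valid transition from every state and
// guards fields that a failed (or never-run) start() may have left null.
public class SafeService {
  public enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;
  private ExecutorService worker; // null until start() succeeds

  public synchronized void init() { state = State.INITED; }

  public synchronized void start() {
    worker = Executors.newSingleThreadExecutor();
    state = State.STARTED;
  }

  public synchronized void stop() {
    if (state == State.STOPPED) {
      return; // idempotent: a duplicate stop() request is a no-op
    }
    // Null-guard every field: start() may have failed partway or never run.
    if (worker != null) {
      worker.shutdownNow();
    }
    state = State.STOPPED;
  }

  public static void main(String[] args) {
    SafeService s = new SafeService();
    s.stop(); // legal even though init()/start() never ran
    s.stop(); // and safe to call again
  }
}
{code}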