[jira] [Commented] (YARN-976) Document the meaning of a virtual core
[ https://issues.apache.org/jira/browse/YARN-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790061#comment-13790061 ] Hudson commented on YARN-976: - SUCCESS: Integrated in Hadoop-trunk-Commit #4569 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4569/]) YARN-976. Document the meaning of a virtual core. (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530500) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java Document the meaning of a virtual core -- Key: YARN-976 URL: https://issues.apache.org/jira/browse/YARN-976 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.3.0 Attachments: YARN-976.patch As virtual cores are a somewhat novel concept, it would be helpful to have thorough documentation that clarifies their meaning. -- This message was sent by Atlassian JIRA (v6.1#6144)
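To ground the vcore documentation above, here is a minimal, hedged Java sketch of how an application master asks for CPU in virtual cores through the standard YARN client records (Resource, ContainerRequest). The memory size, vcore count, and class name are illustrative only and are not taken from the YARN-976 patch.
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class VcoreRequestExample {
  public static void main(String[] args) {
    // A virtual core is a share of a node's advertised CPU capacity;
    // requesting 2 vcores asks for roughly twice the CPU of a 1-vcore
    // container on the same cluster.
    Resource capability = Resource.newInstance(1024 /* MB */, 2 /* vcores */);
    ContainerRequest request =
        new ContainerRequest(capability, null /* nodes */, null /* racks */,
            Priority.newInstance(0));
    System.out.println("Requesting " + capability.getVirtualCores() + " vcores");
  }
}
{code}
Because a vcore is a relative unit (each NodeManager advertises how many it offers), the same request can map to different amounts of physical CPU on differently configured clusters, which is the point the documentation change is meant to clarify.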
[jira] [Commented] (YARN-1258) Allow configuring the Fair Scheduler root queue
[ https://issues.apache.org/jira/browse/YARN-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790082#comment-13790082 ] Hadoop QA commented on YARN-1258: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607516/YARN-1258-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2151//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2151//console This message is automatically generated. Allow configuring the Fair Scheduler root queue --- Key: YARN-1258 URL: https://issues.apache.org/jira/browse/YARN-1258 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1258-1.patch, YARN-1258.patch This would be useful for acls, maxRunningApps, scheduling modes, etc. The allocation file should be able to accept both: * An implicit root queue * A root queue at the top of the hierarchy with all queues under/inside of it -- This message was sent by Atlassian JIRA (v6.1#6144)
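Since the allocation-file layout is what this change enables, a hedged sketch follows, mirroring how the Fair Scheduler tests generate allocation files from Java: it writes an explicit root queue whose setting (maxRunningApps here) sits above a child queue with its own submit ACL. The file path is hypothetical; the element names follow the Fair Scheduler allocation-file format, and the file would be the one pointed to by yarn.scheduler.fair.allocation.file.
{code}
import java.io.File;
import java.io.FileWriter;
import java.io.PrintWriter;

public class RootQueueAllocExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical location; the scheduler reads whatever file
    // yarn.scheduler.fair.allocation.file names.
    File allocFile = new File("/tmp/fair-scheduler.xml");
    PrintWriter out = new PrintWriter(new FileWriter(allocFile));
    out.println("<?xml version=\"1.0\"?>");
    out.println("<allocations>");
    // Explicit root queue: settings here apply to the whole hierarchy.
    out.println("  <queue name=\"root\">");
    out.println("    <maxRunningApps>100</maxRunningApps>");
    out.println("    <queue name=\"analytics\">");
    out.println("      <aclSubmitApps>alice,bob</aclSubmitApps>");
    out.println("    </queue>");
    out.println("  </queue>");
    out.println("</allocations>");
    out.close();
  }
}
{code}
An allocation file with no &lt;queue name="root"&gt; element keeps working as before; the root queue is then implicit and the top-level queues are treated as its children.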
[jira] [Commented] (YARN-461) Fair scheduler should not accept apps with empty string queue name
[ https://issues.apache.org/jira/browse/YARN-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790096#comment-13790096 ] Hudson commented on YARN-461: - SUCCESS: Integrated in Hadoop-trunk-Commit #4570 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4570/]) Fix position of YARN-461 in CHANGES.txt (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530505) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Fair scheduler should not accept apps with empty string queue name -- Key: YARN-461 URL: https://issues.apache.org/jira/browse/YARN-461 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Wei Yan Fix For: 2.3.0 Attachments: YARN-461.patch, YARN-461.patch, YARN-461.patch, YARN-461.patch When an app is submitted with an empty string for the queue, the RMAppManager passes it on like it does with any other string. -- This message was sent by Atlassian JIRA (v6.1#6144)
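The intended behaviour is a plain validation step at submission time. The sketch below is a hypothetical helper (not the actual RMAppManager or Fair Scheduler code) showing the empty-string check the JIRA asks for.
{code}
// Hypothetical validation helper illustrating the intended behaviour: reject
// a null or empty queue name instead of passing it through to the scheduler.
public final class QueueNameValidator {
  private QueueNameValidator() {}

  public static String validate(String queueName) {
    if (queueName == null || queueName.trim().isEmpty()) {
      throw new IllegalArgumentException(
          "Application submitted with an empty queue name");
    }
    return queueName;
  }

  public static void main(String[] args) {
    System.out.println(validate("default"));   // ok, prints "default"
    try {
      validate("");                            // rejected
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
{code}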
[jira] [Updated] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
[ https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-879: Attachment: YARN-879-v5.patch Synced the patch up with recent trunk in v5. Fix tests w.r.t o.a.h.y.server.resourcemanager.Application -- Key: YARN-879 URL: https://issues.apache.org/jira/browse/YARN-879 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Junping Du Assignee: Junping Du Attachments: YARN-879.patch, YARN-879-v2.patch, YARN-879-v3.patch, YARN-879-v4.patch, YARN-879-v5.patch getResources() should return the list of containers allocated by the RM. However, it currently returns null directly. Worse, if LOG.debug is enabled, this will reliably cause an NPE. -- This message was sent by Atlassian JIRA (v6.1#6144)
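To make the reported failure mode concrete, here is an illustrative Java sketch (not the actual o.a.h.y.server.resourcemanager.Application class): a getter that returns null instead of the allocated-container list turns any debug-log call that dereferences it into an NPE, while returning the (possibly empty) list avoids it.
{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative class, names hypothetical; it only demonstrates the bug shape.
public class AllocationTracker {
  private final List<String> allocatedContainers = new ArrayList<>();

  // Buggy shape described in the JIRA: callers expect a list, get null.
  public List<String> getResourcesBuggy() {
    return null;
  }

  // Fixed shape: always return the (possibly empty) list the RM allocated.
  public List<String> getResources() {
    return allocatedContainers;
  }

  public static void main(String[] args) {
    AllocationTracker app = new AllocationTracker();
    System.out.println("allocated=" + app.getResources().size()); // prints 0
    // app.getResourcesBuggy().size() would throw a NullPointerException here,
    // which is exactly what happens when a debug log line calls it.
  }
}
{code}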
[jira] [Commented] (YARN-1258) Allow configuring the Fair Scheduler root queue
[ https://issues.apache.org/jira/browse/YARN-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790188#comment-13790188 ] Hudson commented on YARN-1258: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4571 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4571/]) YARN-1258. Allow configuring the Fair Scheduler root queue (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530542) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Allow configuring the Fair Scheduler root queue --- Key: YARN-1258 URL: https://issues.apache.org/jira/browse/YARN-1258 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.3.0 Attachments: YARN-1258-1.patch, YARN-1258.patch This would be useful for acls, maxRunningApps, scheduling modes, etc. The allocation file should be able to accept both: * An implicit root queue * A root queue at the top of the hierarchy with all queues under/inside of it -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1258) Allow configuring the Fair Scheduler root queue
[ https://issues.apache.org/jira/browse/YARN-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790244#comment-13790244 ] Hudson commented on YARN-1258: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #357 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/357/]) YARN-1258. Allow configuring the Fair Scheduler root queue (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530542) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Allow configuring the Fair Scheduler root queue --- Key: YARN-1258 URL: https://issues.apache.org/jira/browse/YARN-1258 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.3.0 Attachments: YARN-1258-1.patch, YARN-1258.patch This would be useful for acls, maxRunningApps, scheduling modes, etc. The allocation file should be able to accept both: * An implicit root queue * A root queue at the top of the hierarchy with all queues under/inside of it -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-461) Fair scheduler should not accept apps with empty string queue name
[ https://issues.apache.org/jira/browse/YARN-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790241#comment-13790241 ] Hudson commented on YARN-461: - SUCCESS: Integrated in Hadoop-Yarn-trunk #357 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/357/]) Fix position of YARN-461 in CHANGES.txt (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530505) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Fair scheduler should not accept apps with empty string queue name -- Key: YARN-461 URL: https://issues.apache.org/jira/browse/YARN-461 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Wei Yan Fix For: 2.3.0 Attachments: YARN-461.patch, YARN-461.patch, YARN-461.patch, YARN-461.patch When an app is submitted with for the queue, the RMAppManager passes it on like it does with any other string. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-976) Document the meaning of a virtual core
[ https://issues.apache.org/jira/browse/YARN-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790245#comment-13790245 ] Hudson commented on YARN-976: - SUCCESS: Integrated in Hadoop-Yarn-trunk #357 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/357/]) YARN-976. Document the meaning of a virtual core. (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530500) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java Document the meaning of a virtual core -- Key: YARN-976 URL: https://issues.apache.org/jira/browse/YARN-976 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.3.0 Attachments: YARN-976.patch As virtual cores are a somewhat novel concept, it would be helpful to have thorough documentation that clarifies their meaning. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1284) LCE: Race condition leaves dangling cgroups entries for killed containers
[ https://issues.apache.org/jira/browse/YARN-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790243#comment-13790243 ] Hudson commented on YARN-1284: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #357 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/357/]) Add missing file TestCgroupsLCEResourcesHandler for YARN-1284. (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530493) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java YARN-1284. LCE: Race condition leaves dangling cgroups entries for killed containers. (Alejandro Abdelnur via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1530492) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java LCE: Race condition leaves dangling cgroups entries for killed containers - Key: YARN-1284 URL: https://issues.apache.org/jira/browse/YARN-1284 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.3.0 Attachments: YARN-1284.patch, YARN-1284.patch, YARN-1284.patch, YARN-1284.patch, YARN-1284.patch When LCE cgroups are enabled and a container is killed (in this case by its owning AM, an MRAM), there seems to be a race condition at the OS level between sending the SIGTERM/SIGKILL and the OS completing all the necessary cleanup. The LCE code, after sending the SIGTERM/SIGKILL and getting the exit code, immediately attempts to clean up the cgroups entry for the container. But this fails with an error like: {code} 2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1381179532433_0016_01_11 is : 143 2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1381179532433_0016_01_11 of type UPDATE_DIAGNOSTICS_MSG 2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: deleteCgroup: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11 2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: Unable to delete cgroup at: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11 {code} CgroupsLCEResourcesHandler.clearLimits() has logic to wait for 500 ms for AM containers to avoid this problem. It seems this should be done for all containers. Still, waiting an extra 500 ms seems too expensive. We should look at a more time-efficient way of doing this, perhaps spinning until the deleteCgroup() succeeds, with a minimal sleep and a timeout. -- This message was sent by Atlassian JIRA (v6.1#6144)
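The last paragraph of the description proposes retrying the cgroup removal with a minimal sleep and a bounded timeout instead of a fixed 500 ms wait. The following is a hedged sketch of that idea, not the YARN-1284 patch itself; the path and timing values are illustrative.
{code}
import java.io.File;

// Hedged sketch of the retry approach the description suggests.
public class CgroupDeleteRetry {

  public static boolean deleteCgroupWithRetry(String cgroupPath,
      long timeoutMs, long sleepMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    File cgroupDir = new File(cgroupPath);
    while (System.currentTimeMillis() < deadline) {
      // Removing a per-container cgroup directory only succeeds once the
      // kernel has released all tasks from it, hence the retry loop.
      if (!cgroupDir.exists() || cgroupDir.delete()) {
        return true;
      }
      Thread.sleep(sleepMs);
    }
    return false;
  }

  public static void main(String[] args) throws InterruptedException {
    // Hypothetical path, mirroring the log lines quoted above.
    boolean deleted = deleteCgroupWithRetry(
        "/run/cgroups/cpu/hadoop-yarn/container_example", 1000L, 20L);
    System.out.println(deleted ? "cgroup removed" : "gave up after timeout");
  }
}
{code}
Compared with a flat 500 ms wait, the loop returns as soon as the kernel has finished its cleanup, so the common case costs only one short sleep while the timeout still bounds the worst case.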
[jira] [Commented] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
[ https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790296#comment-13790296 ] Hadoop QA commented on YARN-879: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607527/YARN-879-v5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2152//console This message is automatically generated. Fix tests w.r.t o.a.h.y.server.resourcemanager.Application -- Key: YARN-879 URL: https://issues.apache.org/jira/browse/YARN-879 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Junping Du Assignee: Junping Du Attachments: YARN-879.patch, YARN-879-v2.patch, YARN-879-v3.patch, YARN-879-v4.patch, YARN-879-v5.patch getResources() will return a list of containers that allocated by RM. However, it is now return null directly. The worse thing is: if LOG.debug is enabled, then it will definitely cause NPE exception. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-976) Document the meaning of a virtual core
[ https://issues.apache.org/jira/browse/YARN-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790322#comment-13790322 ] Hudson commented on YARN-976: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1573 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1573/]) YARN-976. Document the meaning of a virtual core. (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530500) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java Document the meaning of a virtual core -- Key: YARN-976 URL: https://issues.apache.org/jira/browse/YARN-976 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.3.0 Attachments: YARN-976.patch As virtual cores are a somewhat novel concept, it would be helpful to have thorough documentation that clarifies their meaning. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1284) LCE: Race condition leaves dangling cgroups entries for killed containers
[ https://issues.apache.org/jira/browse/YARN-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790320#comment-13790320 ] Hudson commented on YARN-1284: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1573 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1573/]) Add missing file TestCgroupsLCEResourcesHandler for YARN-1284. (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530493) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java YARN-1284. LCE: Race condition leaves dangling cgroups entries for killed containers. (Alejandro Abdelnur via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530492) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java LCE: Race condition leaves dangling cgroups entries for killed containers - Key: YARN-1284 URL: https://issues.apache.org/jira/browse/YARN-1284 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.3.0 Attachments: YARN-1284.patch, YARN-1284.patch, YARN-1284.patch, YARN-1284.patch, YARN-1284.patch When LCE cgroups are enabled, when a container is is killed (in this case by its owning AM, an MRAM) it seems to be a race condition at OS level when doing a SIGTERM/SIGKILL and when the OS does all necessary cleanup. LCE code, after sending the SIGTERM/SIGKILL and getting the exitcode, immediately attempts to clean up the cgroups entry for the container. But this is failing with an error like: {code} 2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1381179532433_0016_01_11 is : 143 2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1381179532433_0016_01_11 of type UPDATE_DIAGNOSTICS_MSG 2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: deleteCgroup: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11 2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: Unable to delete cgroup at: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11 {code} CgroupsLCEResourcesHandler.clearLimits() has logic to wait for 500 ms for AM containers to avoid this problem. it seems this should be done for all containers. Still, waiting for extra 500ms seems too expensive. We should look at a way of doing this in a more 'efficient way' from time perspective, may be spinning while the deleteCgroup() cannot be done with a minimal sleep and a timeout. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1258) Allow configuring the Fair Scheduler root queue
[ https://issues.apache.org/jira/browse/YARN-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790321#comment-13790321 ] Hudson commented on YARN-1258: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1573 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1573/]) YARN-1258. Allow configuring the Fair Scheduler root queue (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530542) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Allow configuring the Fair Scheduler root queue --- Key: YARN-1258 URL: https://issues.apache.org/jira/browse/YARN-1258 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.3.0 Attachments: YARN-1258-1.patch, YARN-1258.patch This would be useful for acls, maxRunningApps, scheduling modes, etc. The allocation file should be able to accept both: * An implicit root queue * A root queue at the top of the hierarchy with all queues under/inside of it -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
[ https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790337#comment-13790337 ] Junping Du commented on YARN-879: - It is strange that the v5 patch builds and passes tests fine locally. The Jenkins build failure above didn't provide any hints, so I suspect something is wrong with the Jenkins job. Cancelling the patch and resubmitting it. Fix tests w.r.t o.a.h.y.server.resourcemanager.Application -- Key: YARN-879 URL: https://issues.apache.org/jira/browse/YARN-879 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Junping Du Assignee: Junping Du Attachments: YARN-879.patch, YARN-879-v2.patch, YARN-879-v3.patch, YARN-879-v4.patch, YARN-879-v5.patch getResources() should return the list of containers allocated by the RM. However, it currently returns null directly. Worse, if LOG.debug is enabled, this will reliably cause an NPE. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
[ https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-879: Attachment: YARN-879-v5.1.patch Renamed the v5 patch to v5.1 (contents unchanged) to kick off Jenkins again. Fix tests w.r.t o.a.h.y.server.resourcemanager.Application -- Key: YARN-879 URL: https://issues.apache.org/jira/browse/YARN-879 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Junping Du Assignee: Junping Du Attachments: YARN-879.patch, YARN-879-v2.patch, YARN-879-v3.patch, YARN-879-v4.patch, YARN-879-v5.1.patch, YARN-879-v5.patch getResources() should return the list of containers allocated by the RM. However, it currently returns null directly. Worse, if LOG.debug is enabled, this will reliably cause an NPE. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-461) Fair scheduler should not accept apps with empty string queue name
[ https://issues.apache.org/jira/browse/YARN-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790345#comment-13790345 ] Hudson commented on YARN-461: - SUCCESS: Integrated in Hadoop-Hdfs-trunk #1547 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1547/]) Fix position of YARN-461 in CHANGES.txt (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530505) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Fair scheduler should not accept apps with empty string queue name -- Key: YARN-461 URL: https://issues.apache.org/jira/browse/YARN-461 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Wei Yan Fix For: 2.3.0 Attachments: YARN-461.patch, YARN-461.patch, YARN-461.patch, YARN-461.patch When an app is submitted with for the queue, the RMAppManager passes it on like it does with any other string. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1284) LCE: Race condition leaves dangling cgroups entries for killed containers
[ https://issues.apache.org/jira/browse/YARN-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790347#comment-13790347 ] Hudson commented on YARN-1284: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1547 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1547/]) Add missing file TestCgroupsLCEResourcesHandler for YARN-1284. (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530493) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java YARN-1284. LCE: Race condition leaves dangling cgroups entries for killed containers. (Alejandro Abdelnur via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530492) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java LCE: Race condition leaves dangling cgroups entries for killed containers - Key: YARN-1284 URL: https://issues.apache.org/jira/browse/YARN-1284 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.3.0 Attachments: YARN-1284.patch, YARN-1284.patch, YARN-1284.patch, YARN-1284.patch, YARN-1284.patch When LCE cgroups are enabled, when a container is is killed (in this case by its owning AM, an MRAM) it seems to be a race condition at OS level when doing a SIGTERM/SIGKILL and when the OS does all necessary cleanup. LCE code, after sending the SIGTERM/SIGKILL and getting the exitcode, immediately attempts to clean up the cgroups entry for the container. But this is failing with an error like: {code} 2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1381179532433_0016_01_11 is : 143 2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1381179532433_0016_01_11 of type UPDATE_DIAGNOSTICS_MSG 2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: deleteCgroup: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11 2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: Unable to delete cgroup at: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11 {code} CgroupsLCEResourcesHandler.clearLimits() has logic to wait for 500 ms for AM containers to avoid this problem. it seems this should be done for all containers. Still, waiting for extra 500ms seems too expensive. We should look at a way of doing this in a more 'efficient way' from time perspective, may be spinning while the deleteCgroup() cannot be done with a minimal sleep and a timeout. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1258) Allow configuring the Fair Scheduler root queue
[ https://issues.apache.org/jira/browse/YARN-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790348#comment-13790348 ] Hudson commented on YARN-1258: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1547 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1547/]) YARN-1258. Allow configuring the Fair Scheduler root queue (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530542) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Allow configuring the Fair Scheduler root queue --- Key: YARN-1258 URL: https://issues.apache.org/jira/browse/YARN-1258 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.3.0 Attachments: YARN-1258-1.patch, YARN-1258.patch This would be useful for acls, maxRunningApps, scheduling modes, etc. The allocation file should be able to accept both: * An implicit root queue * A root queue at the top of the hierarchy with all queues under/inside of it -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-976) Document the meaning of a virtual core
[ https://issues.apache.org/jira/browse/YARN-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790349#comment-13790349 ] Hudson commented on YARN-976: - SUCCESS: Integrated in Hadoop-Hdfs-trunk #1547 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1547/]) YARN-976. Document the meaning of a virtual core. (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530500) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java Document the meaning of a virtual core -- Key: YARN-976 URL: https://issues.apache.org/jira/browse/YARN-976 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.3.0 Attachments: YARN-976.patch As virtual cores are a somewhat novel concept, it would be helpful to have thorough documentation that clarifies their meaning. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-879) Fix tests w.r.t o.a.h.y.server.resourcemanager.Application
[ https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790363#comment-13790363 ] Hadoop QA commented on YARN-879: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607557/YARN-879-v5.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2153//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2153//console This message is automatically generated. Fix tests w.r.t o.a.h.y.server.resourcemanager.Application -- Key: YARN-879 URL: https://issues.apache.org/jira/browse/YARN-879 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Junping Du Assignee: Junping Du Attachments: YARN-879.patch, YARN-879-v2.patch, YARN-879-v3.patch, YARN-879-v4.patch, YARN-879-v5.1.patch, YARN-879-v5.patch getResources() will return a list of containers that allocated by RM. However, it is now return null directly. The worse thing is: if LOG.debug is enabled, then it will definitely cause NPE exception. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790555#comment-13790555 ] Robert Joseph Evans commented on YARN-321: -- I like the diagrams, but I want to understand if the generic application history service is intended to replace the job history server, or to just augment it? I would prefer it if we could replace the current server. Perhaps not in the first release, but eventually. To make that work we would have to provide a way for MR specific code to come up and run inside the service, exposing both the current restful web service, an application specific UI, and the RPC server that we currently run. Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, HistoryStorageDemo.java The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790604#comment-13790604 ] Zhijie Shen commented on YARN-321: -- bq. I like the diagrams, but I want to understand if the generic application history service is intended to replace the job history server, or to just augment it? Yes, there's been some previous discussion on recording per-application-type history data, but we plan to exclude it from the initial version of the AHS. Eventually, we'd like to integrate the JHS into the AHS in some way, the details of which we can discuss in follow-up jiras. To me, it would be better if we can design a common per-application-type plugin framework, so that we can easily integrate the history servers of other applications on YARN. Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, HistoryStorageDemo.java The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1284) LCE: Race condition leaves dangling cgroups entries for killed containers
[ https://issues.apache.org/jira/browse/YARN-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1284: - Fix Version/s: (was: 2.3.0) 2.2.1 committed to branch-2.2. LCE: Race condition leaves dangling cgroups entries for killed containers - Key: YARN-1284 URL: https://issues.apache.org/jira/browse/YARN-1284 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.2.1 Attachments: YARN-1284.patch, YARN-1284.patch, YARN-1284.patch, YARN-1284.patch, YARN-1284.patch When LCE cgroups are enabled, when a container is is killed (in this case by its owning AM, an MRAM) it seems to be a race condition at OS level when doing a SIGTERM/SIGKILL and when the OS does all necessary cleanup. LCE code, after sending the SIGTERM/SIGKILL and getting the exitcode, immediately attempts to clean up the cgroups entry for the container. But this is failing with an error like: {code} 2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1381179532433_0016_01_11 is : 143 2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1381179532433_0016_01_11 of type UPDATE_DIAGNOSTICS_MSG 2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: deleteCgroup: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11 2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: Unable to delete cgroup at: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11 {code} CgroupsLCEResourcesHandler.clearLimits() has logic to wait for 500 ms for AM containers to avoid this problem. it seems this should be done for all containers. Still, waiting for extra 500ms seems too expensive. We should look at a way of doing this in a more 'efficient way' from time perspective, may be spinning while the deleteCgroup() cannot be done with a minimal sleep and a timeout. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1284) LCE: Race condition leaves dangling cgroups entries for killed containers
[ https://issues.apache.org/jira/browse/YARN-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790628#comment-13790628 ] Hudson commented on YARN-1284: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4574 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4574/]) Amending yarn CHANGES.txt moving YARN-1284 to 2.2.1 (tucu: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530716) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt LCE: Race condition leaves dangling cgroups entries for killed containers - Key: YARN-1284 URL: https://issues.apache.org/jira/browse/YARN-1284 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.2.1 Attachments: YARN-1284.patch, YARN-1284.patch, YARN-1284.patch, YARN-1284.patch, YARN-1284.patch When LCE cgroups are enabled, when a container is is killed (in this case by its owning AM, an MRAM) it seems to be a race condition at OS level when doing a SIGTERM/SIGKILL and when the OS does all necessary cleanup. LCE code, after sending the SIGTERM/SIGKILL and getting the exitcode, immediately attempts to clean up the cgroups entry for the container. But this is failing with an error like: {code} 2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1381179532433_0016_01_11 is : 143 2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1381179532433_0016_01_11 of type UPDATE_DIAGNOSTICS_MSG 2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: deleteCgroup: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11 2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: Unable to delete cgroup at: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_11 {code} CgroupsLCEResourcesHandler.clearLimits() has logic to wait for 500 ms for AM containers to avoid this problem. it seems this should be done for all containers. Still, waiting for extra 500ms seems too expensive. We should look at a way of doing this in a more 'efficient way' from time perspective, may be spinning while the deleteCgroup() cannot be done with a minimal sleep and a timeout. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated YARN-913: - Attachment: RegistrationServiceDetails.txt Uploading a file that shows some examples of the registration service APIs. Any feedback on them is appreciated. Add a way to register long-lived services in a YARN cluster --- Key: YARN-913 URL: https://issues.apache.org/jira/browse/YARN-913 Project: Hadoop YARN Issue Type: New Feature Components: api Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Robert Joseph Evans Attachments: RegistrationServiceDetails.txt In a YARN cluster you can't predict where services will come up -or on what ports. The services need to work those things out as they come up and then publish them somewhere. Applications need to be able to find the service instance they are to bond to -and not any others in the cluster. Some kind of service registry -in the RM, in ZK, could do this. If the RM held the write access to the ZK nodes, it would be more secure than having apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1288) Fair Scheduler child queue ACLs shouldn't give everyone access by default
Sandy Ryza created YARN-1288: Summary: Fair Scheduler child queue ACLs shouldn't give everyone access by default Key: YARN-1288 URL: https://issues.apache.org/jira/browse/YARN-1288 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to *. Now that YARN-1258 enables configuring the root queue, we should reverse this. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1283) Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY
[ https://issues.apache.org/jira/browse/YARN-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790668#comment-13790668 ] Omkar Vinit Joshi commented on YARN-1283: - wonder why jenkin didn't report this failure earlier...fixing it. Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY - Key: YARN-1283 URL: https://issues.apache.org/jira/browse/YARN-1283 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.1-beta Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Labels: newbie Attachments: YARN-1283.20131007.1.patch, YARN-1283.20131008.1.patch, YARN-1283.20131008.2.patch, YARN-1283.3.patch After setting yarn.http.policy=HTTPS_ONLY, the job output shows incorrect The url to track the job. Currently, its printing http://RM:httpsport/proxy/application_1381162886563_0001/ instead https://RM:httpsport/proxy/application_1381162886563_0001/ http://hostname:8088/proxy/application_1381162886563_0001/ is invalid hadoop jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 1 -r 1 13/10/07 18:39:39 INFO client.RMProxy: Connecting to ResourceManager at hostname/100.00.00.000:8032 13/10/07 18:39:40 INFO mapreduce.JobSubmitter: number of splits:1 13/10/07 18:39:40 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.partitioner.class is deprecated. Instead, use mapreduce.job.partitioner.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.working.dir is deprecated. 
Instead, use mapreduce.job.working.dir 13/10/07 18:39:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1381162886563_0001 13/10/07 18:39:40 INFO impl.YarnClientImpl: Submitted application application_1381162886563_0001 to ResourceManager at hostname/100.00.00.000:8032 13/10/07 18:39:40 INFO mapreduce.Job: The url to track the job: http://hostname:8088/proxy/application_1381162886563_0001/ 13/10/07 18:39:40 INFO mapreduce.Job: Running job: job_1381162886563_0001 13/10/07 18:39:46 INFO mapreduce.Job: Job job_1381162886563_0001 running in uber mode : false 13/10/07 18:39:46 INFO mapreduce.Job: map 0% reduce 0% 13/10/07 18:39:53 INFO mapreduce.Job: map 100% reduce 0% 13/10/07 18:39:58 INFO mapreduce.Job: map 100% reduce 100% 13/10/07 18:39:58 INFO mapreduce.Job: Job job_1381162886563_0001 completed successfully 13/10/07 18:39:58 INFO mapreduce.Job: Counters: 43 File System Counters FILE: Number of bytes read=26 FILE: Number of bytes written=177279 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=48 HDFS: Number of bytes written=0 HDFS: Number of read operations=1 HDFS: Number of large read operations=0 HDFS: Number of write operations=0
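The fix boils down to choosing the URL scheme from the configured policy when the tracking URL is built. Below is a hedged, illustrative sketch: the yarn.http.policy key and the HTTPS_ONLY value come from the report above, while the helper method, the assumed HTTP_ONLY default, and the host/port are hypothetical.
{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative only: shows the scheme selection the bug report asks for,
// not the actual proxy or JobClient code.
public class TrackingUrlScheme {

  public static String trackingUrl(Configuration conf, String proxyHostPort,
      String appId) {
    // HTTP_ONLY as the fallback is an assumption for this sketch.
    String policy = conf.get("yarn.http.policy", "HTTP_ONLY");
    String scheme = "HTTPS_ONLY".equals(policy) ? "https://" : "http://";
    return scheme + proxyHostPort + "/proxy/" + appId + "/";
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.http.policy", "HTTPS_ONLY");
    System.out.println(
        trackingUrl(conf, "rm-host:8090", "application_1381162886563_0001"));
  }
}
{code}
With the policy set to HTTPS_ONLY, the sketch prints an https:// tracking URL, which is the behaviour the job output above was missing.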
[jira] [Updated] (YARN-1283) Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY
[ https://issues.apache.org/jira/browse/YARN-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1283: Attachment: YARN-1283.3.patch Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY - Key: YARN-1283 URL: https://issues.apache.org/jira/browse/YARN-1283 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.1-beta Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Labels: newbie Attachments: YARN-1283.20131007.1.patch, YARN-1283.20131008.1.patch, YARN-1283.20131008.2.patch, YARN-1283.3.patch After setting yarn.http.policy=HTTPS_ONLY, the job output shows incorrect The url to track the job. Currently, its printing http://RM:httpsport/proxy/application_1381162886563_0001/ instead https://RM:httpsport/proxy/application_1381162886563_0001/ http://hostname:8088/proxy/application_1381162886563_0001/ is invalid hadoop jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 1 -r 1 13/10/07 18:39:39 INFO client.RMProxy: Connecting to ResourceManager at hostname/100.00.00.000:8032 13/10/07 18:39:40 INFO mapreduce.JobSubmitter: number of splits:1 13/10/07 18:39:40 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.partitioner.class is deprecated. Instead, use mapreduce.job.partitioner.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.working.dir is deprecated. 
Instead, use mapreduce.job.working.dir 13/10/07 18:39:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1381162886563_0001 13/10/07 18:39:40 INFO impl.YarnClientImpl: Submitted application application_1381162886563_0001 to ResourceManager at hostname/100.00.00.000:8032 13/10/07 18:39:40 INFO mapreduce.Job: The url to track the job: http://hostname:8088/proxy/application_1381162886563_0001/ 13/10/07 18:39:40 INFO mapreduce.Job: Running job: job_1381162886563_0001 13/10/07 18:39:46 INFO mapreduce.Job: Job job_1381162886563_0001 running in uber mode : false 13/10/07 18:39:46 INFO mapreduce.Job: map 0% reduce 0% 13/10/07 18:39:53 INFO mapreduce.Job: map 100% reduce 0% 13/10/07 18:39:58 INFO mapreduce.Job: map 100% reduce 100% 13/10/07 18:39:58 INFO mapreduce.Job: Job job_1381162886563_0001 completed successfully 13/10/07 18:39:58 INFO mapreduce.Job: Counters: 43 File System Counters FILE: Number of bytes read=26 FILE: Number of bytes written=177279 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=48 HDFS: Number of bytes written=0 HDFS: Number of read operations=1 HDFS: Number of large read operations=0 HDFS: Number of write operations=0 Job Counters Launched map tasks=1 Launched reduce
[jira] [Updated] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1068: --- Attachment: yarn-1068-10.patch Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-10.patch, yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790737#comment-13790737 ] Karthik Kambatla commented on YARN-1068: Thanks [~bikassaha] for the detailed review. Sorry for the delay in responding, was caught up with some other issues. Uploaded a patch that addresses most of the comments: # Audit logging in RMHAProtocolService # Update AdminService also to use RMServerUtils#verifyAccess # YarnConfiguration overrides updateConnectAddr as well to complement getSocketAddr # Change argument names from nodeId to rmId bq. cast not needed right? Currently, RMHAServiceTarget constructor takes YarnConfiguration as an argument and not Configuration. We can change this, but I think it is better to be explicit in the kind of instance needed. Leaving the constructor as is requires the cast. bq. Should this be RMHAServiceProtocol address? Admins and ZKFC would be connecting on this protocol right? Updated the description to reflect that the RM listens at that address for both rmhaadmin CLI and the failover controller (ZKFC). I think we should leave the config name as ha.admin.address for the following reasons: (1) the user/admin understand admin better than HAProtocolService as the latter requires them to understand the protocol being used, (2) either CLI or ZKFC or actually administrating the HA state of the RM, (3) we don't use protocol names anywhere else in the configs. Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-10.patch, yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1279: -- Issue Type: Sub-task (was: Bug) Parent: YARN-431 Expose a client API to allow clients to figure if log aggregation is complete - Key: YARN-1279 URL: https://issues.apache.org/jira/browse/YARN-1279 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Arun C Murthy Expose a client API to allow clients to figure if log aggregation is complete -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1085) Yarn and MRv2 should do HTTP client authentication in kerberos setup.
[ https://issues.apache.org/jira/browse/YARN-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1085: -- Issue Type: Sub-task (was: Task) Parent: YARN-47 Yarn and MRv2 should do HTTP client authentication in kerberos setup. - Key: YARN-1085 URL: https://issues.apache.org/jira/browse/YARN-1085 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Jaimin D Jetly Assignee: Omkar Vinit Joshi Priority: Blocker Labels: security Fix For: 2.1.1-beta Attachments: YARN-1085.20130820.1.patch, YARN-1085.20130823.1.patch, YARN-1085.20130823.2.patch, YARN-1085.20130823.3.patch, YARN-1085.20130825.1.add.patch, YARN-1085.20130826.1.add.patch In a Kerberos setup, an HTTP client is expected to authenticate via Kerberos before the user is allowed to browse any information. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790767#comment-13790767 ] Hadoop QA commented on YARN-1068: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607623/yarn-1068-10.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2155//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2155//console This message is automatically generated. Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-10.patch, yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1283) Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY
[ https://issues.apache.org/jira/browse/YARN-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790784#comment-13790784 ] Hadoop QA commented on YARN-1283: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607608/YARN-1283.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapred.TestJobCleanup The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.v2.TestUberAM {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2154//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2154//console This message is automatically generated. Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY - Key: YARN-1283 URL: https://issues.apache.org/jira/browse/YARN-1283 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.1-beta Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Labels: newbie Attachments: YARN-1283.20131007.1.patch, YARN-1283.20131008.1.patch, YARN-1283.20131008.2.patch, YARN-1283.3.patch After setting yarn.http.policy=HTTPS_ONLY, the job output shows incorrect The url to track the job. Currently, its printing http://RM:httpsport/proxy/application_1381162886563_0001/ instead https://RM:httpsport/proxy/application_1381162886563_0001/ http://hostname:8088/proxy/application_1381162886563_0001/ is invalid hadoop jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 1 -r 1 13/10/07 18:39:39 INFO client.RMProxy: Connecting to ResourceManager at hostname/100.00.00.000:8032 13/10/07 18:39:40 INFO mapreduce.JobSubmitter: number of splits:1 13/10/07 18:39:40 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.partitioner.class is deprecated. 
Instead, use mapreduce.job.partitioner.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated.
[jira] [Commented] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues
[ https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790846#comment-13790846 ] Karthik Kambatla commented on YARN-1241: Good to see we are separating out runnable and non-runnable apps - keen to see how much this improves the schedule-time overhead. Comments below: {code} + private final List<AppSchedulable> appScheds = // apps that are runnable {code} How about renaming this field to runnableAppScheds? {code} + public void makeAppRunnable(AppSchedulable appSched) { {code} Should we make this a synchronized method along with other accessors of runnable and non-runnable appScheds, in case we make the scheduler multi-threaded? {code} if (Resources.equals(demand, maxRes)) { break; } {code} Making a mental note. The assumption here is that we don't use demand to determine fairness between schedulables. Correct? {code} for (AppSchedulable sched : nonRunnableAppScheds) { if (Resources.equals(demand, maxRes)) { break; } updateDemandForApp(sched, maxRes); } {code} Should we even consider the demand for non-runnable schedulables? Would it make sense to ignore the non-runnable schedulables for all book-keeping until they become runnable? In FairScheduler, the PriorityQueue import looks spurious. MaxRunningAppsEnforcer should be a singleton? {code} private ListMultimap<String, AppSchedulable> usersNonRunnableApps; {code} I am assuming the ListMultimap choice is to maintain the ordering and give preference to the earliest submitted job within a queue. Wondering if it would make sense to use any other metric than submit time for improved fairness? {code} * Runs in O(n log(n)) where n is the number of queues under that are under * the highest queue that went from having no slack to having slack. {code} The comment should have a single under? {code} // Update runnable app bookkeeping for queues // Find the queue highest in the tree which the app removal means something // new can be run FSQueue highestQueueAppsNowRunnable = (queue.getNumRunnableApps() == queueMgr.getQueueMaxApps(queue.getName()) - 1) ? queue : null; FSParentQueue parent = queue.getParent(); while (parent != null) { if (parent.getNumRunnableApps() == queueMgr.getQueueMaxApps(parent.getName())) { highestQueueAppsNowRunnable = parent; } parent.decrementRunnableApps(); parent = parent.getParent(); } {code} The comment is a little confusing. Rename highestQueueAppsNowRunnable to highestQueueWithAppsNowRunnable? Also, I am a little confused here. IIUC, we are trying to check if the completion of this app makes any other applications runnable now? Shouldn't we break out of the loop as soon as we encounter a queue with zero non-runnable apps? Also, it would be nice to have separate unit tests for MaxRunningAppsEnforcer. In Fair Scheduler maxRunningApps does not work for non-leaf queues -- Key: YARN-1241 URL: https://issues.apache.org/jira/browse/YARN-1241 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241-3.patch, YARN-1241.patch Setting the maxRunningApps property on a parent queue should make it that the sum of apps in all subqueues can't exceed it -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-7) Add support for DistributedShell to ask for CPUs along with memory
[ https://issues.apache.org/jira/browse/YARN-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790853#comment-13790853 ] Luke Lu commented on YARN-7: The latest patch lgtm. +1. Add support for DistributedShell to ask for CPUs along with memory -- Key: YARN-7 URL: https://issues.apache.org/jira/browse/YARN-7 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.1-beta Reporter: Arun C Murthy Assignee: Junping Du Labels: patch Attachments: YARN-7.patch, YARN-7-v2.patch, YARN-7-v3.patch, YARN-7-v4.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues
[ https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790879#comment-13790879 ] Sandy Ryza commented on YARN-1241: -- Thanks for the review Karthik. Will post a rebased patch on latest trunk with your suggestions except where discussed below. bq. Should we make this a synchronized method along with other accessors of runnable and non-runnable appScheds, in case we make the scheduler multi-threaded? I think you're right that this will ultimately need to be synchronized when we remove the super-coarse synchronization that we have currently. But, as a bunch of methods in similar situations currently aren't synchronized, I'd rather do it in a separate JIRA where we can take this on all at once. bq. Making a mental note. The assumption here is that we don't use demand to determine fairness between schedulables. Correct? Correct. bq. Should we even consider the demand for non-runnable schedulables? I think it's more useful for a cluster operator to know about the total amount of resources demanded by all jobs submitted to the cluster. E.g. someone might look at a queue's demand because they want to know whether anybody in it is waiting for anything, and leaving out the non-runnable apps would mask this from them. bq. MaxRunningAppsEnforcer should be a singleton? As there's only a single FairScheduler per RM, there will only end up being a single instance. If for some reason we wanted to run multiple FairSchedulers in a process (for testing maybe), we would need a MaxRunningAppsEnforcer for each. bq. I am assuming the ListMultimap choice is to maintain the ordering and give preference to the earliest submitted job within a queue. Wondering if it would make sense to use any other metric than submit time for improved fairness? Right. It definitely sounds conceivable to me that we might want to order them by something else in the future, but I don't think we need to make the data structure more general until then. bq. IIUC, we are trying to check if the completion of this app makes any other applications runnable now? Shouldn't we break out of the loop as soon as we encounter a queue with zero non-runnable apps? Yeah, that's what we are trying to check. childqueueX might have no pending apps itself, but if a queue higher up in the hierarchy, parentqueueY, has a maxRunningApps set, an app completion in childqueueX could allow an app in some other distant child of parentqueueY to become runnable. I'll update the comment to try to explain a little more of what's being done. bq. Also, it would be nice to have separate unit tests for MaxRunningAppsEnforcer. You're right. I will add a couple in. In Fair Scheduler maxRunningApps does not work for non-leaf queues -- Key: YARN-1241 URL: https://issues.apache.org/jira/browse/YARN-1241 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241-3.patch, YARN-1241.patch Setting the maxRunningApps property on a parent queue should make it that the sum of apps in all subqueues can't exceed it -- This message was sent by Atlassian JIRA (v6.1#6144)
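To make the review exchange above easier to follow, here is an illustrative sketch of the upward walk being discussed, using simplified, made-up class and field names rather than the actual FSQueue/QueueManager API from the patch. The point it demonstrates is why the loop cannot break at the first queue with no waiting apps of its own: a parent at its maxRunningApps cap may have waiting apps in a different child than the one whose app just finished.
{code}
// Simplified sketch, not the YARN-1241 patch itself.
class Queue {
  Queue parent;
  int runnableApps;     // runnable apps in this queue's subtree
  int maxRunningApps;   // configured cap for this queue
}

class MaxRunningAppsEnforcerSketch {
  /**
   * Called when an app in leafOfFinishedApp completes. Assumes the leaf's
   * runnableApps count has already been decremented, as in the snippet quoted
   * in the review. Returns the highest queue whose cap just stopped being
   * binding, i.e. the subtree in which some non-runnable app may now start.
   */
  Queue findHighestQueueWithSlack(Queue leafOfFinishedApp) {
    Queue highestNowWithSlack =
        (leafOfFinishedApp.runnableApps == leafOfFinishedApp.maxRunningApps - 1)
            ? leafOfFinishedApp : null;
    for (Queue p = leafOfFinishedApp.parent; p != null; p = p.parent) {
      if (p.runnableApps == p.maxRunningApps) {
        // This ancestor was full. Any waiting app anywhere under it, possibly
        // in a distant sibling subtree, may now become runnable, so we cannot
        // stop the walk early even if intermediate queues have no waiting apps.
        highestNowWithSlack = p;
      }
      p.runnableApps--; // account for the completed app all the way up the tree
    }
    return highestNowWithSlack;
  }
}
{code}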
[jira] [Commented] (YARN-451) Add more metrics to RM page
[ https://issues.apache.org/jira/browse/YARN-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790895#comment-13790895 ] Sangjin Lee commented on YARN-451: -- I agree. Even for mappers and reducers container resource asks may vary, and the memory and the cores contain more information than the number of containers. Add more metrics to RM page --- Key: YARN-451 URL: https://issues.apache.org/jira/browse/YARN-451 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Lohit Vijayarenu Assignee: Sangjin Lee Priority: Blocker Attachments: in_progress_2x.png, yarn-451-trunk-20130916.1.patch ResourceManager webUI shows list of RUNNING applications, but it does not tell which applications are requesting more resource compared to others. With cluster running hundreds of applications at once it would be useful to have some kind of metric to show high-resource usage applications vs low-resource usage ones. At the minimum showing number of containers is good option. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1125) Add shutdown support to non-service RM components
[ https://issues.apache.org/jira/browse/YARN-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA reassigned YARN-1125: Assignee: Tsuyoshi OZAWA Add shutdown support to non-service RM components - Key: YARN-1125 URL: https://issues.apache.org/jira/browse/YARN-1125 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA The ResourceManager has certain non-service components like the Scheduler. While transitioning to standby, these components should be completely turned off. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1283) Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY
[ https://issues.apache.org/jira/browse/YARN-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790935#comment-13790935 ] Vinod Kumar Vavilapalli commented on YARN-1283: --- +1, looks good. Checking this in. Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY - Key: YARN-1283 URL: https://issues.apache.org/jira/browse/YARN-1283 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.1-beta Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Labels: newbie Attachments: YARN-1283.20131007.1.patch, YARN-1283.20131008.1.patch, YARN-1283.20131008.2.patch, YARN-1283.3.patch After setting yarn.http.policy=HTTPS_ONLY, the job output shows incorrect The url to track the job. Currently, its printing http://RM:httpsport/proxy/application_1381162886563_0001/ instead https://RM:httpsport/proxy/application_1381162886563_0001/ http://hostname:8088/proxy/application_1381162886563_0001/ is invalid hadoop jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 1 -r 1 13/10/07 18:39:39 INFO client.RMProxy: Connecting to ResourceManager at hostname/100.00.00.000:8032 13/10/07 18:39:40 INFO mapreduce.JobSubmitter: number of splits:1 13/10/07 18:39:40 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.partitioner.class is deprecated. Instead, use mapreduce.job.partitioner.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.working.dir is deprecated. 
Instead, use mapreduce.job.working.dir 13/10/07 18:39:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1381162886563_0001 13/10/07 18:39:40 INFO impl.YarnClientImpl: Submitted application application_1381162886563_0001 to ResourceManager at hostname/100.00.00.000:8032 13/10/07 18:39:40 INFO mapreduce.Job: The url to track the job: http://hostname:8088/proxy/application_1381162886563_0001/ 13/10/07 18:39:40 INFO mapreduce.Job: Running job: job_1381162886563_0001 13/10/07 18:39:46 INFO mapreduce.Job: Job job_1381162886563_0001 running in uber mode : false 13/10/07 18:39:46 INFO mapreduce.Job: map 0% reduce 0% 13/10/07 18:39:53 INFO mapreduce.Job: map 100% reduce 0% 13/10/07 18:39:58 INFO mapreduce.Job: map 100% reduce 100% 13/10/07 18:39:58 INFO mapreduce.Job: Job job_1381162886563_0001 completed successfully 13/10/07 18:39:58 INFO mapreduce.Job: Counters: 43 File System Counters FILE: Number of bytes read=26 FILE: Number of bytes written=177279 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=48 HDFS: Number of bytes written=0 HDFS: Number of read operations=1 HDFS: Number of large read operations=0 HDFS: Number of write operations=0 Job Counters
[jira] [Commented] (YARN-1283) Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY
[ https://issues.apache.org/jira/browse/YARN-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790970#comment-13790970 ] Hudson commented on YARN-1283: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4577 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4577/]) YARN-1283. Fixed RM to give a fully-qualified proxy URL for an application so that clients don't need to do scheme-mangling. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1530819) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY - Key: YARN-1283 URL: https://issues.apache.org/jira/browse/YARN-1283 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.1-beta Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Labels: newbie Fix For: 2.2.1 Attachments: YARN-1283.20131007.1.patch, YARN-1283.20131008.1.patch, YARN-1283.20131008.2.patch, YARN-1283.3.patch After setting yarn.http.policy=HTTPS_ONLY, the job output shows incorrect The url to track the job. Currently, its printing http://RM:httpsport/proxy/application_1381162886563_0001/ instead https://RM:httpsport/proxy/application_1381162886563_0001/ http://hostname:8088/proxy/application_1381162886563_0001/ is invalid hadoop jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 1 -r 1 13/10/07 18:39:39 INFO client.RMProxy: Connecting to ResourceManager at hostname/100.00.00.000:8032 13/10/07 18:39:40 INFO mapreduce.JobSubmitter: number of splits:1 13/10/07 18:39:40 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.partitioner.class is deprecated. Instead, use mapreduce.job.partitioner.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. 
Instead, use mapreduce.map.output.value.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class 13/10/07 18:39:40 INFO Configuration.deprecation:
[jira] [Updated] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues
[ https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1241: - Attachment: YARN-1241-4.patch In Fair Scheduler maxRunningApps does not work for non-leaf queues -- Key: YARN-1241 URL: https://issues.apache.org/jira/browse/YARN-1241 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241-3.patch, YARN-1241-4.patch, YARN-1241.patch Setting the maxRunningApps property on a parent queue should make it that the sum of apps in all subqueues can't exceed it -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues
[ https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1241: - Attachment: YARN-1241-5.patch In Fair Scheduler maxRunningApps does not work for non-leaf queues -- Key: YARN-1241 URL: https://issues.apache.org/jira/browse/YARN-1241 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241-3.patch, YARN-1241-4.patch, YARN-1241-5.patch, YARN-1241.patch Setting the maxRunningApps property on a parent queue should make it that the sum of apps in all subqueues can't exceed it -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues
[ https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791010#comment-13791010 ] Hadoop QA commented on YARN-1241: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607682/YARN-1241-4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2156//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2156//console This message is automatically generated. In Fair Scheduler maxRunningApps does not work for non-leaf queues -- Key: YARN-1241 URL: https://issues.apache.org/jira/browse/YARN-1241 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241-3.patch, YARN-1241-4.patch, YARN-1241-5.patch, YARN-1241.patch Setting the maxRunningApps property on a parent queue should make it that the sum of apps in all subqueues can't exceed it -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-612) Cleanup BuilderUtils
[ https://issues.apache.org/jira/browse/YARN-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned YARN-612: - Assignee: (was: Karthik Kambatla) Cleanup BuilderUtils Key: YARN-612 URL: https://issues.apache.org/jira/browse/YARN-612 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Attachments: yarn-612-1.patch, yarn-612-2.patch There's 4 different methods to create ApplicationId. There's likely other such methods as well which could be consolidated. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned YARN-1172: -- Assignee: (was: Karthik Kambatla) Convert *SecretManagers in the RM to services - Key: YARN-1172 URL: https://issues.apache.org/jira/browse/YARN-1172 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues
[ https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791022#comment-13791022 ] Hadoop QA commented on YARN-1241: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607693/YARN-1241-5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2157//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2157//console This message is automatically generated. In Fair Scheduler maxRunningApps does not work for non-leaf queues -- Key: YARN-1241 URL: https://issues.apache.org/jira/browse/YARN-1241 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241-3.patch, YARN-1241-4.patch, YARN-1241-5.patch, YARN-1241.patch Setting the maxRunningApps property on a parent queue should make it that the sum of apps in all subqueues can't exceed it -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1288) Fair Scheduler child queue ACLs shouldn't give everyone access by default
[ https://issues.apache.org/jira/browse/YARN-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1288: - Attachment: YARN-1288.patch Fair Scheduler child queue ACLs shouldn't give everyone access by default - Key: YARN-1288 URL: https://issues.apache.org/jira/browse/YARN-1288 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1288.patch The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to *. Now that YARN-1258 enables configuring the root queue, we should reverse this. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791065#comment-13791065 ] Karthik Kambatla commented on YARN-1068: Forgot to mention the testing done. Ran a two node cluster - each node running an RM, and used the CLI to transition each to active/standby, getServiceState, checkHealth several (~100) times for various versions of the patch put together. Verified the latest patch on a pseudo-dist cluster with configs for both RMs pointing to the same RM. Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-10.patch, yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1125) Add shutdown support to non-service RM components
[ https://issues.apache.org/jira/browse/YARN-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1125: - Assignee: (was: Tsuyoshi OZAWA) Add shutdown support to non-service RM components - Key: YARN-1125 URL: https://issues.apache.org/jira/browse/YARN-1125 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla The ResourceManager has certain non-service components like the Scheduler. While transitioning to standby, these components should be completely turned off. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1139) [Umbrella] Convert all RM components to Services
[ https://issues.apache.org/jira/browse/YARN-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA reassigned YARN-1139: Assignee: Tsuyoshi OZAWA [Umbrella] Convert all RM components to Services Key: YARN-1139 URL: https://issues.apache.org/jira/browse/YARN-1139 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Some of the RM components - state store, scheduler etc. are not services. Converting them to services goes well with the Always On and Active service separation proposed on YARN-1098. Given that some of them already have start(), stop() methods, it should not be too hard to convert them to services. That would also be a cleaner way of addressing YARN-1125. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1288) Make Fair Scheduler ACLs more use friendly
[ https://issues.apache.org/jira/browse/YARN-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1288: - Description: The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to *. Now that YARN-1258 enables configuring the root queue, we should reverse this. This will also bring the Fair Scheduler in line with the Capacity Scheduler. was:The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to *. Now that YARN-1258 enables configuring the root queue, we should reverse this. Make Fair Scheduler ACLs more use friendly -- Key: YARN-1288 URL: https://issues.apache.org/jira/browse/YARN-1288 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1288.patch The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to *. Now that YARN-1258 enables configuring the root queue, we should reverse this. This will also bring the Fair Scheduler in line with the Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1288) Make Fair Scheduler ACLs more user friendly
[ https://issues.apache.org/jira/browse/YARN-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1288: - Description: The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to *. Now that YARN-1258 enables configuring the root queue, we should reverse this. This will also bring the Fair Scheduler in line with the Capacity Scheduler. We should also not trim the acl strings, which makes it impossible to only specify groups in an acl. was: The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to *. Now that YARN-1258 enables configuring the root queue, we should reverse this. This will also bring the Fair Scheduler in line with the Capacity Scheduler. Make Fair Scheduler ACLs more user friendly --- Key: YARN-1288 URL: https://issues.apache.org/jira/browse/YARN-1288 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1288.patch The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to *. Now that YARN-1258 enables configuring the root queue, we should reverse this. This will also bring the Fair Scheduler in line with the Capacity Scheduler. We should also not trim the acl strings, which makes it impossible to only specify groups in an acl. -- This message was sent by Atlassian JIRA (v6.1#6144)
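A short illustration of the trimming point added to the description above: in the usual "users groups" ACL string format, a leading space means "no users, only these groups", so trimming the configured value turns a groups-only ACL into a user list. This is a sketch under that assumption, not code from the YARN-1288 patch.
{code}
import org.apache.hadoop.security.authorize.AccessControlList;

public class AclTrimExample {
  public static void main(String[] args) {
    // Leading space: no users, group "admins" only.
    AccessControlList groupsOnly = new AccessControlList(" admins");
    // Trimmed value: now interpreted as a user named "admins" and no groups.
    AccessControlList trimmed = new AccessControlList("admins");
    System.out.println(groupsOnly.getGroups()); // [admins]
    System.out.println(trimmed.getUsers());     // [admins]
  }
}
{code}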
[jira] [Updated] (YARN-1288) Make Fair Scheduler ACLs more user friendly
[ https://issues.apache.org/jira/browse/YARN-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1288: - Summary: Make Fair Scheduler ACLs more user friendly (was: Make Fair Scheduler ACLs more use friendly) Make Fair Scheduler ACLs more user friendly --- Key: YARN-1288 URL: https://issues.apache.org/jira/browse/YARN-1288 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1288.patch The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to *. Now that YARN-1258 enables configuring the root queue, we should reverse this. This will also bring the Fair Scheduler in line with the Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1288) Make Fair Scheduler ACLs more user friendly
[ https://issues.apache.org/jira/browse/YARN-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791098#comment-13791098 ] Sandy Ryza commented on YARN-1288: -- Attached patch that makes the changes discussed above. To avoid allocating and filling a HashMap every time acls are checked, the patch also changes QueueManager#getQueueAcls to QueueManager#getQueueAcl and removes the getQueueAcls method in Queue that is no longer needed because of this. Make Fair Scheduler ACLs more user friendly --- Key: YARN-1288 URL: https://issues.apache.org/jira/browse/YARN-1288 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1288.patch The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to *. Now that YARN-1258 enables configuring the root queue, we should reverse this. This will also bring the Fair Scheduler in line with the Capacity Scheduler. We should also not trim the acl strings, which makes it impossible to only specify groups in an acl. -- This message was sent by Atlassian JIRA (v6.1#6144)
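The comment above describes replacing a per-check map of all ACL types with a single-ACL lookup. The sketch below shows that API shape; the parameter lists and the lookup helper are assumptions for illustration and are not copied from the patch.
{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.security.authorize.AccessControlList;
import org.apache.hadoop.yarn.api.records.QueueACL;

class QueueManagerSketch {
  // Before (hypothetical shape): allocate and fill a map on every access check.
  Map<QueueACL, AccessControlList> getQueueAcls(String queue) {
    Map<QueueACL, AccessControlList> acls = new HashMap<QueueACL, AccessControlList>();
    for (QueueACL type : QueueACL.values()) {
      acls.put(type, lookup(queue, type));
    }
    return acls;
  }

  // After (hypothetical shape): return only the ACL the caller is checking.
  AccessControlList getQueueAcl(String queue, QueueACL type) {
    return lookup(queue, type);
  }

  private AccessControlList lookup(String queue, QueueACL type) {
    return new AccessControlList("*"); // placeholder for the configured value
  }
}
{code}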
[jira] [Commented] (YARN-1288) Make Fair Scheduler ACLs more user friendly
[ https://issues.apache.org/jira/browse/YARN-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791116#comment-13791116 ] Hadoop QA commented on YARN-1288: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607708/YARN-1288.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2158//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2158//console This message is automatically generated. Make Fair Scheduler ACLs more user friendly --- Key: YARN-1288 URL: https://issues.apache.org/jira/browse/YARN-1288 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1288.patch The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to *. Now that YARN-1258 enables configuring the root queue, we should reverse this. This will also bring the Fair Scheduler in line with the Capacity Scheduler. We should also not trim the acl strings, which makes it impossible to only specify groups in an acl. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1289) Better to configure yarn.nodemanager.aux-services value in yarn-site.xml
wenwupeng created YARN-1289: --- Summary: Better to configure yarn.nodemanager.aux-services value in yarn-site.xml Key: YARN-1289 URL: https://issues.apache.org/jira/browse/YARN-1289 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: wenwupeng Failed to run benchmark when not configure yarn.nodemanager.aux-services value in yarn-site.xml', it is better to configure default value. 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : attempt_1381371516570_0001_m_00_1, Status : FAILED Container launch failed for container_1381371516570_0001_01_05 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1289) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle.
[ https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1289: - Summary: Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle. (was: Better to configure yarn.nodemanager.aux-services value in yarn-site.xml) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle. -- Key: YARN-1289 URL: https://issues.apache.org/jira/browse/YARN-1289 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: wenwupeng Assignee: Junping Du Failed to run benchmark when not configure yarn.nodemanager.aux-services value in yarn-site.xml', it is better to configure default value. 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : attempt_1381371516570_0001_m_00_1, Status : FAILED Container launch failed for container_1381371516570_0001_01_05 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1289) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle.
[ https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791135#comment-13791135 ] Junping Du commented on YARN-1289: -- Thanks Wenwu for finding this. I think this is a bug we should fix as mapreduce_shuffle is something default. Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle. -- Key: YARN-1289 URL: https://issues.apache.org/jira/browse/YARN-1289 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: wenwupeng Assignee: Junping Du Failed to run benchmark when not configure yarn.nodemanager.aux-services value in yarn-site.xml', it is better to configure default value. 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : attempt_1381371516570_0001_m_00_1, Status : FAILED Container launch failed for container_1381371516570_0001_01_05 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1289) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle.
[ https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1289: - Attachment: YARN-1289.patch Attaching a quick patch. It fixes the issue by falling back to mapreduce_shuffle when this property is missing from the YARN configuration file. Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle. -- Key: YARN-1289 URL: https://issues.apache.org/jira/browse/YARN-1289 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: wenwupeng Assignee: Junping Du Attachments: YARN-1289.patch Failed to run benchmark when not configure yarn.nodemanager.aux-services value in yarn-site.xml', it is better to configure default value. 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : attempt_1381371516570_0001_m_00_1, Status : FAILED Container launch failed for container_1381371516570_0001_01_05 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
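One possible shape of the fallback described above, shown only as a sketch (it is an assumption about the approach, not the attached YARN-1289.patch): supply mapreduce_shuffle as the default when yarn.nodemanager.aux-services is not set, so an unconfigured cluster can still run MapReduce shuffles.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AuxServicesDefaultExample {
  public static String[] auxServices(Configuration conf) {
    // getStrings returns the supplied defaults when the key is not set at all.
    return conf.getStrings(YarnConfiguration.NM_AUX_SERVICES, "mapreduce_shuffle");
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false); // no default/site files loaded here
    for (String s : auxServices(conf)) {
      System.out.println(s); // prints mapreduce_shuffle when nothing is configured
    }
  }
}
{code}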