[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599778#comment-13599778 ] Hadoop QA commented on YARN-378: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573265/YARN-378_6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 tests included appear to have a timeout.{color} {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/503//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/503//console This message is automatically generated. ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch We should support different clients or users having different ApplicationMaster retry counts. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
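As context for the change under review, here is a minimal client-side sketch of what per-application retry configuration could look like. The setter name setMaxAppAttempts is an assumption for illustration, not confirmed from the patch; Records.newRecord is the standard YARN record factory.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

ApplicationSubmissionContext context =
    Records.newRecord(ApplicationSubmissionContext.class);
// Override the cluster-wide yarn.resourcemanager.am.max-retries default
// for this one application (setter name hypothetical):
context.setMaxAppAttempts(4);
{code}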
[jira] [Created] (YARN-468) coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter
Aleksey Gorshkov created YARN-468: - Summary: coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter Key: YARN-468 URL: https://issues.apache.org/jira/browse/YARN-468 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-468) coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter
[ https://issues.apache.org/jira/browse/YARN-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-468: -- Attachment: YARN-468-trunk.patch coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter - Key: YARN-468 URL: https://issues.apache.org/jira/browse/YARN-468 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Attachments: YARN-468-trunk.patch coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-468) coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter
[ https://issues.apache.org/jira/browse/YARN-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-468: -- Description: coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23 was:coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter - Key: YARN-468 URL: https://issues.apache.org/jira/browse/YARN-468 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Attachments: YARN-468-trunk.patch coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600180#comment-13600180 ] Hadoop QA commented on YARN-18: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573355/YARN-18-v4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:red}-1 one of tests included doesn't have a timeout.{color} {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 9 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/505//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/505//console This message is automatically generated. Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.patch Several classes in YARN's container assignment and task scheduling algorithms that relate to data locality were updated to give preference to running a container on localities other than node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, like SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-379) yarn [node,application] commands print logger info messages
[ https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599985#comment-13599985 ] Abhishek Kapoor commented on YARN-379: -- Are we okay with the fix? Please suggest. Thanks, Abhi yarn [node,application] commands print logger info messages -- Key: YARN-379 URL: https://issues.apache.org/jira/browse/YARN-379 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Abhishek Kapoor Labels: usability Attachments: YARN-379.patch Running the yarn node and yarn application commands results in annoying log info messages being printed: $ yarn node -list 13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited. 13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started. Total Nodes:1 Node-Id Node-State Node-Http-Address Health-Status(isNodeHealthy) Running-Containers foo:8041 RUNNING foo:8042 true 0 13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped. $ yarn application 13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited. 13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started. Invalid Command Usage : usage: application -kill <arg> Kills the application. -list Lists all the Applications from RM. -status <arg> Prints the status of the application. 13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
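Until a patch lands, one possible client-side workaround (a sketch, not the fix under review) is to raise the level of the offending logger; the logger name below matches the AbstractService messages quoted above, and the log4j 1.x API calls are standard.
{code}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

// Silence the service-lifecycle INFO chatter in CLI output:
Logger.getLogger("org.apache.hadoop.yarn.service.AbstractService")
    .setLevel(Level.WARN);
{code}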
[jira] [Updated] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-18: --- Attachment: YARN-18-v4.patch Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.patch Several classes in YARN's container assignment and task scheduling algorithms that relate to data locality were updated to give preference to running a container on localities other than node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, like SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600182#comment-13600182 ] Hadoop QA commented on YARN-460: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573361/YARN-460.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/506//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/506//console This message is automatically generated. CS user left in list of active users for the queue even when application finished - Key: YARN-460 URL: https://issues.apache.org/jira/browse/YARN-460 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Thomas Graves Assignee: Thomas Graves Priority: Critical Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch, YARN-460.patch, YARN-460.patch, YARN-460.patch We have seen a user get left in the queue's list of active users even though the application was removed. This can cause everyone else in the queue to get fewer resources if using the minimum user limit percent config. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-460: --- Attachment: YARN-460.patch Trunk and branch-2 patch. Unfortunately, I couldn't easily come up with a unit test that hits the application-stopped condition (without hitting the null check), due to the data structures being private. CS user left in list of active users for the queue even when application finished - Key: YARN-460 URL: https://issues.apache.org/jira/browse/YARN-460 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Thomas Graves Assignee: Thomas Graves Priority: Critical Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch, YARN-460.patch, YARN-460.patch, YARN-460.patch We have seen a user get left in the queue's list of active users even though the application was removed. This can cause everyone else in the queue to get fewer resources if using the minimum user limit percent config. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
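For readers following the fix, the shape of the change is to ensure that removing a finished application also deactivates the user. A minimal sketch (field names hypothetical; ActiveUsersManager#deactivateApplication is the relevant scheduler entry point):
{code}
// When the queue removes a finished application, also drop it from the
// active-user accounting that feeds the minimum-user-limit-percent math;
// otherwise the user inflates the divisor forever.
activeUsersManager.deactivateApplication(
    application.getUser(), application.getApplicationId());
{code}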
[jira] [Commented] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600171#comment-13600171 ] Junping Du commented on YARN-18: Thanks Luke for your review and comments! I addressed almost all your points in the v4 patch, except for putting ScheduledRequests in a topology-related factory class for YARN: unlike the other objects, this one is created within the MRAppMaster rather than the YARN ResourceManager, so it lives in a different package. Am I missing something? Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.patch Several classes in YARN's container assignment and task scheduling algorithms that relate to data locality were updated to give preference to running a container on localities other than node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, like SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
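To make the pluggability concrete, a configuration-driven sketch of how the AM side could pick the implementation; the config key and wiring are hypothetical (ScheduledRequests is the class discussed above), while Configuration#getClass and ReflectionUtils#newInstance are the usual Hadoop mechanisms.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

// Resolve a topology-specific implementation instead of hard-coding the
// node-local/rack-local-only version (key name hypothetical):
Class<? extends ScheduledRequests> clazz = conf.getClass(
    "yarn.app.mapreduce.am.scheduled-requests.class",
    ScheduledRequests.class, ScheduledRequests.class);
ScheduledRequests requests = ReflectionUtils.newInstance(clazz, conf);
{code}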
[jira] [Assigned] (YARN-200) yarn log does not output all needed information, and is in a binary format
[ https://issues.apache.org/jira/browse/YARN-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash reassigned YARN-200: - Assignee: Ravi Prakash yarn log does not output all needed information, and is in a binary format -- Key: YARN-200 URL: https://issues.apache.org/jira/browse/YARN-200 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 0.23.5 Reporter: Robert Joseph Evans Assignee: Ravi Prakash Labels: usability yarn logs does not output attemptid, nodename, or container-id. Missing these makes it very difficult to look through the logs for failed containers and tie them back to actual tasks and task attempts. Also, the output currently includes several binary characters. This is OK for being machine readable, but difficult for being human readable, or even for using standard tools like grep. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-449) MRAppMaster classpath not set properly for unit tests in downstream projects
[ https://issues.apache.org/jira/browse/YARN-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600169#comment-13600169 ] Ted Yu commented on YARN-449: - On flubber, I installed protoc 2.4.1 but couldn't use it: {code} $ protoc --version protoc: error while loading shared libraries: libprotobuf.so.7: cannot open shared object file: No such file or directory {code} I applied minimr_randomdir-branch2.txt locally and ran the following command: mt -Dhadoop.profile=2.0 -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce#testMultiRegionTable The test passed. MRAppMaster classpath not set properly for unit tests in downstream projects Key: YARN-449 URL: https://issues.apache.org/jira/browse/YARN-449 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Siddharth Seth Priority: Blocker Attachments: 7904-v5.txt, hbase-7904-v3.txt, hbase-TestHFileOutputFormat-wip.txt, hbase-TestingUtility-wip.txt, minimr_randomdir-branch2.txt Post YARN-429, unit tests for HBase continue to fail since the classpath for the MRAppMaster is not being set correctly. Reverting YARN-129 may fix this, but I'm not sure that's the correct solution. My guess is, as Alejandro pointed out in YARN-129, Maven classloader magic is messing up java.class.path. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600216#comment-13600216 ] Hadoop QA commented on YARN-460: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573361/YARN-460.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/507//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/507//console This message is automatically generated. CS user left in list of active users for the queue even when application finished - Key: YARN-460 URL: https://issues.apache.org/jira/browse/YARN-460 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Thomas Graves Assignee: Thomas Graves Priority: Critical Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch, YARN-460.patch, YARN-460.patch, YARN-460.patch We have seen a user get left in the queue's list of active users even though the application was removed. This can cause everyone else in the queue to get fewer resources if using the minimum user limit percent config. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600232#comment-13600232 ] Thomas Graves commented on YARN-460: Also note that I manually tested this by delaying the kill-container message going to the AM and sleeping between when the app is marked done and when it is removed from the application list. I was able to reproduce the issue, then saw this patch fix it and the AM get the reboot command. I tested both the capacity scheduler and FIFO. CS user left in list of active users for the queue even when application finished - Key: YARN-460 URL: https://issues.apache.org/jira/browse/YARN-460 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Thomas Graves Assignee: Thomas Graves Priority: Critical Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch, YARN-460.patch, YARN-460.patch, YARN-460.patch We have seen a user get left in the queue's list of active users even though the application was removed. This can cause everyone else in the queue to get fewer resources if using the minimum user limit percent config. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-382) SchedulerUtils improve way normalizeRequest sets the resource capabilities
[ https://issues.apache.org/jira/browse/YARN-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600238#comment-13600238 ] Hadoop QA commented on YARN-382: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12572843/YARN-382_1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 tests included appear to have a timeout.{color} {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/508//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/508//console This message is automatically generated. SchedulerUtils improve way normalizeRequest sets the resource capabilities -- Key: YARN-382 URL: https://issues.apache.org/jira/browse/YARN-382 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Zhijie Shen Attachments: YARN-382_1.patch, YARN-382_demo.patch In YARN-370, we changed it from setting the capability to directly setting memory and cores: -ask.setCapability(normalized); +ask.getCapability().setMemory(normalized.getMemory()); +ask.getCapability().setVirtualCores(normalized.getVirtualCores()); We did this because it directly sets the values in the original resource object passed in when the AM gets allocated, and without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details. I think we should find a better way of doing this long term: first, so we don't have to keep adding things there when new resources are added; second, because it's a bit confusing as to what it's doing and prone to someone accidentally breaking it again in the future. Something closer to what Arun suggested in YARN-370 would be better, but we need to make sure all the places work and get some more testing on it before putting it in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
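To spell out the trade-off in the description: the field-by-field update keeps the caller's original Resource object (which the submission context still references) in sync, but silently misses any resource type added later. A sketch of both sides, with the helper named purely for illustration:
{code}
// Current approach: mutate the caller's Resource field by field. Works,
// but every new resource type must be remembered here or it is dropped.
ask.getCapability().setMemory(normalized.getMemory());
ask.getCapability().setVirtualCores(normalized.getVirtualCores());

// Hypothetical longer-term shape: one helper that copies every field,
// keeping the in-place-update semantics without the per-field risk:
//   Resources.copyInto(normalized, ask.getCapability());
{code}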
[jira] [Commented] (YARN-99) Jobs fail during resource localization when directories in file cache reaches to unix directory limit
[ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600312#comment-13600312 ] omkar vinit joshi commented on YARN-99: --- I am creating YARN-467 for the public cache issue. The private cache fix will be committed here. Jobs fail during resource localization when directories in file cache reaches to unix directory limit - Key: YARN-99 URL: https://issues.apache.org/jira/browse/YARN-99 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Devaraj K Assignee: Devaraj K If we have multiple jobs which use the distributed cache with small files, the directory limit is reached before the cache size is, and no further directories can be created in the file cache. The jobs then start failing with the below exception. {code:xml} java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} We should have a mechanism to clean the cache files when the number of directories crosses a specified limit, just as we do for cache size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
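One possible mechanism, sketched only for illustration (the layout and fan-out are assumptions, not the committed fix): hash each localized resource into a small tree of subdirectories so no single filecache directory approaches the per-directory limit.
{code}
import org.apache.hadoop.fs.Path;

Path cacheRoot = new Path("/tmp/nm-local-dir/usercache/root/filecache");
long resourceId = 1701886847734194975L;
// Two levels of base-36 fan-out keep any one directory small, regardless
// of how many small resources get localized over time:
String sub1 = Long.toString(Math.abs(resourceId) % 36, 36);
String sub2 = Long.toString((Math.abs(resourceId) / 36) % 36, 36);
Path target = new Path(cacheRoot, sub1 + "/" + sub2 + "/" + resourceId);
{code}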
[jira] [Assigned] (YARN-459) DefaultContainerExecutor doesn't log stderr from container launch
[ https://issues.apache.org/jira/browse/YARN-459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned YARN-459: --- Assignee: Sandy Ryza DefaultContainerExecutor doesn't log stderr from container launch - Key: YARN-459 URL: https://issues.apache.org/jira/browse/YARN-459 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha, 0.23.7 Reporter: Thomas Graves Assignee: Sandy Ryza The DefaultContainerExecutor does not log stderr or add it to the diagnostics message if something fails during the container launch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-449) MRAppMaster classpath not set properly for unit tests in downstream projects
[ https://issues.apache.org/jira/browse/YARN-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600416#comment-13600416 ] Ted Yu commented on YARN-449: - From https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/442/testReport/org.apache.hadoop.hbase.mapreduce/TestRowCounter/testRowCounterNoColumn/ (hadoop-2.0.2-alpha was used): {code} 2013-03-12 05:44:18,139 WARN [DeletionService #1] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/home/jenkins/jenkins-slave/workspace/HBase-TRUNK-on-Hadoop-2.0.0/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-1_0/usercache/jenkins/appcache/application_1363067018215_0001/container_1363067018215_0001_01_01] 2013-03-12 05:44:18,145 WARN [AsyncDispatcher event handler] nodemanager.NMAuditLogger(150): USER=jenkins OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: LOCALIZATION_FAILED APPID=application_1363067018215_0001 CONTAINERID=container_1363067018215_0001_01_01 2013-03-12 05:44:18,141 WARN [DeletionService #0] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/home/jenkins/jenkins-slave/workspace/HBase-TRUNK-on-Hadoop-2.0.0/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-1_1/usercache/jenkins/appcache/application_1363067018215_0001/container_1363067018215_0001_01_01] 2013-03-12 05:44:18,220 WARN [DeletionService #0] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/home/jenkins/jenkins-slave/workspace/HBase-TRUNK-on-Hadoop-2.0.0/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-1_2/usercache/jenkins/appcache/application_1363067018215_0001/container_1363067018215_0001_01_01] 2013-03-12 05:44:18,220 WARN [DeletionService #0] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/home/jenkins/jenkins-slave/workspace/HBase-TRUNK-on-Hadoop-2.0.0/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-1_3/usercache/jenkins/appcache/application_1363067018215_0001/container_1363067018215_0001_01_01] 2013-03-12 05:44:18,865 WARN [AsyncDispatcher event handler] resourcemanager.RMAuditLogger(255): USER=jenkins OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1363067018215_0001 failed 1 times due to AM Container for appattempt_1363067018215_0001_01 exited with exitCode: -1000 due to: RemoteTrace: java.io.IOException: Unable to rename file: [/home/jenkins/jenkins-slave/workspace/HBase-TRUNK-on-Hadoop-2.0.0/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-1_1/usercache/jenkins/filecache/5596410335910248146_tmp/hadoop-262140332608909552.jar.tmp] to [/home/jenkins/jenkins-slave/workspace/HBase-TRUNK-on-Hadoop-2.0.0/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-1_1/usercache/jenkins/filecache/5596410335910248146_tmp/hadoop-262140332608909552.jar] at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:162) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:205) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50) at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) at LocalTrace: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Unable to rename file: [/home/jenkins/jenkins-slave/workspace/HBase-TRUNK-on-Hadoop-2.0.0/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-1_1/usercache/jenkins/filecache/5596410335910248146_tmp/hadoop-262140332608909552.jar.tmp] to [/home/jenkins/jenkins-slave/workspace/HBase-TRUNK-on-Hadoop-2.0.0/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-1_1/usercache/jenkins/filecache/5596410335910248146_tmp/hadoop-262140332608909552.jar] at
[jira] [Created] (YARN-469) Make scheduling mode in FS pluggable
Karthik Kambatla created YARN-469: - Summary: Make scheduling mode in FS pluggable Key: YARN-469 URL: https://issues.apache.org/jira/browse/YARN-469 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.3-alpha Reporter: Karthik Kambatla Assignee: Karthik Kambatla Currently, scheduling mode in FS is limited to Fair and FIFO. The code typically has an if condition at multiple places to determine the correct course of action. Making the scheduling mode pluggable helps in simplifying this process, particularly as we add new modes (DRF in this case). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-459) DefaultContainerExecutor doesn't log stderr from container launch
[ https://issues.apache.org/jira/browse/YARN-459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600457#comment-13600457 ] Sandy Ryza commented on YARN-459: - Currently for MR, both stderr and stdout are redirected to files, so they contain nothing useful. Would it make sense to send the output streams both to these files and to the console (using tee)? DefaultContainerExecutor doesn't log stderr from container launch - Key: YARN-459 URL: https://issues.apache.org/jira/browse/YARN-459 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha, 0.23.7 Reporter: Thomas Graves Assignee: Sandy Ryza The DefaultContainerExecutor does not log stderr or add it to the diagnostics message if something fails during the container launch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-459) DefaultContainerExecutor doesn't log stderr from container launch
[ https://issues.apache.org/jira/browse/YARN-459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600464#comment-13600464 ] Sandy Ryza commented on YARN-459: - Or, alternatively, make these files standard for all YARN apps so that the container executor can read info from them? DefaultContainerExecutor doesn't log stderr from container launch - Key: YARN-459 URL: https://issues.apache.org/jira/browse/YARN-459 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha, 0.23.7 Reporter: Thomas Graves Assignee: Sandy Ryza The DefaultContainerExecutor does not log stderr or add it to the diagnostics message if something fails during the container launch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-449) MRAppMaster classpath not set properly for unit tests in downstream projects
[ https://issues.apache.org/jira/browse/YARN-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600578#comment-13600578 ] Siddharth Seth commented on YARN-449: - {code} 2013-03-12 18:53:39,275 WARN [Container Monitor] monitor.ContainersMonitorImpl$MonitoringThread(444): Container [pid=8438,containerID=container_1363114400920_0001_01_01] is running beyond virtual memory limits. Current usage: 217.9 MB of 2 GB physical memory used; 6.5 GB of 4.2 GB virtual memory used. Killing container. Dump of the process-tree for container_1363114400920_0001_01_01 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 8438 7023 8438 8438 (bash) 1 0 108650496 310 /bin/bash -c /usr/lib/jvm/java-1.6.0-sun-1.6.0.37.x86_64/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster_1035429065/org.apache.hadoop.mapred.MiniMRCluster_1035429065-logDir-nm-1_2/application_1363114400920_0001/container_1363114400920_0001_01_01 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster_1035429065/org.apache.hadoop.mapred.MiniMRCluster_1035429065-logDir-nm-1_2/application_1363114400920_0001/container_1363114400920_0001_01_01/stdout 2>/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster_1035429065/org.apache.hadoop.mapred.MiniMRCluster_1035429065-logDir-nm-1_2/application_1363114400920_0001/container_1363114400920_0001_01_01/stderr {code} This is what caused TestRowCounter to fail in the Linux env. Not sure why the vmem is going that high. The hadoop-1 default config likely disables this monitoring. At this point there seem to be solutions for the original problem the jira was opened for, and this is really re-purposed to get HBase unit tests working with Hadoop 2. Changing the title accordingly. MRAppMaster classpath not set properly for unit tests in downstream projects Key: YARN-449 URL: https://issues.apache.org/jira/browse/YARN-449 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Siddharth Seth Priority: Blocker Attachments: 7904-v5.txt, hbase-7904-v3.txt, hbase-TestHFileOutputFormat-wip.txt, hbase-TestingUtility-wip.txt, minimr_randomdir-branch2.txt Post YARN-429, unit tests for HBase continue to fail since the classpath for the MRAppMaster is not being set correctly. Reverting YARN-129 may fix this, but I'm not sure that's the correct solution. My guess is, as Alejandro pointed out in YARN-129, Maven classloader magic is messing up java.class.path. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
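As a test-side mitigation sketch (not a fix for the underlying vmem growth): the numbers above are exactly the default yarn.nodemanager.vmem-pmem-ratio of 2.1 applied to a 2 GB container, so raising the ratio in the mini-cluster configuration should keep the monitor from killing the AM while the root cause is investigated.
{code}
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// 2 GB pmem * 2.1 = 4.2 GB vmem limit, which the 6.5 GB footprint exceeds.
// Raise the ratio for the MiniMRCluster used by the HBase tests:
conf.setFloat("yarn.nodemanager.vmem-pmem-ratio", 8.0f);
{code}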
[jira] [Updated] (YARN-449) HBase test failures when running against Hadoop 2
[ https://issues.apache.org/jira/browse/YARN-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-449: Summary: HBase test failures when running against Hadoop 2 (was: MRAppMaster classpath not set properly for unit tests in downstream projects) HBase test failures when running against Hadoop 2 - Key: YARN-449 URL: https://issues.apache.org/jira/browse/YARN-449 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Siddharth Seth Priority: Blocker Attachments: 7904-v5.txt, hbase-7904-v3.txt, hbase-TestHFileOutputFormat-wip.txt, hbase-TestingUtility-wip.txt, minimr_randomdir-branch2.txt Post YARN-429, unit tests for HBase continue to fail since the classpath for the MRAppMaster is not being set correctly. Reverting YARN-129 may fix this, but I'm not sure that's the correct solution. My guess is, as Alexandro pointed out in YARN-129, maven classloader magic is messing up java.class.path. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-71: -- Attachment: YARN-71.7.patch 1. Reuse the FileContext instance. 2. Put the rename/exception code into a new function (sketched below). 3. Create a new test file, TestNodeManagerReboot.java, and add a new test case. Ensure/confirm that the NodeManager cleans up local-dirs on restart --- Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Critical Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
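For item 2 above, a sketch of what the rename helper could look like (method name hypothetical; FileContext.rename with Options.Rename is the real API):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Options;
import org.apache.hadoop.fs.Path;

// Reuse one FileContext and wrap failures with a descriptive message,
// instead of scattering bare rename calls through the cleanup path.
private void renameOrFail(FileContext lfs, Path from, Path to)
    throws IOException {
  try {
    lfs.rename(from, to, Options.Rename.NONE);
  } catch (IOException e) {
    throw new IOException("Failed to rename " + from + " to " + to, e);
  }
}
{code}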
[jira] [Updated] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-71: -- Attachment: (was: YARN-71.7.patch) Ensure/confirm that the NodeManager cleans up local-dirs on restart --- Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Critical Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-71: -- Attachment: YARN-71.7.patch Ensure/confirm that the NodeManager cleans up local-dirs on restart --- Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Critical Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600599#comment-13600599 ] Hadoop QA commented on YARN-71: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573437/YARN-71.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 tests included appear to have a timeout.{color} {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/509//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/509//console This message is automatically generated. Ensure/confirm that the NodeManager cleans up local-dirs on restart --- Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Critical Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-412: --- Assignee: Roger Hoover FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Labels: patch Attachments: YARN-412.patch, YARN-412.patch, YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method checks whether the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This seems to be incorrect, as it should be checking hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234). In the CapacityScheduler, it's done using hostname. See LeafQueue.assignNodeLocalContainers, line 1129: application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler: even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it's rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node-local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
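The essence of the one-line fix described above, for quick reference (a sketch; the exact accessor chain in the patch may differ):
{code}
// Before (FifoScheduler line 455): keyed by host:port, so it never matches
// the hostname-keyed requests.
// application.getResourceRequest(priority, node.getRMNode().getNodeAddress());

// After: keyed by hostname, matching how requests are recorded and how the
// CapacityScheduler already performs the lookup.
ResourceRequest request =
    application.getResourceRequest(priority, node.getHostName());
{code}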
[jira] [Updated] (YARN-466) Slave hostname mismatches in ResourceManager/Scheduler
[ https://issues.apache.org/jira/browse/YARN-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-466: Assignee: Zhijie Shen Slave hostname mismatches in ResourceManager/Scheduler -- Key: YARN-466 URL: https://issues.apache.org/jira/browse/YARN-466 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Reporter: Roger Hoover Assignee: Zhijie Shen The problem is that the ResourceManager learns the hostname of a slave node when the NodeManager registers itself, and it seems the node manager is getting the hostname by asking the OS. When a job is submitted, I think the ApplicationMaster learns the hostname by doing a reverse DNS lookup based on the slaves file. Therefore, the ApplicationMaster submits requests for containers using the fully qualified domain name (node1.foo.com) but the scheduler uses the OS hostname (node1) when checking to see if any requests are node-local. The result is that node-local requests are never found using this method of searching for node-local requests: ResourceRequest request = application.getResourceRequest(priority, node.getHostName()); I think it's unfriendly to ask users to make sure they configure hostnames to match fully qualified domain names. There should be a way for the ApplicationMaster and NodeManager to agree on the hostname. Steps to Reproduce: 1) Configure the OS hostname on slaves to differ from the fully qualified domain name. For example, if the FQDN for the slave is node1.foo.com, set the hostname on the node to be just node1. 2) On submitting a job, observe that the AM submits resource requests using the FQDN (e.g. node1.foo.com). You can add logging to the allocate() method of whatever scheduler you're using: for (ResourceRequest req : ask) { LOG.debug(String.format("Request %s for %d containers on %s", req, req.getNumContainers(), req.getHostName())); } 3) Observe that when the scheduler checks for node locality (in the handle() method) using FiCaSchedulerNode.getHostName(), the hostname it uses is the one set in the host OS (e.g. node1). NOTE: if you're using FifoScheduler, this bug needs to be fixed first (https://issues.apache.org/jira/browse/YARN-412). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-449) HBase test failures when running against Hadoop 2
[ https://issues.apache.org/jira/browse/YARN-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600632#comment-13600632 ] Ted Yu commented on YARN-449: - More from TEST-org.apache.hadoop.hbase.mapreduce.TestRowCounter.xml: {code} 2013-03-12 18:53:39,274 WARN [Container Monitor] monitor.ContainersMonitorImpl(298): Process tree for container: container_1363114400920_0001_01_01 has processes older than 1 iteration running over the configured limit. Limit=4509715456, current usage = 7007866880 2013-03-12 18:53:39,275 WARN [Container Monitor] monitor.ContainersMonitorImpl$MonitoringThread(444): Container [pid=8438,containerID=container_1363114400920_0001_01_01] is running beyond virtual memory limits. Current usage: 217.9 MB of 2 GB physical memory used; 6.5 GB of 4.2 GB virtual memory used. Killing container. Dump of the process-tree for container_1363114400920_0001_01_01 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 8438 7023 8438 8438 (bash) 1 0 108650496 310 /bin/bash -c /usr/lib/jvm/java-1.6.0-sun-1.6.0.37.x86_64/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/homes/hortonzy/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster_1035429065/org.apache.hadoop.mapred.MiniMRCluster_1035429065-logDir-nm-1_2/application_1363114400920_0001/container_1363114400920_0001_01_01 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/homes/hortonzy/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster_1035429065/org.apache.hadoop.mapred.MiniMRCluster_1035429065-logDir-nm-1_2/application_1363114400920_0001/container_1363114400920_0001_01_01/stdout 2>/homes/hortonzy/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster_1035429065/org.apache.hadoop.mapred.MiniMRCluster_1035429065-logDir-nm-1_2/application_1363114400920_0001/container_1363114400920_0001_01_01/stderr |- 8461 8438 8438 8438 (java) 688 34 6899216384 55478 /usr/lib/jvm/java-1.6.0-sun-1.6.0.37.x86_64/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/homes/hortonzy/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster_1035429065/org.apache.hadoop.mapred.MiniMRCluster_1035429065-logDir-nm-1_2/application_1363114400920_0001/container_1363114400920_0001_01_01 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster {code} Note that 1024m was specified for -Xmx HBase test failures when running against Hadoop 2 - Key: YARN-449 URL: https://issues.apache.org/jira/browse/YARN-449 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Siddharth Seth Priority: Blocker Attachments: 7904-v5.txt, hbase-7904-v3.txt, hbase-TestHFileOutputFormat-wip.txt, hbase-TestingUtility-wip.txt, minimr_randomdir-branch2.txt Post YARN-429, unit tests for HBase continue to fail since the classpath for the MRAppMaster is not being set correctly. Reverting YARN-129 may fix this, but I'm not sure that's the correct solution. My guess is, as Alejandro pointed out in YARN-129, Maven classloader magic is messing up java.class.path. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-449) HBase test failures when running against Hadoop 2
[ https://issues.apache.org/jira/browse/YARN-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600650#comment-13600650 ] Ted Yu commented on YARN-449: - Here is sample content for /proc/PID/stat:
{code}
30873 (sshd) S 30869 30869 30869 0 -1 4202816 360 0 0 0 47 56 0 0 20 0 1 0 741791881 117960704 516 18446744073709551615 1 1 0 0 0 0 0 4096 65536 18446744073709551615 0 0 17 2 0 0 0 0 0
{code}
Here is the regex used to parse the stat file:
{code}
private static final Pattern PROCFS_STAT_FILE_FORMAT = Pattern.compile(
    "^([0-9-]+)\\s([^\\s]+)\\s[^\\s]\\s([0-9-]+)\\s([0-9-]+)\\s([0-9-]+)\\s" +
    "([0-9-]+\\s){7}([0-9]+)\\s([0-9]+)\\s([0-9-]+\\s){7}([0-9]+)\\s([0-9]+)" +
    "(\\s[0-9-]+){15}");
{code}
HBase test failures when running against Hadoop 2 - Key: YARN-449 URL: https://issues.apache.org/jira/browse/YARN-449 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Siddharth Seth Priority: Blocker Attachments: 7904-v5.txt, hbase-7904-v3.txt, hbase-TestHFileOutputFormat-wip.txt, hbase-TestingUtility-wip.txt, minimr_randomdir-branch2.txt Post YARN-429, unit tests for HBase continue to fail since the classpath for the MRAppMaster is not being set correctly. Reverting YARN-129 may fix this, but I'm not sure that's the correct solution. My guess is, as Alexandro pointed out in YARN-129, maven classloader magic is messing up java.class.path. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
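The string-literal quotes and + concatenations in the regex above were stripped somewhere in transit and have been restored. As a sanity check, here is a minimal standalone sketch (not the actual ProcfsBasedProcessTree code) that runs the restored pattern against the sample stat line; note that groups 6 and 9 are the repeated (...){7} groups, so the fields of interest land in groups 1-5, 7-8, and 10-11:
{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatLineDemo {
  private static final Pattern PROCFS_STAT_FILE_FORMAT = Pattern.compile(
      "^([0-9-]+)\\s([^\\s]+)\\s[^\\s]\\s([0-9-]+)\\s([0-9-]+)\\s([0-9-]+)\\s" +
      "([0-9-]+\\s){7}([0-9]+)\\s([0-9]+)\\s([0-9-]+\\s){7}([0-9]+)\\s([0-9]+)" +
      "(\\s[0-9-]+){15}");

  public static void main(String[] args) {
    // The sample /proc/PID/stat content from the comment above.
    String stat = "30873 (sshd) S 30869 30869 30869 0 -1 4202816 360 0 0 0 "
        + "47 56 0 0 20 0 1 0 741791881 117960704 516 18446744073709551615 "
        + "1 1 0 0 0 0 0 4096 65536 18446744073709551615 0 0 17 2 0 0 0 0 0";
    Matcher m = PROCFS_STAT_FILE_FORMAT.matcher(stat);
    if (m.find()) {
      System.out.println("pid=" + m.group(1));     // 30873
      System.out.println("comm=" + m.group(2));    // (sshd)
      System.out.println("ppid=" + m.group(3));    // 30869
      System.out.println("utime=" + m.group(7));   // 47 (user-mode jiffies)
      System.out.println("stime=" + m.group(8));   // 56 (kernel-mode jiffies)
      System.out.println("vsize=" + m.group(10));  // 117960704 bytes
      System.out.println("rss=" + m.group(11));    // 516 pages
    }
  }
}
{code}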
[jira] [Created] (YARN-470) Support a way to disable resource monitoring on the NodeManager
Hitesh Shah created YARN-470: Summary: Support a way to disable resource monitoring on the NodeManager Key: YARN-470 URL: https://issues.apache.org/jira/browse/YARN-470 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Currently, the memory management monitor's check is disabled when the maxMem is set to -1. However, the maxMem is also sent to the RM when the NM registers with it (to define the max limit of allocatable resources). We need an explicit flag to disable monitoring to avoid the problems caused by the overloading of the max memory value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
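To make the overloading concrete, here is a hedged sketch of the decoupling this issue asks for; the property name below is a hypothetical placeholder, not an existing YARN configuration key:
{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch only -- not NodeManager code. The explicit flag is what
// this JIRA proposes; "yarn.nodemanager.resource-monitor.enabled" is a made-up
// placeholder name for it.
public class MonitoringFlagSketch {
  // Today: one value serves two purposes, so -1 both disables the check
  // and is what gets reported to the RM as the node's capability.
  static boolean isMonitoringEnabledToday(long maxMem) {
    return maxMem != -1; // overloaded sentinel
  }

  // Proposed: an explicit, independent switch.
  static boolean isMonitoringEnabledProposed(Configuration conf) {
    return conf.getBoolean("yarn.nodemanager.resource-monitor.enabled", true);
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.setBoolean("yarn.nodemanager.resource-monitor.enabled", false);
    System.out.println(isMonitoringEnabledToday(-1));       // false
    System.out.println(isMonitoringEnabledProposed(conf));  // false, but maxMem stays a valid capability
  }
}
{code}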
[jira] [Created] (YARN-471) RM does not validate the resource capability of an NM when the NM registers with the RM
Hitesh Shah created YARN-471: Summary: RM does not validate the resource capability of an NM when the NM registers with the RM Key: YARN-471 URL: https://issues.apache.org/jira/browse/YARN-471 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Today, an NM could register with -1 memory and -1 cpu with the RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-472) Fix MR Job failed if RM restarted when the job is running
jian he created YARN-472: Summary: Fix MR Job failed if RM restarted when the job is running Key: YARN-472 URL: https://issues.apache.org/jira/browse/YARN-472 Project: Hadoop YARN Issue Type: Sub-task Reporter: jian he Assignee: jian he If the RM is restarted when the MR job is running, the job fails because the staging directory is cleaned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-472) MR Job failed if RM restarted when the job is running
[ https://issues.apache.org/jira/browse/YARN-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian he updated YARN-472: - Summary: MR Job failed if RM restarted when the job is running (was: Fix MR Job failed if RM restarted when the job is running) MR Job failed if RM restarted when the job is running - Key: YARN-472 URL: https://issues.apache.org/jira/browse/YARN-472 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: jian he Assignee: jian he If the RM is restarted when the MR job is running, the job fails because the staging directory is cleaned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-466) Slave hostname mismatches in ResourceManager/Scheduler
[ https://issues.apache.org/jira/browse/YARN-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600772#comment-13600772 ] Roger Hoover commented on YARN-466: --- My guess is that the best way to solve this is to change the NodeManager to send the [fully qualified domain name|http://docs.oracle.com/javase/7/docs/api/java/net/InetAddress.html#getCanonicalHostName()] to the ResourceManager when it registers itself. Slave hostname mismatches in ResourceManager/Scheduler -- Key: YARN-466 URL: https://issues.apache.org/jira/browse/YARN-466 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Reporter: Roger Hoover Assignee: Zhijie Shen The problem is that the ResourceManager learns the hostname of a slave node when the NodeManager registers itself, and it seems the node manager is getting the hostname by asking the OS. When a job is submitted, I think the ApplicationMaster learns the hostname by doing a reverse DNS lookup based on the slaves file. Therefore, the ApplicationMaster submits requests for containers using the fully qualified domain name (node1.foo.com) but the scheduler uses the OS hostname (node1) when checking to see if any requests are node-local. The result is that node-local requests are never found using this method of searching for node-local requests: ResourceRequest request = application.getResourceRequest(priority, node.getHostName()); I think it's unfriendly to ask users to make sure they configure hostnames to match fully qualified domain names. There should be a way for the ApplicationMaster and NodeManager to agree on the hostname. Steps to Reproduce: 1) Configure the OS hostname on slaves to differ from the fully qualified domain name. For example, if the FQDN for the slave is node1.foo.com, set the hostname on the node to be just node1. 2) On submitting a job, observe that the AM submits resource requests using the FQDN (e.g. node1.foo.com). You can add logging to the allocate() method of whatever scheduler you're using: for (ResourceRequest req: ask) { LOG.debug(String.format("Request %s for %d containers on %s", req, req.getNumContainers(), req.getHostName())); } 3) Observe that when the scheduler checks for node locality (in the handle() method) using FiCaSchedulerNode.getHostName(), the hostname it uses is the one set in the host OS (e.g. node1). NOTE: if you're using FifoScheduler, this bug needs to be fixed first (https://issues.apache.org/jira/browse/YARN-412). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
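A small sketch of the mismatch described above: on a box whose OS hostname is node1 but whose DNS entry is node1.foo.com, the two InetAddress calls below can return different strings, and the scheduler compares hostnames literally (the example hostnames are assumptions for illustration):
{code}
import java.net.InetAddress;

public class HostnameDemo {
  public static void main(String[] args) throws Exception {
    InetAddress addr = InetAddress.getLocalHost();
    // What the NM effectively reports today (the OS hostname).
    System.out.println("OS hostname:    " + addr.getHostName());          // e.g. node1
    // What the AM ends up requesting against (reverse-DNS canonical name).
    System.out.println("canonical FQDN: " + addr.getCanonicalHostName()); // e.g. node1.foo.com
  }
}
{code}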
[jira] [Updated] (YARN-471) NM does not validate the resource capabilities before it registers with RM
[ https://issues.apache.org/jira/browse/YARN-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-471: - Component/s: (was: resourcemanager) nodemanager Summary: NM does not validate the resource capabilities before it registers with RM (was: RM does not validate the resource capability of an NM when the NM registers with the RM) Because the NM and RM are both trusted components in the system, I think we should do this validation on the NM itself. Modifying the description; please revert it if you disagree. NM does not validate the resource capabilities before it registers with RM -- Key: YARN-471 URL: https://issues.apache.org/jira/browse/YARN-471 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Today, an NM could register with -1 memory and -1 cpu with the RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
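A hedged sketch of the registration-time check this issue calls for; the class and method names are illustrative, not the actual NodeManager code:
{code}
// Illustrative only: reject nonsensical capabilities such as -1 memory /
// -1 vcores before the NM ever registers with the RM.
public class CapabilityCheck {
  static void validateCapability(int memoryMb, int vcores) {
    if (memoryMb <= 0 || vcores <= 0) {
      throw new IllegalArgumentException(
          "Invalid NM resource capability: memory=" + memoryMb
          + " MB, vcores=" + vcores + " (both must be positive)");
    }
  }

  public static void main(String[] args) {
    validateCapability(8192, 8);      // fine
    try {
      validateCapability(-1, -1);     // what this JIRA says is possible today
    } catch (IllegalArgumentException expected) {
      System.out.println("rejected: " + expected.getMessage());
    }
  }
}
{code}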
[jira] [Commented] (YARN-470) Support a way to disable resource monitoring on the NodeManager
[ https://issues.apache.org/jira/browse/YARN-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600795#comment-13600795 ] Vinod Kumar Vavilapalli commented on YARN-470: -- Depending on how you look at it, it's massive; I take the blame :) Good find! Support a way to disable resource monitoring on the NodeManager --- Key: YARN-470 URL: https://issues.apache.org/jira/browse/YARN-470 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Currently, the memory management monitor's check is disabled when the maxMem is set to -1. However, the maxMem is also sent to the RM when the NM registers with it (to define the max limit of allocatable resources). We need an explicit flag to disable monitoring to avoid the problems caused by the overloading of the max memory value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-18: --- Attachment: YARN-18-v4.1.patch Addressed the timeout and JavaDoc issues in the v4.1 patch. Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.patch There are several classes in YARN’s container assignment and task scheduling algorithms that relate to data locality, which were updated to give preference to running a container at another locality level besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
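To illustrate the shape of the pluggability being proposed, here is a hedged sketch in which the locality preference order becomes an overridable hook; the class and method names are assumptions for illustration, not the patch's actual API:
{code}
// Hedged sketch, not the patch's API: a nodegroup-aware deployment inserts an
// extra scheduling tier between node-local and rack-local by subclassing.
public class LocalitySketch {
  abstract static class LocalityPolicy {
    /** Scheduling tiers to try, most preferred first. */
    abstract String[] localityOrder();
  }

  static class DefaultLocalityPolicy extends LocalityPolicy {
    @Override
    String[] localityOrder() {
      return new String[] {"node-local", "rack-local", "off-switch"};
    }
  }

  static class NodeGroupLocalityPolicy extends LocalityPolicy {
    @Override
    String[] localityOrder() {
      return new String[] {"node-local", "nodegroup-local", "rack-local", "off-switch"};
    }
  }

  public static void main(String[] args) {
    for (String tier : new NodeGroupLocalityPolicy().localityOrder()) {
      System.out.println(tier);
    }
  }
}
{code}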
[jira] [Commented] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager, then there is no link to navigate back to Resource manager
[ https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600800#comment-13600800 ] Vinod Kumar Vavilapalli commented on YARN-198: -- +1, this looks good, checking it in. If we are navigating to Nodemanager UI from Resourcemanager, then there is no link to navigate back to Resource manager --- Key: YARN-198 URL: https://issues.apache.org/jira/browse/YARN-198 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Ramgopal N Assignee: jian he Priority: Minor Labels: usability Attachments: YARN-198.patch If we are navigating to the Nodemanager by clicking on the node link in the RM, there is no link provided on the NM to navigate back to the RM. It would be good if there were a link to navigate back to the RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager, then there is no link to navigate back to Resource manager
[ https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600807#comment-13600807 ] Hudson commented on YARN-198: - Integrated in Hadoop-trunk-Commit #3460 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3460/]) YARN-198. Added a link to RM pages from the NodeManager web app. Contributed by Jian He. (Revision 1455800) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455800 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java If we are navigating to Nodemanager UI from Resourcemanager, then there is no link to navigate back to Resource manager --- Key: YARN-198 URL: https://issues.apache.org/jira/browse/YARN-198 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Ramgopal N Assignee: jian he Priority: Minor Labels: usability Fix For: 2.0.5-beta Attachments: YARN-198.patch If we are navigating to the Nodemanager by clicking on the node link in the RM, there is no link provided on the NM to navigate back to the RM. It would be good if there were a link to navigate back to the RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager, then there is no link to navigate back to Resource manager
[ https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600810#comment-13600810 ] jian he commented on YARN-198: -- Thanks, Vinod! If we are navigating to Nodemanager UI from Resourcemanager, then there is no link to navigate back to Resource manager --- Key: YARN-198 URL: https://issues.apache.org/jira/browse/YARN-198 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Ramgopal N Assignee: jian he Priority: Minor Labels: usability Fix For: 2.0.5-beta Attachments: YARN-198.patch If we are navigating to the Nodemanager by clicking on the node link in the RM, there is no link provided on the NM to navigate back to the RM. It would be good if there were a link to navigate back to the RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600820#comment-13600820 ] Vinod Kumar Vavilapalli commented on YARN-378: -- bq. We should separate the YARN part of it from the mapreduce only changes. Filed MAPREDUCE-5062: MR AM should read max-retries information from the RM. ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch We should support different clients or users having different ApplicationMaster retry times. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
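For reference, a hedged sketch of the client-side usage this change enables. It assumes the per-application setter lands on ApplicationSubmissionContext (consult the final patch for the real API), with the global yarn.resourcemanager.am.max-retries then acting only as an upper bound:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

// Sketch, not the committed API: a client asks the RM to retry this
// particular AM at most 4 times, independently of other applications.
public class SubmitWithRetries {
  public static void main(String[] args) {
    ApplicationSubmissionContext appContext =
        Records.newRecord(ApplicationSubmissionContext.class);
    appContext.setMaxAppAttempts(4);
    System.out.println("max attempts requested: " + appContext.getMaxAppAttempts());
  }
}
{code}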
[jira] [Commented] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600824#comment-13600824 ] Hadoop QA commented on YARN-18: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573471/YARN-18-v4.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 tests included appear to have a timeout.{color} {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning message. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/510//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/510//console This message is automatically generated. Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.patch There are several classes in YARN’s container assignment and task scheduling algorithms that relate to data locality, which were updated to give preference to running a container at another locality level besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600849#comment-13600849 ] Hadoop QA commented on YARN-18: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573476/YARN-18-v4.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 tests included appear to have a timeout.{color} {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/511//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/511//console This message is automatically generated. Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.patch There are several classes in YARN’s container assignment and task scheduling algorithms that relate to data locality, which were updated to give preference to running a container at another locality level besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-381) Improve FS docs
[ https://issues.apache.org/jira/browse/YARN-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600859#comment-13600859 ] Vitaly Kruglikov commented on YARN-381: --- The file FairScheduler.apt.vm doesn't specify the units for the minResources property in queue allocations. Is it in megabytes or in bytes? Improve FS docs --- Key: YARN-381 URL: https://issues.apache.org/jira/browse/YARN-381 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Eli Collins Priority: Minor The MR2 FS docs could use some improvements. Configuration: - sizebasedweight - what is the size here? Total memory usage? Pool properties: - minResources - what does "min amount of aggregate memory" mean given that this is not a reservation? - maxResources - is this a hard limit? - weight: How is this ratio configured? Eg base is 1 and all weights are relative to that? - schedulingMode - what is the default? Is fifo pure fifo, eg waits until all tasks for the job are finished before launching the next job? There's no mention of ACLs, even though they're supported. See the CS docs for comparison. Also there are a couple typos worth fixing while we're at it, eg "finish." "apps to run". Worth keeping in mind that some of these will need to be updated to reflect that resource calculators are now pluggable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
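To answer the units question concretely while the docs are being fixed: in the Hadoop 2.x fair scheduler, minResources and maxResources are expressed in megabytes of memory (later releases also accept a vcores component in the form "X mb, Y vcores"), not bytes. A hedged sample allocation file; the queue name and values are made up for illustration:
{code}
<?xml version="1.0"?>
<!-- Illustrative fair-scheduler allocation file, not from the YARN docs.
     Memory values below are megabytes: exactly the unit ambiguity this
     JIRA asks FairScheduler.apt.vm to spell out. -->
<allocations>
  <queue name="research">
    <minResources>10000 mb, 0 vcores</minResources>
    <maxResources>90000 mb, 0 vcores</maxResources>
    <weight>2.0</weight>             <!-- relative to a default weight of 1 -->
    <schedulingMode>fair</schedulingMode>
  </queue>
</allocations>
{code}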