[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733219#comment-13733219 ] Zhijie Shen commented on YARN-1001: --- [~srimanth.gunturi], let me reword the requirement. Given some application types and states, Ambari wants to categorize the applications into buckets for all combinations of these application types and states, and count the number of applications in each bucket. For example, users want to know the number of applications of three application types (type1, type2, and type3) and two states (state1 and state2). Assume the RM has 5 applications: app1(type1, state1), app2(type2, state1), app3(type2, state1), app4(type2, state2), app5(type3, state1). The users will get the following statistics:
[type1, state1]: 1
[type1, state2]: 0
[type2, state1]: 2
[type2, state2]: 1
[type3, state1]: 1
[type3, state2]: 0
Is this exactly what Ambari wants? YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
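A minimal sketch of the bucketing Zhijie describes, counting applications per (type, state) combination; all class and variable names here are illustrative, not a proposed YARN API:
{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class AppBucketCounter {
    // apps: each entry is {applicationType, state}; types/states are the
    // dimensions the caller asked for. Every combination starts at 0 so
    // empty buckets (e.g. [type1, state2]) still show up in the result.
    public static Map<String, Integer> count(String[][] apps,
                                             String[] types, String[] states) {
        Map<String, Integer> buckets = new LinkedHashMap<>();
        for (String type : types) {
            for (String state : states) {
                buckets.put(type + ", " + state, 0);
            }
        }
        for (String[] app : apps) {
            String key = app[0] + ", " + app[1];
            Integer n = buckets.get(key);
            if (n != null) {   // ignore apps outside the requested dimensions
                buckets.put(key, n + 1);
            }
        }
        return buckets;
    }
}
{code}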
[jira] [Commented] (YARN-1042) add ability to specify affinity/anti-affinity in container requests
[ https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733249#comment-13733249 ] Junping Du commented on YARN-1042: -- Yes. It is pretty useful in the cases specified in the description. With this affinity and anti-affinity info, the AM would have the knowledge to translate a container request into a resource request to ask of the RM. add ability to specify affinity/anti-affinity in container requests --- Key: YARN-1042 URL: https://issues.apache.org/jira/browse/YARN-1042 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Container requests to the AM should be able to request anti-affinity to ensure that things like Region Servers don't come up on the same failure zones. Similarly, you may want to specify affinity to the same host or rack without specifying which specific host/rack. Example: bringing up a small giraph cluster in a large YARN cluster would benefit from having the processes in the same rack purely for bandwidth reasons. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733576#comment-13733576 ] Hitesh Shah commented on YARN-353: -- bq. For deleteWithRetries, the return code of exists() could be checked if a delete is required or not. this depends on whether RM wants to know the delete operation succeeds or not. I am not sure I understand. If the RM is trying to delete something and the node does not exist, is there a situation where the RM wants to know that the node didn't exist and fail if deletion of a non-existent node was attempted? Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.10.patch, YARN-353.11.patch, YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, YARN-353.8.patch, YARN-353.9.patch Add a store that writes RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-337) RM handles killed application tracking URL poorly
[ https://issues.apache.org/jira/browse/YARN-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-337: Attachment: YARN-337.patch Patch that sets the tracking URL to the RM app page when an AM attempt is killed. Also refactored the places where this was done for FAILED attempts to better cover all the various ways an AM attempt can fail. As for the unregister attempt failure, I'm tempted to leave that as-is since there will always be races between YARN-level kill/fail and apps unregistering. As long as we point to the RM app page when something goes wrong, at least the user has something to start with to diagnose the problem rather than a bad link to nowhere. RM handles killed application tracking URL poorly - Key: YARN-337 URL: https://issues.apache.org/jira/browse/YARN-337 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 0.23.5 Reporter: Jason Lowe Labels: usability Attachments: YARN-337.patch When the ResourceManager kills an application, it leaves the proxy URL redirecting to the original tracking URL for the application even though the ApplicationMaster is no longer there to service it. It should redirect it somewhere more useful, like the RM's web page for the application, where the user can find that the application was killed and links to the AM logs. In addition, sometimes the AM during teardown from the kill can attempt to unregister and provide an updated tracking URL, but unfortunately the RM has forgotten the AM due to the kill and refuses to process the unregistration. Instead it logs: {noformat} 2013-01-09 17:37:49,671 [IPC Server handler 2 on 8030] ERROR org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AppAttemptId doesnt exist in cache appattempt_1357575694478_28614_01 {noformat} It should go ahead and process the unregistration to update the tracking URL since the application offered it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733588#comment-13733588 ] Srimanth Gunturi commented on YARN-1001: [~zjshen], yes. YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-337) RM handles killed application tracking URL poorly
[ https://issues.apache.org/jira/browse/YARN-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733612#comment-13733612 ] Hadoop QA commented on YARN-337: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12596859/YARN-337.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1675//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1675//console This message is automatically generated. RM handles killed application tracking URL poorly - Key: YARN-337 URL: https://issues.apache.org/jira/browse/YARN-337 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Jason Lowe Labels: usability Attachments: YARN-337.patch When the ResourceManager kills an application, it leaves the proxy URL redirecting to the original tracking URL for the application even though the ApplicationMaster is no longer there to service it. It should redirect it somewhere more useful, like the RM's web page for the application, where the user can find that the application was killed and links to the AM logs. In addition, sometimes the AM during teardown from the kill can attempt to unregister and provide an updated tracking URL, but unfortunately the RM has forgotten the AM due to the kill and refuses to process the unregistration. Instead it logs: {noformat} 2013-01-09 17:37:49,671 [IPC Server handler 2 on 8030] ERROR org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AppAttemptId doesnt exist in cache appattempt_1357575694478_28614_01 {noformat} It should go ahead and process the unregistration to update the tracking URL since the application offered it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1036) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/YARN-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733639#comment-13733639 ] Ravi Prakash commented on YARN-1036: Hi Omkar! Thanks a lot for pointing out the problem in the earlier patch. Regarding the changes you are proposing, I meant for this JIRA to simply be a backport of MAPREDUCE-4342. I wasn't able to re-open that JIRA because it has already been closed (hence I had to file this new JIRA). If you have spotted a problem with the current patch, I would welcome your suggested changes. However, if you have an issue with the approach, I would request you to please pursue it in a separate JIRA, as it lies outside the scope of simple backporting. Most of this code is already in trunk as is. Please let me know if this is acceptable to you. Distributed Cache gives inconsistent result if cache files get deleted from task tracker - Key: YARN-1036 URL: https://issues.apache.org/jira/browse/YARN-1036 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.9 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: YARN-1036.branch-0.23.patch, YARN-1036.branch-0.23.patch This is a JIRA to backport MAPREDUCE-4342. I had to open a new JIRA because that one had been closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1033) Expose RM active/standby state to web UI and metrics
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733681#comment-13733681 ] Bikas Saha commented on YARN-1033: -- This would depend on the implementation choice for YARN-1027. If we don't start all/external-facing services in standby mode then this jira will not work. Expose RM active/standby state to web UI and metrics Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: nemon lou Assignee: nemon lou Both the active and standby RM shall expose its web server and show its current state (active or standby) on the web page. Cluster metrics also need this state for monitoring. Standby RM web services shall refuse client requests unless querying for RM state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733688#comment-13733688 ] Zhijie Shen commented on YARN-1001: --- Then, the limitation of this requirement is that the number of params used to categorize the applications does not scale freely. If you add one more param, the number of buckets grows exponentially with the number of params: for example, 3 types x 2 states give 6 buckets, and a third param with k values multiplies that by k. Therefore, I'm afraid this proposed API cannot support as many params as getApps() does. @Srimanth Gunturi, do you think application type and state are enough for Ambari? YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733684#comment-13733684 ] Jian He commented on YARN-353: -- bq. I am not sure I understand. If the RM is trying to delete something and the node does not exist, is there a situation where the RM wants to know that the node didn't exist and fail if deletion of a non-existent node was attempted? Agreed. We should specifically check whether the node exists or not. Otherwise the ZK delete() API will throw an exception if the node doesn't exist, which we don't want. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.10.patch, YARN-353.11.patch, YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, YARN-353.8.patch, YARN-353.9.patch Add a store that writes RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
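A minimal sketch of the behavior under discussion, against the raw ZooKeeper client; tolerating KeeperException.NoNodeException makes the delete idempotent without a separate exists() round trip (the class and method names are illustrative):
{code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public final class ZKDeleteUtil {
    // Delete a znode, treating "already gone" as success. Catching
    // NoNodeException instead of calling exists() first avoids both the
    // extra round trip and the check-then-act race; version -1 matches
    // any version of the node.
    public static void deleteIfExists(ZooKeeper zk, String path)
            throws KeeperException, InterruptedException {
        try {
            zk.delete(path, -1);
        } catch (KeeperException.NoNodeException ignored) {
            // Node does not exist: nothing to delete, not an error here.
        }
    }
}
{code}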
[jira] [Commented] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733736#comment-13733736 ] Zhijie Shen commented on YARN-978: -- bq. any reason why we need final application status and the tracker url in the report? Like why we need this info for application report. It should be important to users. bq. aren't these available in the overall application report? Yes, the application report contains this info, which is extracted from the current attempt. We'd like to keep the info of all the attempts, including the failed ones. bq. what is meant to be retrieved from the attempt report as compared to the app report? We hope users can get as complete info about the attempts as possible. [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation -- Key: YARN-978 URL: https://issues.apache.org/jira/browse/YARN-978 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Xuan Gong Fix For: YARN-321 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch We don't have an ApplicationAttemptReport and its Protobuf implementation. Adding that. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
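A hedged sketch of the kind of per-attempt record being discussed; the fields follow what the comment says should be kept for every attempt (final status, tracking URL) plus obvious identifiers, and are illustrative rather than the patch's actual shape:
{code}
// Illustrative shape only; the ApplicationAttemptReport in the patch may
// differ. The point is that each attempt, including failed ones, keeps
// its own copy of this info rather than only the current attempt's.
public class AttemptReportSketch {
    private final String applicationAttemptId;   // which attempt this describes
    private final String finalApplicationStatus; // kept even for failed attempts
    private final String trackingUrl;            // per-attempt tracking URL
    private final String diagnostics;            // why the attempt ended

    public AttemptReportSketch(String id, String status, String url, String diag) {
        this.applicationAttemptId = id;
        this.finalApplicationStatus = status;
        this.trackingUrl = url;
        this.diagnostics = diag;
    }

    public String getApplicationAttemptId() { return applicationAttemptId; }
    public String getFinalApplicationStatus() { return finalApplicationStatus; }
    public String getTrackingUrl() { return trackingUrl; }
    public String getDiagnostics() { return diagnostics; }
}
{code}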
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733760#comment-13733760 ] Karthik Kambatla commented on YARN-353: --- Looking into this now. Will hopefully have an update (patch + replies) sometime today. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.10.patch, YARN-353.11.patch, YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, YARN-353.8.patch, YARN-353.9.patch Add a store that writes RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1046) TestDistributedShell fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1046: --- Attachment: yarn-1046-1.patch Uploading a patch that might help. Haven't been able to validate this because I was unable to consistently reproduce the issue. Verified the test passes locally with the patch, so at least this is not a regression. TestDistributedShell fails intermittently - Key: YARN-1046 URL: https://issues.apache.org/jira/browse/YARN-1046 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1046-1.patch Have been running into this frequently in spite of MAPREDUCE-3709 on centos6 machines. However, when I try to run it independently on the machines, I have not been able to reproduce it. {noformat} 2013-08-07 19:17:35,048 WARN [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(444)) - Container [pid=16556,containerID=container_1375928243488_0001_01_01] is running beyond virtual memory limits. Current usage: 132.4 MB of 512 MB physical memory used; 1.2 GB of 1.0 GB virtual memory used. Killing container. {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1046) TestDistributedShell fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733805#comment-13733805 ] Hadoop QA commented on YARN-1046: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12596899/yarn-1046-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1675//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1675//console This message is automatically generated. TestDistributedShell fails intermittently - Key: YARN-1046 URL: https://issues.apache.org/jira/browse/YARN-1046 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1046-1.patch Have been running into this frequently in spite of MAPREDUCE-3709 on centos6 machines. However, when I try to run it independently on the machines, I have not been able to reproduce it. {noformat} 2013-08-07 19:17:35,048 WARN [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(444)) - Container [pid=16556,containerID=container_1375928243488_0001_01_01] is running beyond virtual memory limits. Current usage: 132.4 MB of 512 MB physical memory used; 1.2 GB of 1.0 GB virtual memory used. Killing container. {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-901) Active users field in Resourcemanager scheduler UI gives negative values
[ https://issues.apache.org/jira/browse/YARN-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733831#comment-13733831 ] Jian He commented on YARN-901: -- Hi [~nishan], are you running 2.0.5-alpha? I couldn't reproduce it on 2.1.0-beta. Can you give some steps to reproduce and attach a screenshot of the web UI? Thanks. Active users field in Resourcemanager scheduler UI gives negative values -- Key: YARN-901 URL: https://issues.apache.org/jira/browse/YARN-901 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.0.5-alpha Reporter: Nishan Shetty Priority: Minor Active users field in Resourcemanager scheduler UI gives negative values on Resourcemanager restart when job is in progress -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-1045) Improve toString implementation for PBImpls
[ https://issues.apache.org/jira/browse/YARN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-1045: - Assignee: Jian He Improve toString implementation for PBImpls --- Key: YARN-1045 URL: https://issues.apache.org/jira/browse/YARN-1045 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Assignee: Jian He Attachments: YARN-1045.patch The generic toString implementation that is used in most of the PBImpls {code}getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");{code} is rather inefficient - replacing \n and \s to generate a one line string. Instead, we can use {code}TextFormat.shortDebugString(getProto());{code}. If we can get this into 2.1.0 - great, otherwise the next release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1045) Improve toString implementation for PBImpls
[ https://issues.apache.org/jira/browse/YARN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1045: -- Attachment: YARN-1045.patch Uploaded a trivial patch that replaces the PB toString() as described; no test case. Improve toString implementation for PBImpls --- Key: YARN-1045 URL: https://issues.apache.org/jira/browse/YARN-1045 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Attachments: YARN-1045.patch The generic toString implementation that is used in most of the PBImpls {code}getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");{code} is rather inefficient - replacing \n and \s to generate a one line string. Instead, we can use {code}TextFormat.shortDebugString(getProto());{code}. If we can get this into 2.1.0 - great, otherwise the next release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1045) Improve toString implementation for PBImpls
[ https://issues.apache.org/jira/browse/YARN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733882#comment-13733882 ] Hadoop QA commented on YARN-1045: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12596908/YARN-1045.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1677//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1677//console This message is automatically generated. Improve toString implementation for PBImpls --- Key: YARN-1045 URL: https://issues.apache.org/jira/browse/YARN-1045 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Assignee: Jian He Attachments: YARN-1045.patch The generic toString implementation that is used in most of the PBImpls {code}getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");{code} is rather inefficient - replacing \n and \s to generate a one line string. Instead, we can use {code}TextFormat.shortDebugString(getProto());{code}. If we can get this into 2.1.0 - great, otherwise the next release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1045) Improve toString implementation for PBImpls
[ https://issues.apache.org/jira/browse/YARN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733934#comment-13733934 ] Siddharth Seth commented on YARN-1045: -- Thanks for taking this up Jian. Did you get a chance to run all MR and YARN unit tests locally - in case we're relying on the toString format anywhere. Improve toString implementation for PBImpls --- Key: YARN-1045 URL: https://issues.apache.org/jira/browse/YARN-1045 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Assignee: Jian He Attachments: YARN-1045.patch The generic toString implementation that is used in most of the PBImpls {code}getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");{code} is rather inefficient - replacing \n and \s to generate a one line string. Instead, we can use {code}TextFormat.shortDebugString(getProto());{code}. If we can get this into 2.1.0 - great, otherwise the next release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
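A sketch of the proposed swap, using protobuf's TextFormat as named in the description; the surrounding base class is illustrative (real PBImpls each hold their own proto):
{code}
import com.google.protobuf.Message;
import com.google.protobuf.TextFormat;

public abstract class PBImplSketch {
    protected abstract Message getProto();

    // TextFormat.shortDebugString renders the proto on a single line
    // directly, replacing the two regex passes of
    //   getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ")
    @Override
    public String toString() {
        return TextFormat.shortDebugString(getProto());
    }
}
{code}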
[jira] [Commented] (YARN-899) Get queue administration ACLs working
[ https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734040#comment-13734040 ] Siddharth Seth commented on YARN-899: - bq. With this in mind, I think who has access should be based on a union of ACLs Agree. AMs get ACLs from the RM when they register. That could be a combined list along with the queue ACLs. It's up to the AMs to enforce these. Maybe the RM proxy could do some of this as well. The MR JobHistoryServer gets ACLs from the AM - again it's up to this to enforce them. The RM AppHistoryServer will need to do the union though. Don't have experience with JT ACLs, but it does look like that's doing a union as well. View vs Modify ACLs for queues makes sense to me. Get queue administration ACLs working - Key: YARN-899 URL: https://issues.apache.org/jira/browse/YARN-899 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Xuan Gong Attachments: YARN-899.1.patch The Capacity Scheduler documents the yarn.scheduler.capacity.root.queue-path.acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-679) add an entry point that can start any Yarn service
[ https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734135#comment-13734135 ] Steve Loughran commented on YARN-679: - I've been evolving this driven by the Hoya application; code is up online at [https://github.com/hortonworks/hoya/tree/master/src/main/java/org/apache/hadoop/yarn/service/launcher] Some observations: I added an interface {{GetExceptionExitCode}} to get an exit code off any Exception.
{code}
public interface GetExceptionExitCode {
  int getExitCode();
}
{code}
It'd be nice to have this interface implemented by {{Shell.ExitCodeException}} and {{ExitUtil.ExitException}} so that we have a consistent way to get exit codes from any exception willing to provide them. ps. we could do with a more standardised set of error codes for YARN applications - convention rather than mandatory. To make services executable, rather than just deployable, I added another interface.
{code}
public interface RunService {
  /**
   * Propagate the command line arguments
   * @param args argument list
   * @throws Exception any problem
   */
  void setArgs(String... args) throws Exception;

  /**
   * Run a service
   * @return the exit code
   * @throws Throwable any exception to report
   */
  int runService() throws Throwable;
}
{code}
Here {{setArgs}} passes down all the arguments before {{Service.init(Config)}} is called. This lets me tune the config passed to the superclass based on the supplied arguments. {{runService()}} is called after {{Service.start()}}. The model here is that the main() thread goes:
# create the service class
# {{setArgs(...)}}
# {{init(config)}}
# {{start()}}
# {{int exit = runService()}}
# {{stop()}}
The service doesn't need to start its own worker thread, and the exit code from runService() becomes the exit code of the app. Any of the service methods is also free to throw an exception; if the exception implements {{getExitCode()}}, that becomes the exit code of the app. The code seems a bit over-complex, but it's evolved to also be the entry point for tests. add an entry point that can start any Yarn service -- Key: YARN-679 URL: https://issues.apache.org/jira/browse/YARN-679 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Steve Loughran Priority: Minor Attachments: YARN-679-001.patch There's no need to write separate .main classes for every Yarn service, given that the startup mechanism should be identical: create, init, start, wait for stopped, with an interrupt handler to trigger a clean shutdown on a control-c interrupt. Provide one that takes any classname, and a list of config files/options -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
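A minimal sketch of the lifecycle described above; {{MyService}} is a hypothetical class that extends Hadoop's AbstractService and implements the RunService interface from the comment:
{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative launcher main() following the six steps above. MyService
// is hypothetical: an AbstractService subclass that also implements
// RunService. Exception-to-exit-code mapping is omitted for brevity.
public class LauncherSketch {
    public static void main(String[] args) throws Throwable {
        MyService service = new MyService();   // 1. create the service class
        service.setArgs(args);                 // 2. propagate arguments pre-init
        service.init(new Configuration());     // 3. init with the (tuned) config
        service.start();                       // 4. start
        int exit = service.runService();       // 5. blocking run; returns exit code
        service.stop();                        // 6. stop
        System.exit(exit);
    }
}
{code}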
[jira] [Commented] (YARN-1043) YARN Queue metrics are getting pushed to neither file nor Ganglia
[ https://issues.apache.org/jira/browse/YARN-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734152#comment-13734152 ] Hudson commented on YARN-1043: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4230 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4230/]) YARN-1043. Push all metrics consistently. Contributed by Jian He. (acmurthy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1512081) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueueMetrics.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java YARN Queue metrics are getting pushed to neither file nor Ganglia - Key: YARN-1043 URL: https://issues.apache.org/jira/browse/YARN-1043 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Yusaku Sako Assignee: Jian He Fix For: 2.1.0-beta Attachments: YARN-1043.1.patch, YARN-1043.patch YARN Queue metrics are not getting pushed to file or Ganglia via Hadoop Metrics 2. QueueMetrics are still accessible via JMX and RM REST API (hostname:8088/ws/v1/cluster/scheduler). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-899) Get queue administration ACLs working
[ https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-899: --- Attachment: YARN-899.2.patch Get queue administration ACLs working - Key: YARN-899 URL: https://issues.apache.org/jira/browse/YARN-899 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Xuan Gong Attachments: YARN-899.1.patch, YARN-899.2.patch The Capacity Scheduler documents the yarn.scheduler.capacity.root.queue-path.acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-899) Get queue administration ACLs working
[ https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734164#comment-13734164 ] Xuan Gong commented on YARN-899: Here is my proposal: we can create a QueueACLsManager which encapsulates the ResourceScheduler, so whenever we need to check a user's permission, we can provide the UGI and queueName (this can be found from the RMApp) to the scheduler (no matter whether it is the CapacityScheduler, FairScheduler or FifoScheduler), and let the scheduler help us make the decision. For each scheduler, we may need to add a new interface: Scheduler#hasAccess(UGI, queueName). The queueName is used to find the correct Queue. The reason why I send the information back to the scheduler and let the scheduler make the decision is because a. I think all the QueueACLsInfos are collected by the scheduler and saved in its queues (I cannot find other places which save the QueueACLs), b. even if the queue is re-initialized in the future, we do not need to worry about it. Attached is a patch implementing this proposal. Please give me your suggestions. Get queue administration ACLs working - Key: YARN-899 URL: https://issues.apache.org/jira/browse/YARN-899 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Xuan Gong Attachments: YARN-899.1.patch, YARN-899.2.patch The Capacity Scheduler documents the yarn.scheduler.capacity.root.queue-path.acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
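A minimal sketch of the proposed delegation; {{Scheduler#hasAccess(UGI, queueName)}} is the hook proposed in the comment, not an existing YARN interface, so the sketch declares it locally:
{code}
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical scheduler hook from the proposal above; not an existing
// YARN interface.
interface QueueAccessChecker {
    boolean hasAccess(UserGroupInformation ugi, String queueName);
}

// Illustrative QueueACLsManager: it forwards each check to the scheduler,
// which owns the queues and their ACLs, so queue re-initialization stays
// transparent to callers.
public class QueueACLsManager {
    private final QueueAccessChecker scheduler;

    public QueueACLsManager(QueueAccessChecker scheduler) {
        this.scheduler = scheduler;
    }

    public boolean hasAccess(UserGroupInformation ugi, String queueName) {
        return scheduler.hasAccess(ugi, queueName);
    }
}
{code}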
[jira] [Created] (YARN-1047) Expose # of pre-emptions as a queue counter
Philip Zeyliger created YARN-1047: - Summary: Expose # of pre-emptions as a queue counter Key: YARN-1047 URL: https://issues.apache.org/jira/browse/YARN-1047 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.2-alpha Reporter: Philip Zeyliger Since YARN supports pre-empting containers, a given queue should expose the number of containers it has had pre-empted as a metric. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-656) In scheduler UI, including reserved memory in Memory Total can make it exceed cluster capacity.
[ https://issues.apache.org/jira/browse/YARN-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734186#comment-13734186 ] Alejandro Abdelnur commented on YARN-656: - +1 In scheduler UI, including reserved memory in Memory Total can make it exceed cluster capacity. - Key: YARN-656 URL: https://issues.apache.org/jira/browse/YARN-656 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-656-1.patch, YARN-656.patch Memory Total is currently a sum of availableMB, allocatedMB, and reservedMB. Including reservedMB in this sum can make the total exceed the capacity of the cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1046) TestDistributedShell fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1046: --- Attachment: yarn-1046-2.patch Thanks Sandy. Totally agree with you, forgot about that JIRA. Here is a patch that is very much along the lines of MAPREDUCE-5094, but for MiniYARNCluster. I wonder if we need MAPREDUCE-5094 anymore? TestDistributedShell fails intermittently - Key: YARN-1046 URL: https://issues.apache.org/jira/browse/YARN-1046 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1046-1.patch, yarn-1046-2.patch Have been running into this frequently in spite of MAPREDUCE-3709 on centos6 machines. However, when I try to run it independently on the machines, I have not been able to reproduce it. {noformat} 2013-08-07 19:17:35,048 WARN [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(444)) - Container [pid=16556,containerID=container_1375928243488_0001_01_01] is running beyond virtual memory limits. Current usage: 132.4 MB of 512 MB physical memory used; 1.2 GB of 1.0 GB virtual memory used. Killing container. {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
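For context on the quoted failure: the container is killed by the NodeManager's virtual-memory check. A minimal sketch of how a test cluster can relax that check; whether the actual patch does exactly this is an assumption:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TestClusterConfSketch {
    // Relax the NM virtual-memory enforcement that kills the container in
    // the quoted log; suitable for test clusters only.
    public static Configuration relaxedVmemConf() {
        Configuration conf = new Configuration();
        conf.setBoolean(YarnConfiguration.NM_VMEM_CHECK_ENABLED, false);
        // Alternatively, allow more virtual memory per unit of physical memory:
        // conf.setFloat(YarnConfiguration.NM_VMEM_PMEM_RATIO, 8.0f);
        return conf;
    }
}
{code}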
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734221#comment-13734221 ] Alejandro Abdelnur commented on YARN-1021: -- Wei, first of all, nice. I'm not convinced (even if I suggested that to you offline) about having dirs under share/hadoop/tools/sls/conf being 'configurable' and added to the classpath. Instead I would suggest the following: the stuff under share/hadoop/tools/sls/ should be samples, i.e. sample-conf/ and sample-data/. The rumen2sls.sh and slsrunner.sh scripts should not add the sample-conf dir to the classpath; they should just add the JARs. And the documentation should state that the sample-conf/ files should be copied to the hadoop conf/ directory to run the simulator. Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workloads. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm very well before we deploy it in a production cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time- and cost-consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm works for some specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads in a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager, removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AM heartbeat events from within the same JVM. To keep track of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler. The simulator will produce real-time metrics while executing, including: * Resource usages for the whole cluster and each queue, which can be utilized to configure cluster and queue capacity. * The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs' turnaround time, throughput, fairness, capacity guarantee, etc). * Several key metrics of the scheduler algorithm, such as the time cost of each scheduler operation (allocate, handle, etc), which can be utilized by Hadoop developers to find hot spots and scalability limits. The simulator will provide real-time charts showing the behavior of the scheduler and its performance. A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1048) Add new AMRMClientAsync.getMatchingRequests method taking a Container as parameter
Alejandro Abdelnur created YARN-1048: Summary: Add new AMRMClientAsync.getMatchingRequests method taking a Container as parameter Key: YARN-1048 URL: https://issues.apache.org/jira/browse/YARN-1048 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur The current method signature {{getMatchingRequests(Priority priority, String resourceName, Resource resource)}} is cumbersome to use within {{onContainersAllocated(List<Container> containers)}}, as we have to deconstruct the info from the received containers. A new signature, {{getMatchingRequests(Container container)}}, would simplify usage for clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
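A sketch of the deconstruction the description refers to, inside an allocation callback; the single-argument call at the end is the overload this JIRA proposes, not an existing method:
{code}
import java.util.Collection;
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class MatchingRequestsSketch {
    // Today: the (priority, resourceName, capability) triple has to be
    // rebuilt from each allocated container before looking up the
    // outstanding requests it satisfies.
    static void onContainersAllocated(AMRMClient<ContainerRequest> amRMClient,
                                      List<Container> containers) {
        for (Container container : containers) {
            List<? extends Collection<ContainerRequest>> matches =
                amRMClient.getMatchingRequests(container.getPriority(),
                                               container.getNodeId().getHost(),
                                               container.getResource());
            // ... pick one request from matches, then removeContainerRequest(...)
        }
        // Proposed overload from this JIRA (not yet existing):
        //   amRMClient.getMatchingRequests(container);
    }
}
{code}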
[jira] [Commented] (YARN-1046) TestDistributedShell fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734252#comment-13734252 ] Hadoop QA commented on YARN-1046: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12596983/yarn-1046-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1678//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1678//console This message is automatically generated. TestDistributedShell fails intermittently - Key: YARN-1046 URL: https://issues.apache.org/jira/browse/YARN-1046 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1046-1.patch, yarn-1046-2.patch Have been running into this frequently in spite of MAPREDUCE-3709 on centos6 machines. However, when I try to run it independently on the machines, I have not been able to reproduce it. {noformat} 2013-08-07 19:17:35,048 WARN [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(444)) - Container [pid=16556,containerID=container_1375928243488_0001_01_01] is running beyond virtual memory limits. Current usage: 132.4 MB of 512 MB physical memory used; 1.2 GB of 1.0 GB virtual memory used. Killing container. {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1046) TestDistributedShell fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734264#comment-13734264 ] Sandy Ryza commented on YARN-1046: -- +1 TestDistributedShell fails intermittently - Key: YARN-1046 URL: https://issues.apache.org/jira/browse/YARN-1046 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1046-1.patch, yarn-1046-2.patch Have been running into this frequently in spite of MAPREDUCE-3709 on centos6 machines. However, when I try to run it independently on the machines, I have not been able to reproduce it. {noformat} 2013-08-07 19:17:35,048 WARN [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(444)) - Container [pid=16556,containerID=container_1375928243488_0001_01_01] is running beyond virtual memory limits. Current usage: 132.4 MB of 512 MB physical memory used; 1.2 GB of 1.0 GB virtual memory used. Killing container. {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-589) Expose a REST API for monitoring the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734267#comment-13734267 ] Sandy Ryza commented on YARN-589: - Thanks for the review, Alejandro. Committed to trunk, branch-2, and branch-2.1-beta. Expose a REST API for monitoring the fair scheduler --- Key: YARN-589 URL: https://issues.apache.org/jira/browse/YARN-589 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: fairscheduler.xml, YARN-589-1.patch, YARN-589-2.patch, YARN-589.patch The fair scheduler should have an HTTP interface that exposes information such as applications per queue, fair shares, demands, current allocations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
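As a usage note, the RM serves this scheduler information from its web services root; a minimal Java probe, assuming an RM on localhost with the default web port 8088 (host and port are assumptions for a single-node setup):
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class FairSchedulerProbe {
    public static void main(String[] args) throws Exception {
        // RM web services scheduler endpoint.
        URL url = new URL("http://localhost:8088/ws/v1/cluster/scheduler");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json"); // or application/xml
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // queue info: fair shares, demands, usage
            }
        }
    }
}
{code}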
[jira] [Commented] (YARN-1046) TestDistributedShell fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734265#comment-13734265 ] Sandy Ryza commented on YARN-1046: -- And yeah, MAPREDUCE-5094 may not be necessary now, though that's work for another JIRA. TestDistributedShell fails intermittently - Key: YARN-1046 URL: https://issues.apache.org/jira/browse/YARN-1046 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1046-1.patch, yarn-1046-2.patch Have been running into this frequently in spite of MAPREDUCE-3709 on centos6 machines. However, when I try to run it independently on the machines, I have not been able to reproduce it. {noformat} 2013-08-07 19:17:35,048 WARN [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(444)) - Container [pid=16556,containerID=container_1375928243488_0001_01_01] is running beyond virtual memory limits. Current usage: 132.4 MB of 512 MB physical memory used; 1.2 GB of 1.0 GB virtual memory used. Killing container. {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-589) Expose a REST API for monitoring the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734283#comment-13734283 ] Hudson commented on YARN-589: - SUCCESS: Integrated in Hadoop-trunk-Commit #4233 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4233/]) Amending YARN-589. Adding missing file from patch (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1512112) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java YARN-589. Expose a REST API for monitoring the fair scheduler (Sandy Ryza). (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1512111) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ResourceInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerInfo.java Expose a REST API for monitoring the fair scheduler --- Key: YARN-589 URL: https://issues.apache.org/jira/browse/YARN-589 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.1-beta Attachments: fairscheduler.xml, YARN-589-1.patch, YARN-589-2.patch, YARN-589.patch The fair scheduler should have an HTTP interface that exposes information such as applications per queue, fair shares, demands, current allocations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1045) Improve toString implementation for PBImpls
[ https://issues.apache.org/jira/browse/YARN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1045: -- Attachment: YARN-1045.1.patch Missed 5 PB classes. Improve toString implementation for PBImpls --- Key: YARN-1045 URL: https://issues.apache.org/jira/browse/YARN-1045 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Assignee: Jian He Attachments: YARN-1045.1.patch, YARN-1045.patch The generic toString implementation that is used in most of the PBImpls {code}getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");{code} is rather inefficient - replacing \n and \s to generate a one line string. Instead, we can use {code}TextFormat.shortDebugString(getProto());{code}. If we can get this into 2.1.0 - great, otherwise the next release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1045) Improve toString implementation for PBImpls
[ https://issues.apache.org/jira/browse/YARN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734325#comment-13734325 ] Hadoop QA commented on YARN-1045: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597006/YARN-1045.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1679//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1679//console This message is automatically generated. Improve toString implementation for PBImpls --- Key: YARN-1045 URL: https://issues.apache.org/jira/browse/YARN-1045 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Assignee: Jian He Attachments: YARN-1045.1.patch, YARN-1045.patch The generic toString implementation that is used in most of the PBImpls {code}getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");{code} is rather inefficient - replacing \n and \s to generate a one-line string. Instead, we can use {code}TextFormat.shortDebugString(getProto());{code}. If we can get this into 2.1.0 - great, otherwise the next release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1021: -- Attachment: YARN-1021-demo.tar.gz YARN-1021.pdf Updated patch and documents according to [~tucu00]'s suggestions. Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity, and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workloads. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm thoroughly before we deploy it in a production cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time- and cost-consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator that can predict how well a scheduler algorithm performs for some specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads on a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager, removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AM heartbeat events from within the same JVM. To keep track of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler. The simulator will produce real-time metrics while executing, including: * Resource usage for the whole cluster and each queue, which can be utilized to configure cluster and queue capacities. * The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual job turnaround time, throughput, fairness, capacity guarantee, etc.). * Several key metrics of the scheduler algorithm, such as the time cost of each scheduler operation (allocate, handle, etc.), which can be utilized by Hadoop developers to find code hot spots and scalability limits. The simulator will provide real-time charts showing the behavior of the scheduler and its performance. A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
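To make the "scheduler wrapper" idea above concrete, here is a minimal sketch of the timing pattern. The Scheduler interface and the metric bookkeeping below are hypothetical stand-ins for illustration, not classes from the attached patch.
{code}
// Hypothetical minimal stand-in for the real scheduler event API.
interface Scheduler {
  void handle(Object event);
}

// Wraps a real scheduler and records the latency of every handle()
// call, in the spirit of the wrapper described in the proposal.
public class TimedScheduler implements Scheduler {
  private final Scheduler delegate;
  private long totalNanos;
  private long calls;

  public TimedScheduler(Scheduler delegate) {
    this.delegate = delegate;
  }

  @Override
  public void handle(Object event) {
    long start = System.nanoTime();
    try {
      delegate.handle(event); // forward to the real scheduler
    } finally {
      totalNanos += System.nanoTime() - start;
      calls++;
    }
  }

  public double avgHandleMillis() {
    return calls == 0 ? 0.0 : (totalNanos / 1e6) / calls;
  }

  public static void main(String[] args) {
    TimedScheduler timed = new TimedScheduler(event -> { /* no-op */ });
    for (int i = 0; i < 1000; i++) {
      timed.handle("NODE_UPDATE"); // simulated heartbeat-driven event
    }
    System.out.printf("avg handle() latency: %.4f ms%n",
        timed.avgHandleMillis());
  }
}
{code}
The same pattern extends to allocate() and the other scheduler operations, feeding the per-operation time costs that the description lists among the simulator's metrics.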
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734347#comment-13734347 ] Hadoop QA commented on YARN-1021: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597013/YARN-1021.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1680//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1680//console This message is automatically generated. Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity, and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workloads. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm thoroughly before we deploy it in a production cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time- and cost-consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator that can predict how well a scheduler algorithm performs for some specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads on a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager, removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AM heartbeat events from within the same JVM. To keep track of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler. 
The simulator will produce real-time metrics while executing, including: * Resource usage for the whole cluster and each queue, which can be utilized to configure cluster and queue capacities. * The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual job turnaround time, throughput, fairness, capacity guarantee, etc.). * Several key metrics of the scheduler algorithm, such as the time cost of each scheduler operation (allocate, handle, etc.), which can be utilized by Hadoop developers to find code hot spots and scalability limits. The simulator will provide real-time charts showing the behavior of the scheduler and its performance. A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira