[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335409#comment-14335409 ] Hadoop QA commented on YARN-3231: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700551/YARN-3231.v1.patch against trunk revision 73bcfa9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6711//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6711//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6711//console This message is automatically generated. FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck -- Key: YARN-3231 URL: https://issues.apache.org/jira/browse/YARN-3231 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-3231.v1.patch When a queue is piling up with a lot of pending jobs due to the maxRunningApps limit. We want to increase this property on the fly to make some of the pending job active. However, once we increase the limit, all pending jobs were not assigned any resource, and were stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
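To illustrate the kind of re-evaluation the report implies is missing, here is a purely hypothetical sketch (invented class and method names, not the attached YARN-3231.v1.patch): after the allocation file is reloaded with a larger maxRunningApps, each queue's non-runnable apps need to be re-checked and promoted while the queue is under its new limit.
{code}
// Hypothetical sketch of the missing step: after maxRunningApps is raised on the fly,
// walk each queue's pending (non-runnable) apps and promote them while the queue is
// under the new limit. All types and names below are invented for illustration only.
import java.util.Iterator;
import java.util.List;

interface PendingApp { void makeRunnable(); }

interface Queue {
  String getName();
  int getNumRunnableApps();
  List<PendingApp> getNonRunnableApps();
}

public class MaxRunningAppsReloadSketch {
  /** Re-check pending apps against the newly loaded per-queue limits. */
  public static void onAllocationReload(Iterable<Queue> queues,
      java.util.function.ToIntFunction<String> newMaxRunningApps) {
    for (Queue queue : queues) {
      int limit = newMaxRunningApps.applyAsInt(queue.getName());
      Iterator<PendingApp> pending = queue.getNonRunnableApps().iterator();
      int runnable = queue.getNumRunnableApps();
      while (runnable < limit && pending.hasNext()) {
        // Without a pass like this, apps queued under the old limit are never revisited
        // and stay pending even though the new limit would allow them to run.
        pending.next().makeRunnable();
        pending.remove();
        runnable++;
      }
    }
  }
}
{code}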
[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335424#comment-14335424 ] Karthik Kambatla commented on YARN-3122: Thanks for working on this, Anubhav. The overall structure looks good but for one concern on the API. More comments below. I am yet to take a closer look at the tests.
# ContainerMetrics
## Change phyCpuUsagePercent to pCpuUsagePercent for consistency with other variables?
## Also, given YARN-3022 hasn't gone into a release yet, can we update the variables introduced there to reflect units as well - e.g. pMemUsageMBs instead of pMemUsage, and pMemLimitMBs instead of pMemLimitMbs?
## Change "Vcore usage stats times 1000" to "1000 times vcore usage"?
# ContainersMonitorImpl: Nit - can we avoid starting lines with parentheses for method arguments? I am okay with not addressing this, just a personal preference.
# CpuTimeTracker
## Mark as Private-Unstable.
## Nit: Can we update the comments' location for variables for better readability?
{code}
  public static final int UNAVAILABLE = -1;

  // CPU used time since system is on (in milliseconds)
  BigInteger cumulativeCpuTime = BigInteger.ZERO;
  // …
{code}
## Move MINIMUM_UPDATE_INTERVAL next to UNAVAILABLE?
## Passing along the number of processors in getCpuTrackerUsage doesn't seem right. If this is set once for CpuTracker, can we pass it through the constructor?
# ProcfsBasedProcessTree
## Would like to avoid passing numProcessors in getCpuUsagePercent.
## The main method only captures the CPU usage, while the class tracks both memory and CPU. Can we move this to either a test or a util class?
# NodeManagerHardwareUtils - s/thats/that is/
Metrics for container's actual CPU usage Key: YARN-3122 URL: https://issues.apache.org/jira/browse/YARN-3122 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3122.001.patch, YARN-3122.002.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track CPU usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
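For context on what a CpuTimeTracker-style helper computes, here is a minimal sketch of deriving per-container CPU usage percent from two samples of cumulative CPU time. The class and method names are illustrative assumptions, not the patch's actual signatures; dividing the result by the number of processors (the point debated in the review) would normalize it to an overall host percentage.
{code}
// Minimal sketch (not the real CpuTimeTracker API): CPU usage percent of one core is the
// delta of cumulative CPU time between two samples divided by the elapsed wall-clock time.
import java.math.BigInteger;

public class CpuUsageSketch {
  public static final int UNAVAILABLE = -1;

  private BigInteger lastCumulativeCpuTimeMs = BigInteger.ZERO; // CPU time used so far (ms)
  private long lastSampleTimeMs = UNAVAILABLE;                  // wall-clock time of last sample

  /**
   * @param cumulativeCpuTimeMs total CPU time consumed by the process tree so far, in ms
   * @param nowMs current wall-clock time in ms
   * @return CPU usage as a percentage of a single core since the previous sample
   *         (can exceed 100 on multi-core hosts), or UNAVAILABLE for the first sample
   */
  public float updateAndGetCpuUsagePercent(BigInteger cumulativeCpuTimeMs, long nowMs) {
    float usage = UNAVAILABLE;
    if (lastSampleTimeMs != UNAVAILABLE && nowMs > lastSampleTimeMs) {
      BigInteger usedSinceLast = cumulativeCpuTimeMs.subtract(lastCumulativeCpuTimeMs);
      usage = 100F * usedSinceLast.floatValue() / (nowMs - lastSampleTimeMs);
    }
    lastCumulativeCpuTimeMs = cumulativeCpuTimeMs;
    lastSampleTimeMs = nowMs;
    return usage;
  }
}
{code}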
[jira] [Commented] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy
[ https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335343#comment-14335343 ] Akira AJISAKA commented on YARN-3217: - Thanks [~brahmareddy] for the update.
{code}
-    HttpClient client = new HttpClient(params);
+      throws IOException, URISyntaxException {
+
{code}
{{WebAppProxyServlet#proxyLink}} does not throw {{URISyntaxException}}, so would you remove this? Remove httpclient dependency from hadoop-yarn-server-web-proxy -- Key: YARN-3217 URL: https://issues.apache.org/jira/browse/YARN-3217 Project: Hadoop YARN Issue Type: Task Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula Attachments: YARN-3217-002.patch, YARN-3217.patch Sub-task of HADOOP-10105. Remove httpclient dependency from WebAppProxyServlet.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
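For readers unfamiliar with the task: the goal is to stop depending on Apache commons-httpclient in the web proxy. One dependency-free way to fetch and relay a remote URL is shown below with java.net.HttpURLConnection; this is only an illustrative sketch, not a claim about how the YARN-3217 patch actually rewrites proxyLink, and all names are assumptions.
{code}
// Illustrative only: relay a GET request to a downstream URL without commons-httpclient.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.servlet.http.HttpServletResponse;

public class SimpleProxyFetcher {
  public static void relay(URL target, HttpServletResponse resp) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) target.openConnection();
    conn.setRequestMethod("GET");
    conn.setInstanceFollowRedirects(false); // let the caller decide how to handle redirects
    try (InputStream in = conn.getInputStream(); // note: throws IOException on 4xx/5xx responses
         OutputStream out = resp.getOutputStream()) {
      resp.setStatus(conn.getResponseCode());
      String contentType = conn.getContentType();
      if (contentType != null) {
        resp.setContentType(contentType);
      }
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n); // stream the upstream body straight through
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}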
[jira] [Commented] (YARN-3125) [Event producers] Change distributed shell to use new timeline service
[ https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335400#comment-14335400 ] Junping Du commented on YARN-3125: -- Talked offline to Zhijie to take over this JIRA. [Event producers] Change distributed shell to use new timeline service -- Key: YARN-3125 URL: https://issues.apache.org/jira/browse/YARN-3125 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen We can start with changing distributed shell to use new timeline service once the framework is completed, in which way we can quickly verify the next gen is working fine end-to-end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335463#comment-14335463 ] Hadoop QA commented on YARN-3131: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700573/yarn_3131_v5.patch against trunk revision 9a37247. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.impl.TestYarnClient Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6712//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6712//console This message is automatically generated. YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch Just run into a issue when submit a job into a non-existent queue and YarnClient raise no exception. Though that job indeed get submitted successfully and just failed immediately after, it will be better if YarnClient can handle the immediate fail situation like YarnRunner does -- This message was sent by Atlassian JIRA (v6.3.4#6332)
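The behavior being asked for can be outlined with a small hedged sketch (an assumed helper class, not the actual YarnClientImpl change or the yarn_3131 patches): after submitApplication returns, poll the application report and fail fast if the application has already gone to FAILED or KILLED, for example when it was submitted to a non-existent queue.
{code}
// Sketch of the idea only: surface an early FAILED/KILLED state as an exception
// instead of returning silently from submission.
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class SubmitAndCheck {
  public static void waitPastSubmission(YarnClient client, ApplicationId appId)
      throws IOException, YarnException, InterruptedException {
    while (true) {
      ApplicationReport report = client.getApplicationReport(appId);
      YarnApplicationState state = report.getYarnApplicationState();
      if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
        // e.g. submission to a non-existent queue is rejected almost immediately
        throw new YarnException("Application " + appId + " was " + state
            + " right after submission: " + report.getDiagnostics());
      }
      if (state != YarnApplicationState.NEW && state != YarnApplicationState.NEW_SAVING
          && state != YarnApplicationState.SUBMITTED) {
        return; // ACCEPTED or beyond: submission looks healthy
      }
      Thread.sleep(200); // simple fixed back-off for the sketch
    }
  }
}
{code}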
[jira] [Updated] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3131: --- Attachment: yarn_3131_v5.patch YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch Just ran into an issue where submitting a job to a non-existent queue causes YarnClient to raise no exception. Though the job does get submitted successfully and simply fails immediately afterwards, it would be better if YarnClient could handle the immediate-failure situation the way YarnRunner does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1226) Inconsistent hostname leads to low data locality on IPv6 hosts
[ https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-1226: --- Labels: ipv6 (was: ) Inconsistent hostname leads to low data locality on IPv6 hosts -- Key: YARN-1226 URL: https://issues.apache.org/jira/browse/YARN-1226 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 0.23.3, 2.0.0-alpha, 2.1.0-beta Environment: Linux, IPv6 Reporter: Kaibo Zhou Labels: ipv6 When I run a MapReduce job which uses TableInputFormat to scan an HBase table on a YARN cluster with 140+ nodes, I consistently get very low data locality, around 0~10%. The scheduler is the Capacity Scheduler. HBase and Hadoop are integrated in the cluster, with the NodeManager, DataNode and HRegionServer running on the same node. The reason for the low data locality is that most machines in the cluster use IPv6 and a few use IPv4. NodeManager uses InetAddress.getLocalHost().getHostName() to get the hostname, but the result of this call depends on whether IPv4 or IPv6 is in use, see [InetAddress.getLocalHost().getHostName() returns FQDN|http://bugs.sun.com/view_bug.do?bug_id=7166687]. On machines with IPv4, NodeManager gets the hostname as: search042097.sqa.cm4.site.net. But on machines with IPv6, NodeManager gets the hostname as: search042097.sqa.cm4; if run with IPv6 disabled (-Djava.net.preferIPv4Stack=true), it returns search042097.sqa.cm4.site.net. For the MapReduce job which scans the HBase table, the InputSplit contains node locations as [FQDN|http://en.wikipedia.org/wiki/FQDN]s, e.g. search042097.sqa.cm4.site.net, because in HBase the RegionServers' hostnames are assigned by the HMaster: the HMaster communicates with the RegionServers and obtains each region server's hostname via Java NIO, clientChannel.socket().getInetAddress().getHostName(). Also see the startup log of the region server: 13:06:21,200 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us hostname to use. Was=search042024.sqa.cm4, Now=search042024.sqa.cm4.site.net As you can see, most machines in the YARN cluster with IPv6 get the short hostname, but HBase always gets the full hostname, so the hosts cannot be matched (see RMContainerAllocator::assignToMap). This leads to poor locality. After I use java.net.preferIPv4Stack to force IPv4 in YARN, I get 70+% data locality in the cluster. Thanks, Kaibo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
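A small diagnostic sketch (not part of the issue) makes the mismatch easy to reproduce: getHostName() may return the short name on the affected IPv6 hosts, while getCanonicalHostName() typically resolves the FQDN (depending on DNS), and running with -Djava.net.preferIPv4Stack=true changes what getHostName() itself returns.
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostnameCheck {
  public static void main(String[] args) throws UnknownHostException {
    InetAddress local = InetAddress.getLocalHost();
    // On the affected IPv6 hosts this printed the short name, e.g. "search042097.sqa.cm4"
    System.out.println("getHostName():          " + local.getHostName());
    // The canonical lookup usually resolves the FQDN, e.g. "search042097.sqa.cm4.site.net"
    System.out.println("getCanonicalHostName(): " + local.getCanonicalHostName());
  }
}
{code}
Running this once normally and once with -Djava.net.preferIPv4Stack=true shows the same divergence the report describes between NodeManager's hostname and the FQDN used by HBase.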
[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params
[ https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335298#comment-14335298 ] Hitesh Shah commented on YARN-3239: --- Tested manually by applying this patch. Works fine with the kind of urls Tez is using. WebAppProxy does not support a final tracking url which has query fragments and params --- Key: YARN-3239 URL: https://issues.apache.org/jira/browse/YARN-3239 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-3239.1.patch Examples of failures: Expected: {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}} Actual: {{http://uihost:8080}} Tried with a minor change to remove the #. Saw a different issue: Expected: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}} Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}} yarn application -status appId returns the expected value correctly. However, invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
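The failures above amount to the proxy dropping the query string and fragment when it rewrites the tracking URL. A hedged sketch of the general technique follows: rebuild the URL from its raw components so both parts survive. The helper name is invented; this is not a claim about what YARN-3239.1.patch does.
{code}
// Sketch only: append a trailing path segment while keeping the original raw query and fragment.
import java.net.URI;
import java.net.URISyntaxException;

public class TrackingUrlSketch {
  public static URI appendPathKeepingQuery(URI trackingUri, String extraPath)
      throws URISyntaxException {
    StringBuilder sb = new StringBuilder();
    sb.append(trackingUri.getScheme()).append("://").append(trackingUri.getRawAuthority());
    String path = trackingUri.getRawPath() == null ? "" : trackingUri.getRawPath();
    sb.append(path);
    if (extraPath != null && !extraPath.isEmpty()) {
      sb.append(path.endsWith("/") ? "" : "/").append(extraPath);
    }
    if (trackingUri.getRawQuery() != null) {
      sb.append('?').append(trackingUri.getRawQuery());     // keep ?viewPath=%2F%23%2F... intact
    }
    if (trackingUri.getRawFragment() != null) {
      sb.append('#').append(trackingUri.getRawFragment());  // keep #/main/views/... intact
    }
    return new URI(sb.toString());
  }

  public static void main(String[] args) throws URISyntaxException {
    URI u = new URI("http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez"
        + "?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001");
    System.out.println(appendPathKeepingQuery(u, ""));
  }
}
{code}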
[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335300#comment-14335300 ] Hudson commented on YARN-2980: -- SUCCESS: Integrated in Hadoop-trunk-Commit #7190 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7190/]) YARN-2980. Move health check script related functionality to hadoop-common (Varun Saxena via aw) (aw: rev d4ac6822e1c5dfac504ced48f10ab57a55b49e93) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesContainers.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java Move health check script related functionality to hadoop-common --- Key: YARN-2980 URL: https://issues.apache.org/jira/browse/YARN-2980 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma Assignee: Varun Saxena Fix For: 3.0.0 Attachments: YARN-2980.001.patch, YARN-2980.002.patch, YARN-2980.003.patch, YARN-2980.004.patch HDFS might want to leverage health check functionality available in YARN in both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode https://issues.apache.org/jira/browse/HDFS-7441. We can move health check functionality including the protocol between hadoop daemons and health check script to hadoop-common. That will simplify the development and maintenance for both hadoop source code and health check script. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
[ https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335321#comment-14335321 ] Jian He commented on YARN-3202: --- This piece of code is legacy code only for non-work-preserving restart. The existing code path for work-preserving restart covers this already. Given that we only support work-preserving restart, I think we can get rid of all the conditional code for non-work-preserving restart; the tests may need to be changed too. Improve master container resource release time ICO work preserving restart enabled -- Key: YARN-3202 URL: https://issues.apache.org/jira/browse/YARN-3202 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: 0001-YARN-3202.patch While the NM is registering with the RM, if the NM sends a completed container status for the master container, the master container's resources are released immediately by triggering the CONTAINER_FINISHED event. This frees all the resources held by the master container so they can be allocated to other applications' pending resource requests. But in case of (ICO) RM work-preserving restart being enabled, if the master container state is completed, the attempt is not moved to FINISHING until container expiry is triggered by the container liveliness monitor. I think the code below need not check whether work-preserving restart is enabled, so that the master container's resources are released immediately and allocated to other applications' pending resource requests:
{code}
    // Handle received container status, this should be processed after new
    // RMNode inserted
    if (!rmContext.isWorkPreservingRecoveryEnabled()) {
      if (!request.getNMContainerStatuses().isEmpty()) {
        LOG.info("received container statuses on node manager register :"
            + request.getNMContainerStatuses());
        for (NMContainerStatus status : request.getNMContainerStatuses()) {
          handleNMContainerStatus(status, nodeId);
        }
      }
    }
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
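Following the suggestion above, the simplified path would drop the recovery-mode check entirely. The fragment below is only a hedged illustration of that direction (the eventual patch may restructure this differently):
{code}
    // Hedged sketch of the suggested simplification: process the reported container
    // statuses unconditionally, so the master container's resources are released right
    // away regardless of whether work-preserving recovery is enabled.
    if (!request.getNMContainerStatuses().isEmpty()) {
      LOG.info("received container statuses on node manager register :"
          + request.getNMContainerStatuses());
      for (NMContainerStatus status : request.getNMContainerStatuses()) {
        handleNMContainerStatus(status, nodeId);
      }
    }
{code}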
[jira] [Updated] (YARN-3125) [Event producers] Change distributed shell to use new timeline service
[ https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3125: - Attachment: YARN-3125.patch Based on latest timelineservice put API provided in YARN-3240 [Event producers] Change distributed shell to use new timeline service -- Key: YARN-3125 URL: https://issues.apache.org/jira/browse/YARN-3125 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3125.patch We can start with changing distributed shell to use new timeline service once the framework is completed, in which way we can quickly verify the next gen is working fine end-to-end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3125) [Event producers] Change distributed shell to use new timeline service
[ https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned YARN-3125: Assignee: Junping Du (was: Zhijie Shen) [Event producers] Change distributed shell to use new timeline service -- Key: YARN-3125 URL: https://issues.apache.org/jira/browse/YARN-3125 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Junping Du Attachments: YARN-3125.patch We can start with changing distributed shell to use new timeline service once the framework is completed, in which way we can quickly verify the next gen is working fine end-to-end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3084) YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes to run
[ https://issues.apache.org/jira/browse/YARN-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Roberts updated YARN-3084: --- Attachment: yarn-yarn-resourcemanager-sandbox.hortonworks.com.log YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes to run Key: YARN-3084 URL: https://issues.apache.org/jira/browse/YARN-3084 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.6.0 Environment: Using Eclipse on Windows 7 (client) to run the MapReduce job on the host of Hortonworks HDP 2.2 (hortonworks is on vmware version 6.0.2 build-1744117) Reporter: Michael Br Priority: Minor Attachments: yarn-yarn-resourcemanager-sandbox.hortonworks.com.log Hello,
1. I want to run the simple MapReduce job example (with the REST API 2.6 for YARN applications) and to calculate PI… for now it doesn't work. When I use the command in the Hortonworks terminal it works: "hadoop jar /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0.2.2.0.0-2041.jar pi 10 10". But I want to submit the job with the REST API and not in the terminal as a command line. [http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application]
2. I do succeed with other REST API requests: get state, get new application id and even kill (change state), but when I try to submit my example, the response is:
-- --
The Response Header:
Key : null ,Value : [HTTP/1.1 202 Accepted]
Key : Date ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 07:47:24 GMT]
Key : Content-Length ,Value : [0]
Key : Expires ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 07:47:24 GMT]
Key : Location ,Value : [http://[my port]:8088/ws/v1/cluster/apps/application_1421661392788_0038]
Key : Content-Type ,Value : [application/json]
Key : Server ,Value : [Jetty(6.1.26.hwx)]
Key : Pragma ,Value : [no-cache, no-cache]
Key : Cache-Control ,Value : [no-cache]
The Response Body: Null (No Response)
-- --
3. I need help with filling in the HTTP request body. I am doing a POST HTTP request and I know that I am doing it right (in Java).
4. I think the problem is in the request body.
5. I used this guy's answer to help me build my MapReduce example XML but it does not work: [http://hadoop-forum.org/forum/general-hadoop-discussion/miscellaneous/2136-how-can-i-run-mapreduce-job-by-rest-api].
6. What am I missing? (the description is not clear to me in the submit section of the REST API 2.6)
7. Does someone have an XML example for running a simple MR job?
8. Thanks!
Here is the XML file I am using for the request body:
-- --
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<application-submission-context>
  <application-id>application_1421661392788_0038</application-id>
  <application-name>test_21_1</application-name>
  <queue>default</queue>
  <priority>3</priority>
  <am-container-spec>
    <environment>
      <entry>
        <key>CLASSPATH</key>
        <value>/usr/hdp/2.2.0.0-2041/hadoop/conf&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/./&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/.//*&lt;CPS&gt;&lt;CPS&gt;/usr/share/java/mysql-connector-java-5.1.17.jar&lt;CPS&gt;/usr/share/java/mysql-connector-java.jar&lt;CPS&gt;/usr/hdp/current/hadoop-mapreduce-client/*&lt;CPS&gt;/usr/hdp/current/tez-client/*&lt;CPS&gt;/usr/hdp/current/tez-client/lib/*&lt;CPS&gt;/etc/tez/conf/&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/tez/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/tez/lib/*&lt;CPS&gt;/etc/tez/conf</value>
      </entry>
    </environment>
    <commands>
      <command>hadoop jar /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0.2.2.0.0-2041.jar pi 10 10</command>
    </commands>
  </am-container-spec>
  <unmanaged-AM>false</unmanaged-AM>
  <max-app-attempts>2</max-app-attempts>
  <resource>
    <memory>1024</memory>
[jira] [Commented] (YARN-3084) YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes to run
[ https://issues.apache.org/jira/browse/YARN-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335311#comment-14335311 ] Sean Roberts commented on YARN-3084: Apologies, didn't mean to hit submit. I submitted with that job. Interestingly, the 'pi' runs and is successful but the parent job reports a failure. Application application_1424804952495_0004 failed 2 times due to AM Container for appattempt_1424804952495_0004_02 exited with exitCode: 0 Attaching resource manager logs as yarn-yarn-resourcemanager-sandbox.hortonworks.com.log YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes to run Key: YARN-3084 URL: https://issues.apache.org/jira/browse/YARN-3084 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.6.0 Environment: Using eclipse on windows 7 (client)to run the map reduce job on the host of Hortonworks HDP 2.2 (hortonworks is on vmware version 6.0.2 build-1744117) Reporter: Michael Br Priority: Minor Attachments: yarn-yarn-resourcemanager-sandbox.hortonworks.com.log Hello, 1.I want to run the simple Map Reduce job example (with the REST API 2.6 for yarn applications) and to calculate PI… for now it doesn’t work. When I use the command in the hortonworks terminal it works: “hadoop jar /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0.2.2.0.0-2041.jar pi 10 10”. But I want to submit the job with the REST API and not in the terminal as a command line. [http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application] 2.I do succeed with other REST API requests: get state, get new application id and even kill(change state), but when I try to submit my example, the response is: -- -- The Response Header: Key : null ,Value : [HTTP/1.1 202 Accepted] Key : Date ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 07:47:24 GMT] Key : Content-Length ,Value : [0] Key : Expires ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 07:47:24 GMT] Key : Location ,Value : [http://[my port]:8088/ws/v1/cluster/apps/application_1421661392788_0038] Key : Content-Type ,Value : [application/json] Key : Server ,Value : [Jetty(6.1.26.hwx)] Key : Pragma ,Value : [no-cache, no-cache] Key : Cache-Control ,Value : [no-cache] The Respone Body: Null (No Response) -- -- 3.I need help with the http request body filling. I am doing a POST http request and I know that I am doing it right (in java). 4.I think the problem is in the request body. 5.I used this guy’s answer to help me build my map reduce example xml but it does not work: [http://hadoop-forum.org/forum/general-hadoop-discussion/miscellaneous/2136-how-can-i-run-mapreduce-job-by-rest-api]. 6.What am I missing? (the description is not clear to me in the submit section of the rest api 2.6) 7.Does someone have an xml example for using a simple MR job? 8.Thanks! Here is the XML file I am using for the request body: -- -- ?xml version=1.0 encoding=UTF-8 standalone=yes? 
application-submission-context application-idapplication_1421661392788_0038/application-id application-nametest_21_1/application-name queuedefault/queue priority3/priority am-container-spec environment entry keyCLASSPATH/key value/usr/hdp/2.2.0.0-2041/hadoop/conflt;CPSgt;/usr/hdp/2.2.0.0-2041/hadoop/lib/*lt;CPSgt;/usr/hdp/2.2.0.0-2041/hadoop/.//*lt;CPSgt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/./lt;CPSgt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/lib/*lt;CPSgt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/.//*lt;CPSgt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/lib/*lt;CPSgt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/.//*lt;CPSgt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/lib/*lt;CPSgt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/.//*lt;CPSgt;lt;CPSgt;/usr/share/java/mysql-connector-java-5.1.17.jarlt;CPSgt;/usr/share/java/mysql-connector-java.jarlt;CPSgt;/usr/hdp/current/hadoop-mapreduce-client/*lt;CPSgt;/usr/hdp/current/tez-client/*lt;CPSgt;/usr/hdp/current/tez-client/lib/*lt;CPSgt;/etc/tez/conf/lt;CPSgt;/usr/hdp/2.2.0.0-2041/tez/*lt;CPSgt;/usr/hdp/2.2.0.0-2041/tez/lib/*lt;CPSgt;/etc/tez/conf/value /entry
[jira] [Updated] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3131: --- Attachment: yarn_3131_v6.patch YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, yarn_3131_v6.patch Just ran into an issue where submitting a job to a non-existent queue causes YarnClient to raise no exception. Though the job does get submitted successfully and simply fails immediately afterwards, it would be better if YarnClient could handle the immediate-failure situation the way YarnRunner does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryu Kobayashi updated YARN-3249: Attachment: YARN-3249.patch Add the kill application to the Resource Manager Web UI --- Key: YARN-3249 URL: https://issues.apache.org/jira/browse/YARN-3249 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0, 2.7.0 Reporter: Ryu Kobayashi Priority: Minor Attachments: YARN-3249.patch We want to be able to kill an application from the Resource Manager Web UI, similar to the JobTracker Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy
[ https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334716#comment-14334716 ] Hadoop QA commented on YARN-3217: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700461/YARN-3217-002.patch against trunk revision b610c68. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6708//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6708//console This message is automatically generated. Remove httpclient dependency from hadoop-yarn-server-web-proxy -- Key: YARN-3217 URL: https://issues.apache.org/jira/browse/YARN-3217 Project: Hadoop YARN Issue Type: Task Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula Attachments: YARN-3217-002.patch, YARN-3217.patch Sub-task of HADOOP-10105. Remove httpclient dependency from WebAppProxyServlet.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3249) Add the kill application to the Resource Manager Web UI
Ryu Kobayashi created YARN-3249: --- Summary: Add the kill application to the Resource Manager Web UI Key: YARN-3249 URL: https://issues.apache.org/jira/browse/YARN-3249 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0, 2.7.0 Reporter: Ryu Kobayashi Priority: Minor We want to be able to kill an application from the Resource Manager Web UI, similar to the JobTracker Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
Varun Vasudev created YARN-3248: --- Summary: Display count of nodes blacklisted by apps in the web UI Key: YARN-3248 URL: https://issues.apache.org/jira/browse/YARN-3248 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev It would be really useful when debugging app performance and failure issues to get a count of the nodes blacklisted by individual apps displayed in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryu Kobayashi updated YARN-3249: Attachment: screenshot.png Add the kill application to the Resource Manager Web UI --- Key: YARN-3249 URL: https://issues.apache.org/jira/browse/YARN-3249 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0, 2.7.0 Reporter: Ryu Kobayashi Priority: Minor Attachments: YARN-3249.patch, screenshot.png We want to be able to kill an application from the Resource Manager Web UI, similar to the JobTracker Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-3249: - Assignee: Ryu Kobayashi Add the kill application to the Resource Manager Web UI --- Key: YARN-3249 URL: https://issues.apache.org/jira/browse/YARN-3249 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0, 2.7.0 Reporter: Ryu Kobayashi Assignee: Ryu Kobayashi Priority: Minor Attachments: YARN-3249.patch, screenshot.png We want to be able to kill an application from the Resource Manager Web UI, similar to the JobTracker Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3247) TestQueueMappings failure for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334689#comment-14334689 ] Hadoop QA commented on YARN-3247: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700368/YARN-3247.000.patch against trunk revision b610c68. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6707//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6707//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6707//console This message is automatically generated. TestQueueMappings failure for FairScheduler --- Key: YARN-3247 URL: https://issues.apache.org/jira/browse/YARN-3247 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Attachments: YARN-3247.000.patch TestQueueMappings is only supported by CapacityScheduler. We should configure CapacityScheduler for this test. Otherwise if the default scheduler is set to FairScheduler, the test will fail with the following message: {code} Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings) Time elapsed: 2.202 sec ERROR! 
java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
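The direction stated in the description is to pin the scheduler for this test to the CapacityScheduler rather than relying on the cluster-wide default. A hedged sketch of that setup follows, using standard YARN configuration APIs; the exact content of YARN-3247.000.patch is not claimed here, and the helper name is invented.
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration;

public class QueueMappingTestSetup {
  public static CapacitySchedulerConfiguration newCapacitySchedulerConf() {
    CapacitySchedulerConfiguration conf = new CapacitySchedulerConfiguration();
    // Force the CapacityScheduler regardless of yarn.resourcemanager.scheduler.class
    // in the default configuration (e.g. when a build sets it to FairScheduler),
    // so the test no longer fails with the ClassCastException shown above.
    conf.setClass(YarnConfiguration.RM_SCHEDULER, CapacityScheduler.class,
        ResourceScheduler.class);
    return conf;
  }
}
{code}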
[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334789#comment-14334789 ] Hudson commented on YARN-2797: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #114 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/114/]) YARN-2797. Add -help to yarn logs and nodes CLI command. Contributed by (devaraj: rev b610c68d4423a5a1ab342dc490cd0064f8983c07) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/NodeCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase Key: YARN-2797 URL: https://issues.apache.org/jira/browse/YARN-2797 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Minor Fix For: 2.7.0 Attachments: yarn-2797-1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334800#comment-14334800 ] Naganarasimha G R commented on YARN-3168: - Hi [~gururaj], thanks for uploading the patch. In the attached patch, it seems the changes made to YarnCommands.apt.vm for HADOOP-11575 have not been carried over. Please check. Convert site documentation from apt to markdown --- Key: YARN-3168 URL: https://issues.apache.org/jira/browse/YARN-3168 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch YARN analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3248: Component/s: capacityscheduler Display count of nodes blacklisted by apps in the web UI Key: YARN-3248 URL: https://issues.apache.org/jira/browse/YARN-3248 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev It would be really useful when debugging app performance and failure issues to get a count of the nodes blacklisted by individual apps displayed in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334748#comment-14334748 ] Hadoop QA commented on YARN-2820: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700337/YARN-2820.004.patch against trunk revision b610c68. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6709//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6709//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6709//console This message is automatically generated. Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. -- Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0, 2.6.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch, YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We saw the following IOexception cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 
2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) at
[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334807#comment-14334807 ] Hudson commented on YARN-2797: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #848 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/848/]) YARN-2797. Add -help to yarn logs and nodes CLI command. Contributed by (devaraj: rev b610c68d4423a5a1ab342dc490cd0064f8983c07) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/NodeCLI.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase Key: YARN-2797 URL: https://issues.apache.org/jira/browse/YARN-2797 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Minor Fix For: 2.7.0 Attachments: yarn-2797-1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3168) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3168: --- Attachment: YARN-3168.20150224.1.patch Convert site documentation from apt to markdown --- Key: YARN-3168 URL: https://issues.apache.org/jira/browse/YARN-3168 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch YARN analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334778#comment-14334778 ] Gururaj Shetty commented on YARN-3168: -- [~aw] Attached the update patch. Please review. Convert site documentation from apt to markdown --- Key: YARN-3168 URL: https://issues.apache.org/jira/browse/YARN-3168 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch YARN analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334986#comment-14334986 ] Jason Lowe commented on YARN-3251: -- Sample stack trace:
{noformat}
Found one Java-level deadlock:
==============================
"IPC Server handler 71 on 8032":
  waiting to lock monitor 0x037f9120 (object 0x00023b060ad8, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue),
  which is held by "ResourceManager Event Processor"
"ResourceManager Event Processor":
  waiting to lock monitor 0x02c4b7d0 (object 0x00023aecf620, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue),
  which is held by "IPC Server handler 71 on 8032"

Java stack information for the threads listed above:
===================================================
"IPC Server handler 71 on 8032":
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueueInfo(LeafQueue.java:451)
  - waiting to lock 0x00023b060ad8 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueInfo(ParentQueue.java:214)
  - locked 0x00023aecf620 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueInfo(ParentQueue.java:214)
  - locked 0x00023af36e70 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueInfo(ParentQueue.java:214)
  - locked 0x00023b0d9478 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueInfo(CapacityScheduler.java:910)
  at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:832)
  at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:259)
  at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:413)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2079)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2075)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2073)
"ResourceManager Event Processor":
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent(AbstractCSQueue.java:185)
  - waiting to lock 0x00023aecf620 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.getAbsoluteMaxAvailCapacity(CSQueueUtils.java:177)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.getAbsoluteMaxAvailCapacity(CSQueueUtils.java:183)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.computeUserLimitAndSetHeadroom(LeafQueue.java:1033)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.checkLimitsToReserve(LeafQueue.java:1341)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1611)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1399)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1278)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:893)
  - locked 0x00023b060ad8 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:758)
  - locked 0x00023ceb53e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
  - locked 0x00023b060ad8 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
  at
[jira] [Created] (YARN-3250) Support admin cli interface in Application Priority Manager (server side)
Sunil G created YARN-3250: - Summary: Support admin cli interface in Application Priority Manager (server side) Key: YARN-3250 URL: https://issues.apache.org/jira/browse/YARN-3250 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Current Application Priority Manager supports only configuration via file. To support runtime configurations for admin cli and REST, a common management interface has to be added which can be shared with NodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
Jason Lowe created YARN-3251: Summary: CapacityScheduler deadlock when computing absolute max avail capacity Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Priority: Blocker The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335005#comment-14335005 ] Tsuyoshi OZAWA commented on YARN-2820: -- [~zxu] Great job! We are almost there. To avoid repeating code for retry, I think it's better to have FSAction like ZKAction in ZKRMStateStore. What do you think? Minor nits: I prefer to have a line break after = for readability. {code} + public static final String FS_RM_STATE_STORE_NUM_RETRIES = RM_PREFIX + + fs.state-store.num-retries; + public static final String FS_RM_STATE_STORE_RETRY_INTERVAL_MS = RM_PREFIX + + fs.state-store.retry-interval-ms; {code} {code} public static final String FS_RM_STATE_STORE_NUM_RETRIES = RM_PREFIX + fs.state-store.num-retries; public static final String FS_RM_STATE_STORE_RETRY_INTERVAL_MS = RM_PREFIX + fs.state-store.retry-interval-ms; {code} Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. -- Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0, 2.6.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch, YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We saw the following IOexception cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at
{code}
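For illustration only, here is a minimal sketch of what the ZKAction-style wrapper suggested above could look like for FileSystemRMStateStore. The class, method, and parameter names are assumptions made for this sketch, not code from the attached patches.
{code}
import java.io.IOException;

// Hypothetical sketch of an FSAction retry wrapper, analogous to
// ZKRMStateStore's ZKAction: the retry loop lives in one place and each
// filesystem operation subclasses run(). Names are illustrative only.
abstract class FSAction<T> {

  abstract T run() throws Exception;

  T runWithRetries(int numRetries, long retryIntervalMs) throws Exception {
    for (int attempt = 0; ; attempt++) {
      try {
        return run();
      } catch (IOException e) {
        if (attempt >= numRetries) {
          throw e;                        // retries exhausted, surface the error
        }
        Thread.sleep(retryIntervalMs);    // back off before the next attempt
      }
    }
  }
}
{code}
A store/update call would then be wrapped as something like new FSAction<Void>() { ... }.runWithRetries(numRetries, retryIntervalMs), keeping the retry policy in a single place.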
[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334996#comment-14334996 ] Hudson commented on YARN-2797: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2064 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2064/]) YARN-2797. Add -help to yarn logs and nodes CLI command. Contributed by (devaraj: rev b610c68d4423a5a1ab342dc490cd0064f8983c07) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/NodeCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/CHANGES.txt TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase Key: YARN-2797 URL: https://issues.apache.org/jira/browse/YARN-2797 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Minor Fix For: 2.7.0 Attachments: yarn-2797-1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335010#comment-14335010 ] Junping Du commented on YARN-3039: -- Thanks [~zjshen] for the review and comments!
bq. I think so, too. RM has its own builtin aggregator, and RM directly writes through it.
I have a very basic question here: didn't we want a single app aggregator for all app-related events, logs, etc.? Ideally, only this singleton aggregator would have the logic to sort out app info during aggregation. If not, we could even give up the current flow of NM(s) - app aggregator (deployed on one NM) - backend and let each NM talk to the backend directly, saving a hop of traffic. Can you clarify this a bit more?
bq. in the heartbeat, instead of always sending the snapshot of the aggregator address info, can we send the incremental information upon any change happens to the aggregator address table. Usually, the aggregator will not change it place often, such that we can avoid unnecessary additional traffic in most heartbeats.
That's a very good point for discussion. The interesting thing here is that only by comparing against the info from the client (NM) can we know what has changed on the server (RM) since the last heartbeat. Take the token update as an example (populateKeys() in ResourceTrackerService): in the current implementation we encode the master keys (ContainerTokenMasterKey and NMTokenMasterKey) known by the NM in the request, and in the response we filter out the old keys already known by the NM. IMO, this (put everything in the request, and something/nothing in the response) is not really an optimization over putting nothing in the request and everything in the response; it only turns outbound traffic into inbound traffic and moves the comparison logic to the server side. Isn't that so? Another optimization we could consider is to let the client express which app aggregators it is interested in on the request (by adding them to a new optional field, e.g. InterestedApps) when it finds that info missing or stale, and have the server send back only the related app aggregator info. The NM can maintain an interested-app-aggregator list, which gets updated when an app's container is launched for the first time or when the app's aggregator info becomes stale (as may be reported by the writer/reader retry logic), and items are removed from the list once they are received in a heartbeat response. Thoughts?
bq. One addition issue related the rm state store: calling it in the update transition may break the app recovery. The current state instead of the final state will be written into the store. If RM stops and restarts at this moment, this app can't be recovered properly.
Thanks for the reminder on this. This is something I am not 100% sure about. However, from recoverApplication() in RMAppManager, I didn't see that we cannot recover an app in RUNNING or another state (except final states, like killed, finished, etc.). Am I missing anything here? One piece of code that is indeed missing is that I forgot to repopulate aggregatorAddr from the store in RMAppImpl.recover(); I will add it back in the next patch.
[Aggregator wireup] Implement ATS writer service discovery -- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, YARN-3039-no-test.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
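Purely as an illustration of the InterestedApps idea floated in the comment above (field and type names here are hypothetical, not part of any patch or YARN protocol record): the NM would list only the apps whose aggregator address it is missing or believes stale, and the RM would reply with just those entries.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical request/response shapes for aggregator discovery over the
// NM heartbeat; illustrative only.
class AggregatorDiscoverySketch {

  static class HeartbeatRequestSketch {
    // Apps for which this NM is missing (or holds stale) aggregator info.
    final Set<String> interestedApps = new HashSet<>();
  }

  static class HeartbeatResponseSketch {
    // appId -> aggregator address, populated only for the requested apps.
    final Map<String, String> appAggregatorAddresses = new HashMap<>();
  }
}
{code}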
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335021#comment-14335021 ] Sunil G commented on YARN-3251: --- [~jlowe] Recent getAbsoluteMaxAvailCapacity changes cause this. CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Priority: Blocker The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334975#comment-14334975 ] Hudson commented on YARN-2797: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #114 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/114/]) YARN-2797. Add -help to yarn logs and nodes CLI command. Contributed by (devaraj: rev b610c68d4423a5a1ab342dc490cd0064f8983c07) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/NodeCLI.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase Key: YARN-2797 URL: https://issues.apache.org/jira/browse/YARN-2797 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Minor Fix For: 2.7.0 Attachments: yarn-2797-1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335002#comment-14335002 ] Jason Lowe commented on YARN-3251: -- It looks like this is fallout from YARN-2008. CSQueueUtils.getAbsoluteMaxAvailCapacity is called with the lock held on the LeafQueue and walks up the tree, attempting to grab locks on parents as it goes. That's contrary to the conventional order of locking while walking down the tree, and thus we can deadlock. CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Priority: Blocker The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
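To make the lock-order inversion concrete, here is a toy, self-contained illustration. These are not the CapacityScheduler classes, just two plain locks acquired in opposite orders the way the comment describes: scheduling goes parent-then-leaf, while the headroom path goes leaf-then-parent.
{code}
// Toy demo of the deadlock pattern described above: run it and the two
// threads will usually block forever, each holding one lock and waiting
// for the other. Illustrative only; not CapacityScheduler code.
public class LockOrderDemo {
  private static final Object parentQueueLock = new Object();
  private static final Object leafQueueLock = new Object();

  public static void main(String[] args) {
    new Thread(() -> {                       // "scheduling": parent -> leaf
      synchronized (parentQueueLock) {
        pause();
        synchronized (leafQueueLock) {
          System.out.println("scheduling path got both locks");
        }
      }
    }).start();

    new Thread(() -> {                       // "headroom": leaf -> parent (reversed)
      synchronized (leafQueueLock) {
        pause();
        synchronized (parentQueueLock) {
          System.out.println("headroom path got both locks");
        }
      }
    }).start();
  }

  private static void pause() {
    try { Thread.sleep(100); } catch (InterruptedException ignored) { }
  }
}
{code}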
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335009#comment-14335009 ] Jason Lowe commented on YARN-2008: -- Note that this change appears to lead to a deadlock, as getAbsoluteMaxAvailCapacity is called with a lock held on the leaf queue and then walks up the hierarchy attempting to grab parent locks as it goes. See YARN-3251. CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure - Key: YARN-2008 URL: https://issues.apache.org/jira/browse/YARN-2008 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Chen He Assignee: Craig Welch Fix For: 2.6.0 Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, YARN-2008.4.patch, YARN-2008.5.patch, YARN-2008.6.patch, YARN-2008.7.patch, YARN-2008.8.patch, YARN-2008.9.patch Suppose there are two queues, both allowed to use 100% of the actual resources in the cluster, and Q1 and Q2 each currently use 50% of the actual cluster resources, so there is no actual space available. If we use the current method to get headroom, the CapacityScheduler thinks there are still available resources for users in Q1, but they have already been used by Q2. If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example:
{code}
rootQueue
 |-- L1ParentQueue1   (allowed to use up to 80% of its parent)
 |    |-- L2LeafQueue1   (50% of its parent)
 |    `-- L2LeafQueue2   (50% of its parent, at minimum)
 `-- L1ParentQueue2   (allowed to use 20% of its parent, at minimum)
{code}
When we calculate the headroom of a user in L2LeafQueue2, the current method thinks L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. However, without checking L1ParentQueue2, we cannot be sure: it is possible that L1ParentQueue2 has already used 40% of the rootQueue resources, in which case L2LeafQueue2 can actually use only 30% (60% * 50%). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334903#comment-14334903 ] Hudson commented on YARN-2797: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #105 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/105/]) YARN-2797. Add -help to yarn logs and nodes CLI command. Contributed by (devaraj: rev b610c68d4423a5a1ab342dc490cd0064f8983c07) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/NodeCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java * hadoop-yarn-project/CHANGES.txt TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase Key: YARN-2797 URL: https://issues.apache.org/jira/browse/YARN-2797 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Minor Fix For: 2.7.0 Attachments: yarn-2797-1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Description: Focus of this JIRA is to have a centralized service to handle priority labels. It supports operations such as:
* Add/delete a priority label for a specified queue
* Manage the integer mapping associated with each priority label
* Manage the default priority label of a given queue
* Expose an interface to the RM to validate priority labels
To keep the interface simple, the Priority Manager will support only a configuration file, in contrast with an admin CLI and REST.

was: Focus of this JIRA is to have a centralized service to handle priority labels. It supports operations such as:
* Add/delete a priority label for a specified queue
* Manage the integer mapping associated with each priority label
* Manage the default priority label of a given queue
* ACL support at the queue level for priority labels
* Expose an interface to the RM to validate priority labels
Storage for these labels will be done in FileSystem and in Memory, similar to NodeLabel:
* FileSystem based: persistent across RM restart
* Memory based: non-persistent across RM restart

Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. It supports operations such as:
* Add/delete a priority label for a specified queue
* Manage the integer mapping associated with each priority label
* Manage the default priority label of a given queue
* Expose an interface to the RM to validate priority labels
To keep the interface simple, the Priority Manager will support only a configuration file, in contrast with an admin CLI and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334925#comment-14334925 ] Hudson commented on YARN-2797: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2046 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2046/]) YARN-2797. Add -help to yarn logs and nodes CLI command. Contributed by (devaraj: rev b610c68d4423a5a1ab342dc490cd0064f8983c07) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/NodeCLI.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase Key: YARN-2797 URL: https://issues.apache.org/jira/browse/YARN-2797 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Minor Fix For: 2.7.0 Attachments: yarn-2797-1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Summary: Priority Label Manager in RM to manage application priority based on configuration (was: Priority Label Manager in RM to manage priority labels) Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. It supports operations such as:
* Add/delete a priority label for a specified queue
* Manage the integer mapping associated with each priority label
* Manage the default priority label of a given queue
* ACL support at the queue level for priority labels
* Expose an interface to the RM to validate priority labels
Storage for these labels will be done in FileSystem and in Memory, similar to NodeLabel:
* FileSystem based: persistent across RM restart
* Memory based: non-persistent across RM restart
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335044#comment-14335044 ] Jason Lowe commented on YARN-3251: -- YARN-3243 could remove the need to climb up the hierarchy to compute max avail capacity. CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Priority: Blocker The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335043#comment-14335043 ] Sunil G commented on YARN-3251: --- Its better to compute the available capacity during the call to root.assignContainers. In that scenario, a simpler get will retrieve the available capacity. CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Priority: Blocker The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
[ https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335244#comment-14335244 ] Anubhav Dhoot commented on YARN-3202: - This seems fair to me. [~jianhe] do you see any reason why handling completed master containers would interfere with work-preserving recovery? Improve master container resource release time ICO work preserving restart enabled -- Key: YARN-3202 URL: https://issues.apache.org/jira/browse/YARN-3202 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: 0001-YARN-3202.patch While the NM is registering with the RM, if the NM sends a completed_container status for the masterContainer, the resources of the master container are released immediately by triggering the CONTAINER_FINISHED event. This releases all the resources held by the master container, which can then be allocated to other pending resource requests from applications. But in case of (ICO) RM work-preserving restart being enabled, if the master container state is completed, the attempt does not move to FINISHING until container expiry is triggered by the container liveliness monitor. I think the code below need not check whether work-preserving restart is enabled, so that the master container's resources are released immediately and allocated to other pending resource requests of different applications:
{code}
// Handle received container status, this should be processed after new
// RMNode inserted
if (!rmContext.isWorkPreservingRecoveryEnabled()) {
  if (!request.getNMContainerStatuses().isEmpty()) {
    LOG.info("received container statuses on node manager register :"
        + request.getNMContainerStatuses());
    for (NMContainerStatus status : request.getNMContainerStatuses()) {
      handleNMContainerStatus(status, nodeId);
    }
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
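For clarity, this is roughly what the description argues for: the same registration-handling fragment with the work-preserving-recovery guard dropped, so completed container statuses are always processed at NM registration. It is only a sketch of the proposal, not the attached patch, and later comments in this thread discuss whether the recovery path already covers this case.
{code}
// Sketch of the proposed change (not the actual patch): handle completed
// container statuses at NM registration unconditionally, instead of only
// when work-preserving recovery is disabled.
if (!request.getNMContainerStatuses().isEmpty()) {
  LOG.info("received container statuses on node manager register :"
      + request.getNMContainerStatuses());
  for (NMContainerStatus status : request.getNMContainerStatuses()) {
    handleNMContainerStatus(status, nodeId);
  }
}
{code}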
[jira] [Commented] (YARN-3240) [Data Mode] Implement client API to put generic entities
[ https://issues.apache.org/jira/browse/YARN-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335106#comment-14335106 ] Junping Du commented on YARN-3240: -- Thanks [~zjshen] for the patch! I am reviewing this patch now, and a couple of comments so far:
{code}
+  //TODO: It needs to be updated by the discovery service
  private URI resURI;
{code}
Looks like we are creating one TimelineClient for every application, so we have multiple TimelineClients within an NM. Did we consider the other way - one TimelineClient that can talk to different app URLs (put the URL as a parameter in every call, so the client can be more stateless)? I don't have a strong preference here, and I think compatibility with the old client could be a good reason. Just curious about the decision. In addition, I think we need a new constructor that takes resURI as a parameter, because it is no longer obtained from configuration but from the caller of TimelineClient, who knows the resource details (the address of the aggregator). A setter for resURI is also needed, because when the caller (AM or NM) hits a failure in PUT/POST (an IOException so far), its retry logic will notify the RM for recovery (through the heartbeat or allocate request, addressed in YARN-3039) and set it back afterwards.
{code}
catch (RuntimeException re) {
+  // runtime exception is expected if the client cannot connect the server
+  String msg =
+      "Failed to get the response from the timeline server.";
+  LOG.error(msg, re);
+  throw new IOException(re);
+}
+if (resp == null ||
+    resp.getClientResponseStatus() != ClientResponse.Status.OK) {
+  String msg =
+      "Failed to get the response from the timeline server.";
+  LOG.error(msg);
+  if (LOG.isDebugEnabled() && resp != null) {
+    String output = resp.getEntity(String.class);
+    LOG.debug("HTTP error code: " + resp.getStatus()
+        + " Server response : \n" + output);
+  }
+  throw new YarnException(msg);
+}
{code}
Looks like we are differentiating 404 and 500 here with IOException and YarnException, which looks fine to me. Do we plan to have different handling logic (on the caller side) for the two failure cases? The rest looks good to me. [Data Mode] Implement client API to put generic entities Key: YARN-3240 URL: https://issues.apache.org/jira/browse/YARN-3240 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3240.1.patch, YARN-3240.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
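A minimal sketch of the constructor/setter idea raised in the comment above, assuming a hypothetical holder class and method names (this is not the TimelineClient API from the patch): the caller supplies the aggregator address instead of reading it from configuration, and can swap it when the RM reports a new one.
{code}
import java.net.URI;

// Hypothetical holder for the per-app aggregator address; names and shape
// are assumptions for illustration only.
class AppAggregatorAddressSketch {
  private URI resURI;

  AppAggregatorAddressSketch(URI aggregatorURI) {
    this.resURI = aggregatorURI;   // supplied by the caller (AM/NM), not by config
  }

  synchronized void setAggregatorURI(URI newURI) {
    this.resURI = newURI;          // refreshed after the RM reports a moved aggregator
  }

  synchronized URI getAggregatorURI() {
    return resURI;
  }
}
{code}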
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335118#comment-14335118 ] Naganarasimha G R commented on YARN-3039: - Hi [~djp], thanks for the doc, which gives a better understanding of the flow now. A few queries:
* I feel the AM should be informed of the AggregatorAddr as early as registration itself, rather than in ApplicationMasterService.allocate() as is currently done.
* For NMs too, would it be better to update the address during registration itself (it may be recovered during recovery, not sure though)? Thoughts?
* I was not clear about the source of RMAppEventType.AGGREGATOR_UPDATE. Based on YARN-3030 (aggregator collection through the NM's aux service), PerNodeAggregatorServer (the aux service) launches AppLevelAggregatorService, so will AppLevelAggregatorService inform the RM about the aggregator for the application, and then the RM informs the NM about the appAggregatorAddr as part of the heartbeat response? If this is the flow, is there a chance of a race condition where, before the NM gets the appAggregatorAddr from the RM, the NM might need to post some AM container entities/events?
[~zjshen],
* bq. Ideally, only this singleton aggregator can have magic to sort out app info in aggregation. If not, we can even give up current flow NM(s) - app aggregator(deployed on one NM) - backend and let NM to talk to backend directly for saving hop for traffic. Can you clarify more on this?
I also want some clarification along similar lines: what is the goal in having one aggregator per app? Is it for simple aggregation of metrics related to an application entity, or for any entity (flow, flow run, app-specific, etc.)? If so, do we need to aggregate for system entities? Based on this it may become clearer how to get the complete picture.
* In one of your comments (not in this JIRA), you had mentioned that we might need to start a per-app aggregator only if the app requests it. In that case, how will we capture container entities and their events if the app does not request a per-app aggregator?
[Aggregator wireup] Implement ATS writer service discovery -- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, YARN-3039-no-test.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3214) Add non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3214: - Attachment: Non-exclusive-Node-Partition-Design.pdf Attached the design doc; please feel free to share your ideas. Thanks! Add non-exclusive node labels -- Key: YARN-3214 URL: https://issues.apache.org/jira/browse/YARN-3214 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Non-exclusive-Node-Partition-Design.pdf Currently, node labels partition the cluster into sub-clusters, so resources cannot be shared between partitions. With the current implementation of node labels we cannot use the cluster optimally, and the throughput of the cluster will suffer. We are proposing adding non-exclusive node labels:
1. Labeled apps get preference on labeled nodes.
2. If there is no ask for labeled resources, we can assign those nodes to non-labeled apps.
3. If there is a future ask for those resources, we will preempt the non-labeled apps and give the nodes back to labeled apps.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335137#comment-14335137 ] Varun Saxena commented on YARN-2980: [~aw], kindly let me know if any further changes are required. Move health check script related functionality to hadoop-common --- Key: YARN-2980 URL: https://issues.apache.org/jira/browse/YARN-2980 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma Assignee: Varun Saxena Attachments: YARN-2980.001.patch, YARN-2980.002.patch, YARN-2980.003.patch, YARN-2980.004.patch HDFS might want to leverage health check functionality available in YARN in both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode https://issues.apache.org/jira/browse/HDFS-7441. We can move health check functionality including the protocol between hadoop daemons and health check script to hadoop-common. That will simplify the development and maintenance for both hadoop source code and health check script. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335124#comment-14335124 ] Wangda Tan commented on YARN-3251: -- Thanks for reporting this, [~jlowe]! Since this is a blocker for 2.7, I will create a patch for this using method described in YARN-3243 first before working on other related refactorings, I added this as a sub task of YARN-3251. CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Priority: Blocker The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-3251: Assignee: Wangda Tan CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Wangda Tan Priority: Blocker The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3252) YARN LinuxContainerExecutor runs as nobody in Simple Security mode for all applications
Eric Yang created YARN-3252: --- Summary: YARN LinuxContainerExecutor runs as nobody in Simple Security mode for all applications Key: YARN-3252 URL: https://issues.apache.org/jira/browse/YARN-3252 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.2, 2.5.1, 2.6.0, 2.4.0, 2.3.0 Environment: Linux Reporter: Eric Yang Priority: Critical When using YARN + Slider + LinuxContainerExecutor, all Slider applications run as nobody. This is because of the modification in YARN-1253 to restrict all containers to run as a single user. This becomes an exploit for any application that runs inside YARN + Slider + LCE. The original behavior is more correct. The original statement indicated that users could impersonate any other user; this is supposed to be valid only for proxy users, who can proxy as other users. It is designed as intended that the service user needs to be trusted by the framework to impersonate end users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3252) YARN LinuxContainerExecutor runs as nobody in Simple Security mode for all applications
[ https://issues.apache.org/jira/browse/YARN-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335107#comment-14335107 ] Allen Wittenauer commented on YARN-3252: See YARN-2424. YARN LinuxContainerExecutor runs as nobody in Simple Security mode for all applications --- Key: YARN-3252 URL: https://issues.apache.org/jira/browse/YARN-3252 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0, 2.4.0, 2.6.0, 2.5.1, 2.5.2 Environment: Linux Reporter: Eric Yang Priority: Critical When using YARN + Slider + LinuxContainerExecutor, all Slider applications run as nobody. This is because of the modification in YARN-1253 to restrict all containers to run as a single user. This becomes an exploit for any application that runs inside YARN + Slider + LCE. The original behavior is more correct. The original statement indicated that users could impersonate any other user; this is supposed to be valid only for proxy users, who can proxy as other users. It is designed as intended that the service user needs to be trusted by the framework to impersonate end users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3251: - Issue Type: Sub-task (was: Bug) Parent: YARN-3243 CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Jason Lowe Priority: Blocker The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile
[ https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-3080: -- Attachment: YARN-3080.patch The DockerContainerExecutor could not write the right pid to container pidFile -- Key: YARN-3080 URL: https://issues.apache.org/jira/browse/YARN-3080 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Beckham007 Assignee: Abin Shahab Attachments: YARN-3080.patch, YARN-3080.patch The docker_container_executor_session.sh is like this: {quote} #!/usr/bin/env bash echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1421723685222_0008_01_02` /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp /bin/mv -f /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid /usr/bin/docker run --rm --name container_1421723685222_0008_01_02 -e GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M --cpu-shares=1024 -v /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02 -v /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02 -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh {quote} The DockerContainerExecutor use docker inspect before docker run, so the docker inspect couldn't get the right pid for the docker, signalContainer() and nm restart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params
[ https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335280#comment-14335280 ] Jason Lowe commented on YARN-3239: -- Any other comments? Otherwise I will commit this tomorrow. WebAppProxy does not support a final tracking url which has query fragments and params --- Key: YARN-3239 URL: https://issues.apache.org/jira/browse/YARN-3239 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-3239.1.patch Examples of failures: Expected: {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}} Actual: {{http://uihost:8080}} Tried with a minor change to remove the #. Saw a different issue: Expected: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}} Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}} yarn application -status appId returns the expected value correctly. However, invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3084) YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes to run
[ https://issues.apache.org/jira/browse/YARN-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335288#comment-14335288 ] Sean Roberts commented on YARN-3084: I ran the same but with a simplified job request: {code} { application-id:application_1424804952495_0004, application-name:seanpi2, am-container-spec: { commands: { command:hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 2 2 } }, application-type:YARN } {code} YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes to run Key: YARN-3084 URL: https://issues.apache.org/jira/browse/YARN-3084 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.6.0 Environment: Using eclipse on windows 7 (client)to run the map reduce job on the host of Hortonworks HDP 2.2 (hortonworks is on vmware version 6.0.2 build-1744117) Reporter: Michael Br Priority: Minor Hello, 1.I want to run the simple Map Reduce job example (with the REST API 2.6 for yarn applications) and to calculate PI… for now it doesn’t work. When I use the command in the hortonworks terminal it works: “hadoop jar /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0.2.2.0.0-2041.jar pi 10 10”. But I want to submit the job with the REST API and not in the terminal as a command line. [http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application] 2.I do succeed with other REST API requests: get state, get new application id and even kill(change state), but when I try to submit my example, the response is: -- -- The Response Header: Key : null ,Value : [HTTP/1.1 202 Accepted] Key : Date ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 07:47:24 GMT] Key : Content-Length ,Value : [0] Key : Expires ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 07:47:24 GMT] Key : Location ,Value : [http://[my port]:8088/ws/v1/cluster/apps/application_1421661392788_0038] Key : Content-Type ,Value : [application/json] Key : Server ,Value : [Jetty(6.1.26.hwx)] Key : Pragma ,Value : [no-cache, no-cache] Key : Cache-Control ,Value : [no-cache] The Respone Body: Null (No Response) -- -- 3.I need help with the http request body filling. I am doing a POST http request and I know that I am doing it right (in java). 4.I think the problem is in the request body. 5.I used this guy’s answer to help me build my map reduce example xml but it does not work: [http://hadoop-forum.org/forum/general-hadoop-discussion/miscellaneous/2136-how-can-i-run-mapreduce-job-by-rest-api]. 6.What am I missing? (the description is not clear to me in the submit section of the rest api 2.6) 7.Does someone have an xml example for using a simple MR job? 8.Thanks! Here is the XML file I am using for the request body: -- -- ?xml version=1.0 encoding=UTF-8 standalone=yes? 
<application-submission-context>
  <application-id>application_1421661392788_0038</application-id>
  <application-name>test_21_1</application-name>
  <queue>default</queue>
  <priority>3</priority>
  <am-container-spec>
    <environment>
      <entry>
        <key>CLASSPATH</key>
        <value>/usr/hdp/2.2.0.0-2041/hadoop/conf&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/./&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/.//*&lt;CPS&gt;&lt;CPS&gt;/usr/share/java/mysql-connector-java-5.1.17.jar&lt;CPS&gt;/usr/share/java/mysql-connector-java.jar&lt;CPS&gt;/usr/hdp/current/hadoop-mapreduce-client/*&lt;CPS&gt;/usr/hdp/current/tez-client/*&lt;CPS&gt;/usr/hdp/current/tez-client/lib/*&lt;CPS&gt;/etc/tez/conf/&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/tez/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/tez/lib/*&lt;CPS&gt;/etc/tez/conf</value>
      </entry>
    </environment>
    <commands>
      <command>hadoop jar
[jira] [Updated] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-3231: -- Attachment: YARN-3231.v2.patch FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck -- Key: YARN-3231 URL: https://issues.apache.org/jira/browse/YARN-3231 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch When a queue is piling up with a lot of pending jobs due to the maxRunningApps limit. We want to increase this property on the fly to make some of the pending job active. However, once we increase the limit, all pending jobs were not assigned any resource, and were stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335578#comment-14335578 ] Chang Li commented on YARN-3131: [~jianhe] Thanks for the review! I have updated my patch. Could you please review it again? If all is good, please help commit this. Thanks. YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, yarn_3131_v6.patch Just ran into an issue when submitting a job to a non-existent queue: YarnClient raises no exception. Though the job is indeed submitted successfully and just fails immediately after, it would be better if YarnClient could handle the immediate-failure situation like YarnRunner does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
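As a rough sketch of the behavior this JIRA asks for (not the actual YarnClientImpl code from the patches): while polling for the submitted application to leave the early states, also treat FAILED and KILLED as terminal and surface them instead of looping. The helper name and polling interval below are assumptions.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Illustrative helper only; not part of the YarnClient API.
class SubmitWaitSketch {
  static void waitForAccepted(YarnClient client, ApplicationId appId)
      throws Exception {
    while (true) {
      YarnApplicationState state =
          client.getApplicationReport(appId).getYarnApplicationState();
      if (state == YarnApplicationState.ACCEPTED
          || state == YarnApplicationState.RUNNING) {
        return;                                 // submission went through
      }
      if (state == YarnApplicationState.FAILED
          || state == YarnApplicationState.KILLED) {
        throw new YarnException("Application " + appId
            + " reached state " + state + " right after submission");
      }
      Thread.sleep(200);                        // poll again shortly
    }
  }
}
{code}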
[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
[ https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336059#comment-14336059 ] Jian He commented on YARN-3202: --- To clarify: the ContainerRecoveredTransition in RMContainerImpl does that. Improve master container resource release time ICO work preserving restart enabled -- Key: YARN-3202 URL: https://issues.apache.org/jira/browse/YARN-3202 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: 0001-YARN-3202.patch While NM is registering with RM , If NM sends completed_container for masterContainer then immediately resources of master container are released by triggering the CONTAINER_FINISHED event. This releases all the resources held by master container and allocated for other pending resource requests by applications. But ICO rm work preserving restart is enabled, if master container state is completed then the attempt is not move to FINISHING as long as container expiry triggered by container livelyness monitor. I think in the below code, need not check for work preserving restart enable so that immediately master container resources get released and allocated to other pending resource requests of different applications {code} // Handle received container status, this should be processed after new // RMNode inserted if (!rmContext.isWorkPreservingRecoveryEnabled()) { if (!request.getNMContainerStatuses().isEmpty()) { LOG.info(received container statuses on node manager register : + request.getNMContainerStatuses()); for (NMContainerStatus status : request.getNMContainerStatuses()) { handleNMContainerStatus(status, nodeId); } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3001) RM dies because of divide by zero
[ https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336083#comment-14336083 ] Rohith commented on YARN-3001: -- bq. RM does not dies unless yarn.dispatcher.exit-on-error is set to true Ignore this. RM sets this configuration to true neverthless of configured value. In YARN-382 ResourceRequest is validated via normalization process. Normalization of resources make sure always minimum-allocation-mb for containers even if users send 0 as container memory. I verified in real cluster by sending 0 as container memory and am memory. Schedulers normalize the requests and allocates configured yarn.scheduler.minimum-allocation-mb. [~hoelog] Could you give scenario when it happened? RM dies because of divide by zero - Key: YARN-3001 URL: https://issues.apache.org/jira/browse/YARN-3001 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: hoelog Assignee: Rohith RM dies because of divide by zero exception. {code} 2014-12-31 21:27:05,022 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.ArithmeticException: / by zero at org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599) at java.lang.Thread.run(Thread.java:745) 2014-12-31 21:27:05,023 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
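A small sketch of the normalization behavior Rohith describes above (assumed semantics for illustration; the real logic lives in the scheduler's resource normalization path): a 0 MB ask is raised to the configured minimum and rounded up to a multiple of it, so a plain 0-memory request should not by itself produce the divide-by-zero.
{code}
// Illustrative only: approximate memory normalization as described above.
public class NormalizeDemo {

  static int normalizeMemoryMb(int requestedMb, int minimumAllocationMb) {
    int mb = Math.max(requestedMb, minimumAllocationMb);         // enforce the minimum
    return (int) (Math.ceil((double) mb / minimumAllocationMb)
        * minimumAllocationMb);                                   // round up to a multiple
  }

  public static void main(String[] args) {
    // e.g. yarn.scheduler.minimum-allocation-mb = 1024
    System.out.println(normalizeMemoryMb(0, 1024));     // -> 1024
    System.out.println(normalizeMemoryMb(1500, 1024));  // -> 2048
  }
}
{code}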
[jira] [Commented] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy
[ https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336090#comment-14336090 ] Brahma Reddy Battula commented on YARN-3217: Yes, you are correct. I removed it and updated the patch. Remove httpclient dependency from hadoop-yarn-server-web-proxy -- Key: YARN-3217 URL: https://issues.apache.org/jira/browse/YARN-3217 Project: Hadoop YARN Issue Type: Task Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula Attachments: YARN-3217-002.patch, YARN-3217-003.patch, YARN-3217.patch Sub-task of HADOOP-10105. Remove httpclient dependency from WebAppProxyServlet.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3255) RM and NM main() should support generic options
[ https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated YARN-3255: -- Attachment: YARN-3255-01.patch A simple patch, which particularly lets me run a Yarn cluster in Eclipse. RM and NM main() should support generic options --- Key: YARN-3255 URL: https://issues.apache.org/jira/browse/YARN-3255 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.5.0 Reporter: Konstantin Shvachko Attachments: YARN-3255-01.patch Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore generic options, like {{-conf}} and {{-fs}}. It would be good to have the ability to pass generic options in order to specify configuration files or the NameNode location, when the services start through {{main()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
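To illustrate the request, here is a minimal sketch of a daemon main() that runs its arguments through GenericOptionsParser before starting the service. This is only a sketch of the idea, not the attached YARN-3255-01.patch; the class name is hypothetical.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative only: honor generic options (-conf, -fs, -D ...) in a
// daemon's main() before the service is started.
public class GenericOptionsMainSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    // Applies -conf <file>, -fs <uri>, -D key=value, etc. onto conf.
    String[] remainingArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    // ... construct and start the ResourceManager/NodeManager with this conf
    //     and any remaining, daemon-specific arguments.
    System.out.println("remaining args: " + remainingArgs.length);
  }
}
{code}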
[jira] [Updated] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryu Kobayashi updated YARN-3249: Attachment: YARN-3249.2.patch Add the kill application to the Resource Manager Web UI --- Key: YARN-3249 URL: https://issues.apache.org/jira/browse/YARN-3249 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0, 2.7.0 Reporter: Ryu Kobayashi Assignee: Ryu Kobayashi Priority: Minor Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.patch, screenshot.png We want to be able to kill an application from the web UI, similar to the JobTracker web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336136#comment-14336136 ] Hadoop QA commented on YARN-1809: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700680/YARN-1809.11.patch against trunk revision 6cbd9f1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebAppFairScheduler Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6723//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6723//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6723//console This message is automatically generated. Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Xuan Gong Attachments: YARN-1809.1.patch, YARN-1809.10.patch, YARN-1809.11.patch, YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch After YARN-953, the web-UI of generic history service is provide more information than that of RM, the details about app attempt and container. It's good to provide similar web-UIs, but retrieve the data from separate source, i.e., RM cache and history store respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336146#comment-14336146 ] Hadoop QA commented on YARN-3249: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700684/YARN-3249.2.patch against trunk revision 6cbd9f1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6724//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6724//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6724//console This message is automatically generated. Add the kill application to the Resource Manager Web UI --- Key: YARN-3249 URL: https://issues.apache.org/jira/browse/YARN-3249 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0, 2.7.0 Reporter: Ryu Kobayashi Assignee: Ryu Kobayashi Priority: Minor Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.patch, screenshot.png It want to kill the application on the JobTracker similarly Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryu Kobayashi updated YARN-3249: Attachment: YARN-3249.2.patch Add the kill application to the Resource Manager Web UI --- Key: YARN-3249 URL: https://issues.apache.org/jira/browse/YARN-3249 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0, 2.7.0 Reporter: Ryu Kobayashi Assignee: Ryu Kobayashi Priority: Minor Attachments: YARN-3249.2.patch, YARN-3249.patch, screenshot.png We want to be able to kill applications from the Resource Manager Web UI, similar to the JobTracker Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
[ https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336062#comment-14336062 ] Rohith commented on YARN-3202: -- bq. as for work-preserving restart, master container completed event will be sent too. I agree it is sent after YARN-3194, and the issue is not occurring now. Before YARN-3194, since NMContainerStatus was not handled, RMAppAttempt always waited for container expiry to trigger for a master container in the RUNNING state. Improve master container resource release time ICO work preserving restart enabled -- Key: YARN-3202 URL: https://issues.apache.org/jira/browse/YARN-3202 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: 0001-YARN-3202.patch While the NM is registering with the RM, if the NM sends a completed_container for the masterContainer, the master container's resources are released immediately by triggering the CONTAINER_FINISHED event. This releases all the resources held by the master container so they can be allocated to other pending resource requests from applications. But in case of (ICO) RM work-preserving restart being enabled, if the master container state is completed, the attempt does not move to FINISHING until container expiry is triggered by the container liveliness monitor. I think the code below need not check whether work-preserving restart is enabled, so that master container resources are released immediately and allocated to other pending resource requests of different applications:
{code}
// Handle received container status, this should be processed after new
// RMNode inserted
if (!rmContext.isWorkPreservingRecoveryEnabled()) {
  if (!request.getNMContainerStatuses().isEmpty()) {
    LOG.info("received container statuses on node manager register :"
        + request.getNMContainerStatuses());
    for (NMContainerStatus status : request.getNMContainerStatuses()) {
      handleNMContainerStatus(status, nodeId);
    }
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
[ https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336071#comment-14336071 ] Jian He commented on YARN-3202: --- For RM work-preserving restart, even before YARN-3194, the ContainerRecoveredTransition handles this correctly. The patch will cause duplicate master-container-completed events to be sent. Did I miss something? Improve master container resource release time ICO work preserving restart enabled -- Key: YARN-3202 URL: https://issues.apache.org/jira/browse/YARN-3202 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: 0001-YARN-3202.patch While the NM is registering with the RM, if the NM sends a completed_container for the masterContainer, the master container's resources are released immediately by triggering the CONTAINER_FINISHED event. This releases all the resources held by the master container so they can be allocated to other pending resource requests from applications. But in case of (ICO) RM work-preserving restart being enabled, if the master container state is completed, the attempt does not move to FINISHING until container expiry is triggered by the container liveliness monitor. I think the code below need not check whether work-preserving restart is enabled, so that master container resources are released immediately and allocated to other pending resource requests of different applications:
{code}
// Handle received container status, this should be processed after new
// RMNode inserted
if (!rmContext.isWorkPreservingRecoveryEnabled()) {
  if (!request.getNMContainerStatuses().isEmpty()) {
    LOG.info("received container statuses on node manager register :"
        + request.getNMContainerStatuses());
    for (NMContainerStatus status : request.getNMContainerStatuses()) {
      handleNMContainerStatus(status, nodeId);
    }
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
[ https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336082#comment-14336082 ] Rohith commented on YARN-3202: -- I mean to say the RM is enabled with work-preserving restart, but the RM is not restarted. Only the NM is restarted, which sends recovered container statuses while registering. The NM restart scenario was causing a problem earlier if the master container status was COMPLETED. Improve master container resource release time ICO work preserving restart enabled -- Key: YARN-3202 URL: https://issues.apache.org/jira/browse/YARN-3202 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: 0001-YARN-3202.patch While the NM is registering with the RM, if the NM sends a completed_container for the masterContainer, the master container's resources are released immediately by triggering the CONTAINER_FINISHED event. This releases all the resources held by the master container so they can be allocated to other pending resource requests from applications. But in case of (ICO) RM work-preserving restart being enabled, if the master container state is completed, the attempt does not move to FINISHING until container expiry is triggered by the container liveliness monitor. I think the code below need not check whether work-preserving restart is enabled, so that master container resources are released immediately and allocated to other pending resource requests of different applications:
{code}
// Handle received container status, this should be processed after new
// RMNode inserted
if (!rmContext.isWorkPreservingRecoveryEnabled()) {
  if (!request.getNMContainerStatuses().isEmpty()) {
    LOG.info("received container statuses on node manager register :"
        + request.getNMContainerStatuses());
    for (NMContainerStatus status : request.getNMContainerStatuses()) {
      handleNMContainerStatus(status, nodeId);
    }
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
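A minimal sketch of the change being discussed - handling NMContainerStatus on NM registration without the work-preserving-recovery check - could look like the following. This is illustrative only, not the attached 0001-YARN-3202.patch; the names mirror the snippet quoted in the issue description.
{code}
// Handle received container statuses on NM registration unconditionally,
// so a COMPLETED master container status releases the AM's resources
// immediately instead of waiting for container expiry.
if (!request.getNMContainerStatuses().isEmpty()) {
  LOG.info("received container statuses on node manager register :"
      + request.getNMContainerStatuses());
  for (NMContainerStatus status : request.getNMContainerStatuses()) {
    handleNMContainerStatus(status, nodeId);
  }
}
{code}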
[jira] [Updated] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy
[ https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3217: --- Attachment: YARN-3217-003.patch Remove httpclient dependency from hadoop-yarn-server-web-proxy -- Key: YARN-3217 URL: https://issues.apache.org/jira/browse/YARN-3217 Project: Hadoop YARN Issue Type: Task Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula Attachments: YARN-3217-002.patch, YARN-3217-003.patch, YARN-3217.patch Sub-task of HADOOP-10105. Remove httpclient dependency from WebAppProxyServlet.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336054#comment-14336054 ] Vrushali C commented on YARN-3031: -- Hi [~zjshen], Thanks for the prompt review, appreciate it! These are very good points you mention; let me add some more context around why these are coded this way right now. 1. The reasoning behind having two more APIs for writing metrics and events, in addition to the entity write, is that it would be good (efficient) to have the option to write a single metric or a single event. For example, say a job has many custom metrics and one particular metric is updated extremely frequently but not the others. We may want to write out only that particular metric without having to look through/write all the other metrics and other information in that entity. Similarly for events. Perhaps we could do it differently than what is proposed in the patch, but I believe the ability to write them individually would help performance. 2. Having separate write and aggregator APIs makes them independent of the order in which the entity details and aggregation are invoked/stored, and makes them independent of each other. For instance, we may choose to invoke the aggregation at a different (lower) frequency than the regular entity writes. Hence the two APIs. 3. The TimelineServiceWriteResponse has two error codes presently: NO_START_TIME and IO_EXCEPTION. We can of course add more error codes as we proceed. The reason I chose these two for now is that each flow is inherently associated with a submit timestamp (the run id of that flow). In case we don't find that timestamp, it would be difficult to write the flow information for that run to the store - I think an error should be thrown with an error code. The other one, IO_EXCEPTION, is what I thought would help account for write/put errors to the store - we should be able to indicate that the write did not go through. We can rename these if the names don't sound meaningful. thanks Vrushali [Storage abstraction] Create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.2.png, Sequence_diagram_write_interaction.png, YARN-3031.01.patch, YARN-3031.02.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
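For illustration, the split described above - a full entity write, finer-grained metric/event writes, and a separate aggregation call - might be sketched roughly as below. The interface and type names here are placeholders (only TimelineServiceWriteResponse and its two error codes are taken from the discussion), and the entity/metric/event types are assumed to come from the new timeline service data model; this is not the API in YARN-3031.02.patch.
{code}
// Hypothetical sketch of a backing-storage write interface.
public interface TimelineStorageWriter {

  // Error codes mentioned in the discussion.
  enum TimelineWriteError { NO_START_TIME, IO_EXCEPTION }

  // Write a full entity: metadata plus all of its metrics and events.
  TimelineServiceWriteResponse write(TimelineEntity entity) throws IOException;

  // Write a single frequently-updated metric or event without rewriting
  // the rest of the entity.
  TimelineServiceWriteResponse write(String entityId, TimelineMetric metric) throws IOException;
  TimelineServiceWriteResponse write(String entityId, TimelineEvent event) throws IOException;

  // Aggregation is a separate call so it can run at a different (lower)
  // frequency than the regular entity writes.
  TimelineServiceWriteResponse aggregate(TimelineEntity entity) throws IOException;
}
{code}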
[jira] [Created] (YARN-3255) RM and NM main() should support generic options
Konstantin Shvachko created YARN-3255: - Summary: RM and NM main() should support generic options Key: YARN-3255 URL: https://issues.apache.org/jira/browse/YARN-3255 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.5.0 Reporter: Konstantin Shvachko Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore generic options, like {{-conf}} and {{-fs}}. It would be good to have the ability to pass generic options in order to specify configuration files or the NameNode location, when the services start through {{main()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
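As a rough illustration of what is being asked for (not an actual patch for this JIRA, and the class name below is hypothetical), the daemon's main() could run its arguments through GenericOptionsParser before initializing the service, so that options like {{-conf}} and {{-fs}} take effect:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmMainSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    // Consumes generic options such as -conf <file>, -D key=value and -fs <uri>,
    // applying them to conf; anything left over is daemon-specific.
    String[] remaining = new GenericOptionsParser(conf, args).getRemainingArgs();
    // ... hand conf (and remaining args) to the ResourceManager/NodeManager
    // service initialization as the real main() does today.
  }
}
{code}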
[jira] [Commented] (YARN-2467) Add SpanReceiverHost to YARN daemons
[ https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335784#comment-14335784 ] Masatake Iwasaki commented on YARN-2467: Thanks, [~hitliuyi]. Add SpanReceiverHost to YARN daemons - Key: YARN-2467 URL: https://issues.apache.org/jira/browse/YARN-2467 Project: Hadoop YARN Issue Type: Sub-task Components: api, nodemanager, resourcemanager Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3247) TestQueueMappings should use CapacityScheduler explicitly
[ https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-3247: - Summary: TestQueueMappings should use CapacityScheduler explicitly (was: TestQueueMappings failure for FairScheduler) TestQueueMappings should use CapacityScheduler explicitly - Key: YARN-3247 URL: https://issues.apache.org/jira/browse/YARN-3247 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Attachments: YARN-3247.000.patch TestQueueMappings is only supported by CapacityScheduler. We should configure CapacityScheduler for this test. Otherwise if the default scheduler is set to FairScheduler, the test will fail with the following message: {code} Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings) Time elapsed: 2.202 sec ERROR! java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu updated YARN-2190: Attachment: YARN-2190.8.patch Upload a new patch. This patch is mostly based on the version 6. We still have a separate Windows container executor. There is no CPU and memory support for {{WindowsSecureContainerExecutor}}. We can open a separate JIRA to add CPU/memory limit support to secure Windows container executor. Provide a Windows container executor that can limit memory and CPU -- Key: YARN-2190 URL: https://issues.apache.org/jira/browse/YARN-2190 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Reporter: Chuan Liu Assignee: Chuan Liu Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch Yarn default container executor on Windows does not set the resource limit on the containers currently. The memory limit is enforced by a separate monitoring thread. The container implementation on Windows uses Job Object right now. The latest Windows (8 or later) API allows CPU and memory limits on the job objects. We want to create a Windows container executor that sets the limits on job objects thus provides resource enforcement at OS level. http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335826#comment-14335826 ] Hadoop QA commented on YARN-2190: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700646/YARN-2190.8.patch against trunk revision 1a625b8. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6718//console This message is automatically generated. Provide a Windows container executor that can limit memory and CPU -- Key: YARN-2190 URL: https://issues.apache.org/jira/browse/YARN-2190 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Reporter: Chuan Liu Assignee: Chuan Liu Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch Yarn default container executor on Windows does not set the resource limit on the containers currently. The memory limit is enforced by a separate monitoring thread. The container implementation on Windows uses Job Object right now. The latest Windows (8 or later) API allows CPU and memory limits on the job objects. We want to create a Windows container executor that sets the limits on job objects thus provides resource enforcement at OS level. http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1809: -- Assignee: Xuan Gong (was: Zhijie Shen) Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Xuan Gong Attachments: YARN-1809.1.patch, YARN-1809.10.patch, YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch After YARN-953, the web-UI of the generic history service provides more information than that of the RM, namely the details about app attempts and containers. It's good to provide similar web-UIs, but retrieve the data from separate sources, i.e., the RM cache and the history store respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335836#comment-14335836 ] Xuan Gong commented on YARN-1809: - Updated the patch based on the latest trunk code. Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1809.1.patch, YARN-1809.10.patch, YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch After YARN-953, the web-UI of the generic history service provides more information than that of the RM, namely the details about app attempts and containers. It's good to provide similar web-UIs, but retrieve the data from separate sources, i.e., the RM cache and the history store respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3254) HealthReport should include disk full information
[ https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3254: Attachment: Screen Shot 2015-02-24 at 17.57.39.png Attaching a screenshot when the NodeManager's disk is almost full. HealthReport should include disk full information - Key: YARN-3254 URL: https://issues.apache.org/jira/browse/YARN-3254 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.6.0 Reporter: Akira AJISAKA Attachments: Screen Shot 2015-02-24 at 17.57.39.png When a NodeManager's local disk gets almost full, the NodeManager sends a health report to ResourceManager that local/log dir is bad and the message is displayed on ResourceManager Web UI. It's difficult for users to detect why the dir is bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3254) HealthReport should include disk full information
[ https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3254: Description: When a NodeManager's local disk gets almost full, the NodeManager sends a health report to ResourceManager that local/log dir is bad and the message is displayed on ResourceManager Web UI. It's difficult for users to detect why the dir is bad. (was: When a NodeManager's local disk get almost full, the NodeManager send a health report to ResourceManager that local/log dir is bad and the message is displayed on ResourceManager Web UI. It's difficult for users to detect why the dir is bad.) HealthReport should include disk full information - Key: YARN-3254 URL: https://issues.apache.org/jira/browse/YARN-3254 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.6.0 Reporter: Akira AJISAKA When a NodeManager's local disk gets almost full, the NodeManager sends a health report to ResourceManager that local/log dir is bad and the message is displayed on ResourceManager Web UI. It's difficult for users to detect why the dir is bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3254) HealthReport should include disk full information
Akira AJISAKA created YARN-3254: --- Summary: HealthReport should include disk full information Key: YARN-3254 URL: https://issues.apache.org/jira/browse/YARN-3254 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.6.0 Reporter: Akira AJISAKA When a NodeManager's local disk get almost full, the NodeManager send a health report to ResourceManager that local/log dir is bad and the message is displayed on ResourceManager Web UI. It's difficult for users to detect why the dir is bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3240) [Data Mode] Implement client API to put generic entities
[ https://issues.apache.org/jira/browse/YARN-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335901#comment-14335901 ] Junping Du commented on YARN-3240: -- Will commit it tomorrow if no more comments from others. [Data Mode] Implement client API to put generic entities Key: YARN-3240 URL: https://issues.apache.org/jira/browse/YARN-3240 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3240.1.patch, YARN-3240.2.patch, YARN-3240.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335897#comment-14335897 ] Hadoop QA commented on YARN-3249: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700467/screenshot.png against trunk revision 6cbd9f1. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6720//console This message is automatically generated. Add the kill application to the Resource Manager Web UI --- Key: YARN-3249 URL: https://issues.apache.org/jira/browse/YARN-3249 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0, 2.7.0 Reporter: Ryu Kobayashi Assignee: Ryu Kobayashi Priority: Minor Attachments: YARN-3249.patch, screenshot.png It want to kill the application on the JobTracker similarly Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2467) Add SpanReceiverHost to YARN daemons
[ https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335779#comment-14335779 ] Yi Liu commented on YARN-2467: -- [~iwasakims], I've assigned the JIRA to you; feel free to work on it. Add SpanReceiverHost to YARN daemons - Key: YARN-2467 URL: https://issues.apache.org/jira/browse/YARN-2467 Project: Hadoop YARN Issue Type: Sub-task Components: api, nodemanager, resourcemanager Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3247) TestQueueMappings failure for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335802#comment-14335802 ] Tsuyoshi OZAWA commented on YARN-3247: -- +1, committing this shortly. - the default value of RM_SCHEDULER is CapacityScheduler. However, the default value can be overridden when user has modified yarn-site.xml in a class path. Also, other test cases for CapacityScheduler configure the scheduler explicitly. We should do here also. {code} protected ResourceScheduler createScheduler() { String schedulerClassName = conf.get(YarnConfiguration.RM_SCHEDULER, YarnConfiguration.DEFAULT_RM_SCHEDULER); {code} TestQueueMappings failure for FairScheduler --- Key: YARN-3247 URL: https://issues.apache.org/jira/browse/YARN-3247 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Attachments: YARN-3247.000.patch TestQueueMappings is only supported by CapacityScheduler. We should configure CapacityScheduler for this test. Otherwise if the default scheduler is set to FairScheduler, the test will fail with the following message: {code} Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings) Time elapsed: 2.202 sec ERROR! java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
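Illustrative sketch of the kind of fix under review here - pinning CapacityScheduler in the test configuration so an overriding yarn-site.xml on the classpath cannot switch the scheduler. This is a sketch rather than the exact YARN-3247.000.patch, and setupQueueConfiguration is assumed to be the test's existing helper.
{code}
CapacitySchedulerConfiguration conf = new CapacitySchedulerConfiguration();
setupQueueConfiguration(conf);  // assumed existing helper in TestQueueMappings
// Configure the scheduler explicitly so a FairScheduler default picked up
// from a user's yarn-site.xml cannot leak into this CapacityScheduler test.
conf.setClass(YarnConfiguration.RM_SCHEDULER,
    CapacityScheduler.class, ResourceScheduler.class);
MockRM resourceManager = new MockRM(conf);
resourceManager.start();
{code}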
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3251: - Target Version/s: 2.7.0, 2.6.1 (was: 2.7.0) CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3251.1.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335803#comment-14335803 ] Wangda Tan commented on YARN-3251: -- [~cwelch], Some comments, 1) Since the target of your patch is to make a quick fix for an old version, it's better to create a patch on branch-2.6, and the patch you created will be committed to branch-2.6 as well. I noticed some functionality and interfaces used in your patch are not part of 2.6. The patch I'm working on now will remove CSQueueUtils.computeMaxAvailResource, so there's no need to add an intermediate fix in branch-2. 2) I think CSQueueUtils.getAbsoluteMaxAvailCapacity doesn't hold the child's and parent's locks together; maybe we don't need to change that - could you confirm? 3) Maybe we don't need a getter/setter for absoluteMaxAvailCapacity in the queue; a volatile float is enough? Thanks, CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3251.1.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
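A small sketch of the third point above - publishing the cached value through a volatile field rather than a synchronized getter/setter - is shown below. This is illustrative only (not the attached patch); the surrounding method names are hypothetical.
{code}
// Sketch: a volatile float gives lock-free visibility to readers, so the
// headroom/user-limit calculation never needs to take parent and child
// queue locks together, avoiding the reported lock cycle.
private volatile float absoluteMaxAvailCapacity = 0f;

void cacheAbsoluteMaxAvailCapacity(float newValue) {  // hypothetical update hook
  absoluteMaxAvailCapacity = newValue;                // single writer publishes the value
}

float currentAbsoluteMaxAvailCapacity() {             // hypothetical read hook
  return absoluteMaxAvailCapacity;                    // no synchronization needed
}
{code}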
[jira] [Updated] (YARN-3247) TestQueueMappings failure for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-3247: - Hadoop Flags: Reviewed TestQueueMappings failure for FairScheduler --- Key: YARN-3247 URL: https://issues.apache.org/jira/browse/YARN-3247 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Attachments: YARN-3247.000.patch TestQueueMappings is only supported by CapacityScheduler. We should configure CapacityScheduler for this test. Otherwise if the default scheduler is set to FairScheduler, the test will fail with the following message: {code} Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings) Time elapsed: 2.202 sec ERROR! java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3247) TestQueueMappings should use CapacityScheduler explicitly
[ https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335889#comment-14335889 ] Hudson commented on YARN-3247: -- FAILURE: Integrated in Hadoop-trunk-Commit #7196 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7196/]) YARN-3247. TestQueueMappings should use CapacityScheduler explicitly. Contributed by Zhihai Xu. (ozawa: rev 6cbd9f1113fca9ff86fd6ffa783ecd54b147e0db) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueMappings.java * hadoop-yarn-project/CHANGES.txt TestQueueMappings should use CapacityScheduler explicitly - Key: YARN-3247 URL: https://issues.apache.org/jira/browse/YARN-3247 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Attachments: YARN-3247.000.patch TestQueueMappings is only supported by CapacityScheduler. We should configure CapacityScheduler for this test. Otherwise if the default scheduler is set to FairScheduler, the test will fail with the following message: {code} Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings) Time elapsed: 2.202 sec ERROR! java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3240) [Data Mode] Implement client API to put generic entities
[ https://issues.apache.org/jira/browse/YARN-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335892#comment-14335892 ] Junping Du commented on YARN-3240: -- +1. Patch looks good. [Data Mode] Implement client API to put generic entities Key: YARN-3240 URL: https://issues.apache.org/jira/browse/YARN-3240 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3240.1.patch, YARN-3240.2.patch, YARN-3240.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335895#comment-14335895 ] Tsuyoshi OZAWA commented on YARN-3249: -- Submitting a patch. Let me review. Add the kill application to the Resource Manager Web UI --- Key: YARN-3249 URL: https://issues.apache.org/jira/browse/YARN-3249 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0, 2.7.0 Reporter: Ryu Kobayashi Assignee: Ryu Kobayashi Priority: Minor Attachments: YARN-3249.patch, screenshot.png We want to be able to kill applications from the Resource Manager Web UI, similar to the JobTracker Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335900#comment-14335900 ] Tsuyoshi OZAWA commented on YARN-3249: -- [~ryu_kobayashi] thank you for the contribution. Unfortunately, your changes conflict with YARN-3230. Could you rebase it? Personally, +1 for the change itself. Add the kill application to the Resource Manager Web UI --- Key: YARN-3249 URL: https://issues.apache.org/jira/browse/YARN-3249 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0, 2.7.0 Reporter: Ryu Kobayashi Assignee: Ryu Kobayashi Priority: Minor Attachments: YARN-3249.patch, screenshot.png We want to be able to kill applications from the Resource Manager Web UI, similar to the JobTracker Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335566#comment-14335566 ] Hadoop QA commented on YARN-3131: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700584/yarn_3131_v6.patch against trunk revision 9a37247. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6713//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6713//console This message is automatically generated. YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, yarn_3131_v6.patch Just run into a issue when submit a job into a non-existent queue and YarnClient raise no exception. Though that job indeed get submitted successfully and just failed immediately after, it will be better if YarnClient can handle the immediate fail situation like YarnRunner does -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-3231: -- Attachment: (was: YARN-3231.v2.patch) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck -- Key: YARN-3231 URL: https://issues.apache.org/jira/browse/YARN-3231 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch When a queue is piling up with a lot of pending jobs due to the maxRunningApps limit. We want to increase this property on the fly to make some of the pending job active. However, once we increase the limit, all pending jobs were not assigned any resource, and were stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3031: - Attachment: YARN-3031.02.patch Attaching a revised writer interface. [Storage abstraction] Create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.2.png, Sequence_diagram_write_interaction.png, YARN-3031.01.patch, YARN-3031.02.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2467) Add SpanReceiverHost to YARN daemons
[ https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-2467: - Assignee: Masatake Iwasaki (was: Yi Liu) Add SpanReceiverHost to YARN daemons - Key: YARN-2467 URL: https://issues.apache.org/jira/browse/YARN-2467 Project: Hadoop YARN Issue Type: Sub-task Components: api, nodemanager, resourcemanager Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3125) [Event producers] Change distributed shell to use new timeline service
[ https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335785#comment-14335785 ] Vinod Kumar Vavilapalli commented on YARN-3125: --- Quick comment: I think we should try not to disturb the old code much. Let's just add two separate independent code blocks without removing the old style event-push. [Event producers] Change distributed shell to use new timeline service -- Key: YARN-3125 URL: https://issues.apache.org/jira/browse/YARN-3125 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Junping Du Attachments: YARN-3125.patch We can start with changing distributed shell to use new timeline service once the framework is completed, in which way we can quickly verify the next gen is working fine end-to-end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335799#comment-14335799 ] Vinod Kumar Vavilapalli commented on YARN-3131: --- Nits:
{code}
+throw new YarnException("Failed to submit " + applicationId +
+    "to YARN : " + appReport.getDiagnostics());
{code}
You will see output like {{application_123456_0001to YARN}} - there is a missing space before "to YARN". Can we just check for failToSubmitStates? Why do we also need to check for waitingStates? YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, yarn_3131_v6.patch Just ran into an issue when submitting a job to a non-existent queue: YarnClient raised no exception. Though that job indeed got submitted and just failed immediately after, it would be better if YarnClient could handle the immediate-failure situation like YarnRunner does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
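For context, a minimal sketch of the kind of check being discussed - failing fast in submitApplication when the report comes back FAILED or KILLED - could look like the following. This is illustrative only, not the exact yarn_3131_v6.patch, and the message string below includes the space Vinod is asking for.
{code}
// Illustrative sketch: fail fast instead of leaving the caller to discover
// the failure later (e.g. job submitted to a non-existent queue).
EnumSet<YarnApplicationState> failToSubmitStates =
    EnumSet.of(YarnApplicationState.FAILED, YarnApplicationState.KILLED);

ApplicationReport appReport = getApplicationReport(applicationId);
if (failToSubmitStates.contains(appReport.getYarnApplicationState())) {
  throw new YarnException("Failed to submit " + applicationId
      + " to YARN : " + appReport.getDiagnostics());
}
{code}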