[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335409#comment-14335409
 ] 

Hadoop QA commented on YARN-3231:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700551/YARN-3231.v1.patch
  against trunk revision 73bcfa9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6711//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6711//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6711//console

This message is automatically generated.

 FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
 job stuck
 --

 Key: YARN-3231
 URL: https://issues.apache.org/jira/browse/YARN-3231
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-3231.v1.patch


 When a queue is piling up with a lot of pending jobs because of the 
 maxRunningApps limit, we want to increase this property on the fly to make 
 some of the pending jobs active. However, once we increase the limit, the 
 pending jobs are not assigned any resources and remain stuck forever.
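
For illustration only, a minimal hypothetical sketch of the behaviour being reported (toy classes, not the FairScheduler implementation or the attached patch): when maxRunningApps is raised at runtime, the apps parked under the old limit must be re-examined and promoted to runnable, otherwise they stay pending even though slots are now free.

{code}
// Hypothetical sketch: a queue that parks apps over its maxRunningApps limit
// and must re-check the parked apps whenever the limit is raised.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class SimpleQueue {
  private int maxRunningApps;
  private final List<String> runnableApps = new ArrayList<String>();
  private final Queue<String> nonRunnableApps = new ArrayDeque<String>();

  SimpleQueue(int maxRunningApps) {
    this.maxRunningApps = maxRunningApps;
  }

  void submit(String app) {
    if (runnableApps.size() < maxRunningApps) {
      runnableApps.add(app);        // eligible for scheduling
    } else {
      nonRunnableApps.add(app);     // parked until a slot frees up
    }
  }

  // Called on configuration reload; the essential step is re-checking the
  // parked apps -- without it, a raised limit never un-parks anything.
  void setMaxRunningApps(int newLimit) {
    this.maxRunningApps = newLimit;
    while (runnableApps.size() < maxRunningApps && !nonRunnableApps.isEmpty()) {
      runnableApps.add(nonRunnableApps.poll());
    }
  }
}
{code}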



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage

2015-02-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335424#comment-14335424
 ] 

Karthik Kambatla commented on YARN-3122:


Thanks for working on this, Anubhav. The overall structure looks good, but for 
one concern about the API. More comments below. I have yet to take a closer 
look at the tests.

# ContainerMetrics
## Change phyCpuUsagePercent to pCpuUsagePercent for consistency with other 
variables? 
## Also, given YARN-3022 hasn't gone into a release yet, can we update the 
variables introduced there to reflect units as well - e.g. pMemUsageMBs instead 
of pMemUsage, and pMemLimitMBs instead of pMemLimitMbs?
## Change "Vcore usage stats times 1000" to "1000 times vcore usage"? 
# ContainersMonitorImpl: Nit - can we avoid starting lines with parentheses for 
method arguments? I am okay with not addressing this, just a personal 
preference.
# CpuTimeTracker
## Mark as Private-Unstable 
## Nit: Can we update the comments’ location for variables for better 
readability? 
{code}
public static final int UNAVAILABLE = -1;

// CPU used time since system is on (in milliseconds)
BigInteger cumulativeCpuTime = BigInteger.ZERO;

// … 
{code}
## Move MINIMUM_UPDATE_INTERVAL next to UNAVAILABLE? 
## Passing along the number of processors in getCpuTrackerUsage doesn’t seem 
right. If this is set once for CpuTracker, can we pass it through the 
constructor? (A sketch of that idea follows after this list.)
# ProcfsBasedProcessTree
## Would like to avoid passing numProcessors in getCpuUsagePercent
## The main method only captures the CPU usage, while the class tracks both 
memory and CPU. Can we move this to either a test or a util class? 
# NodeManagerHardwareUtils - s/thats/that is/
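
A minimal, hypothetical sketch of the constructor-injection idea from item 3 above (class and method names are illustrative; this is not the patch's ContainerMetrics/CpuTimeTracker code):

{code}
// Sketch only: the processor count is fixed for the lifetime of the tracker,
// so it is passed once at construction instead of on every usage query.
import java.math.BigInteger;

class SimpleCpuTimeTracker {
  public static final int UNAVAILABLE = -1;
  static final long MINIMUM_UPDATE_INTERVAL = 10 * 1000L; // milliseconds

  private final int numProcessors;  // injected once, at construction time

  // CPU time used since the system booted (in milliseconds).
  private BigInteger cumulativeCpuTime = BigInteger.ZERO;
  private BigInteger lastCumulativeCpuTime = BigInteger.ZERO;
  private long sampleTime = UNAVAILABLE;
  private long lastSampleTime = UNAVAILABLE;

  SimpleCpuTimeTracker(int numProcessors) {
    this.numProcessors = numProcessors;
  }

  void update(BigInteger newCumulativeCpuTime, long nowMillis) {
    lastCumulativeCpuTime = cumulativeCpuTime;
    lastSampleTime = sampleTime;
    cumulativeCpuTime = newCumulativeCpuTime;
    sampleTime = nowMillis;
  }

  // Overall CPU usage, in percent of the whole machine, since the last sample.
  float getCpuUsagePercent() {
    if (lastSampleTime == UNAVAILABLE
        || sampleTime - lastSampleTime < MINIMUM_UPDATE_INTERVAL) {
      return UNAVAILABLE;
    }
    BigInteger usedMillis = cumulativeCpuTime.subtract(lastCumulativeCpuTime);
    long elapsedMillis = sampleTime - lastSampleTime;
    return usedMillis.floatValue() * 100F / (elapsedMillis * numProcessors);
  }
}
{code}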



 Metrics for container's actual CPU usage
 

 Key: YARN-3122
 URL: https://issues.apache.org/jira/browse/YARN-3122
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
 YARN-3122.prelim.patch, YARN-3122.prelim.patch


 It would be nice to capture resource usage per container, for a variety of 
 reasons. This JIRA is to track CPU usage. 
 YARN-2965 tracks the resource usage on the node, and the two implementations 
 should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy

2015-02-24 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335343#comment-14335343
 ] 

Akira AJISAKA commented on YARN-3217:
-

Thanks [~brahmareddy] for the update.
{code}
-HttpClient client = new HttpClient(params);
+  throws IOException, URISyntaxException {
+
{code}
{{WebAppProxyServlet#proxyLink}} does not throw {{URISyntaxException}}, so 
would you remove this?
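
As background for the httpclient removal, a rough sketch (assumed helper name; not the actual WebAppProxyServlet code) of fetching the remote URL with java.net.HttpURLConnection, which needs neither commons-httpclient nor a URISyntaxException in the throws clause:

{code}
// Sketch only: open the proxied link with the JDK's HttpURLConnection.
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

class ProxyFetchSketch {
  static InputStream openRemote(String link) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(link).openConnection();
    conn.setRequestMethod("GET");
    conn.setConnectTimeout(60 * 1000);
    conn.setReadTimeout(60 * 1000);
    conn.connect();
    // The caller is responsible for closing the stream and disconnecting.
    return conn.getInputStream();
  }
}
{code}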

 Remove httpclient dependency from hadoop-yarn-server-web-proxy
 --

 Key: YARN-3217
 URL: https://issues.apache.org/jira/browse/YARN-3217
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Attachments: YARN-3217-002.patch, YARN-3217.patch


 Sub-task of HADOOP-10105. Remove httpclient dependency from 
 WebAppProxyServlet.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3125) [Event producers] Change distributed shell to use new timeline service

2015-02-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335400#comment-14335400
 ] 

Junping Du commented on YARN-3125:
--

Talked offline with Zhijie about taking over this JIRA.

 [Event producers] Change distributed shell to use new timeline service
 --

 Key: YARN-3125
 URL: https://issues.apache.org/jira/browse/YARN-3125
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 We can start by changing the distributed shell to use the new timeline service 
 once the framework is completed; that way we can quickly verify that the 
 next-gen service works end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335463#comment-14335463
 ] 

Hadoop QA commented on YARN-3131:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700573/yarn_3131_v5.patch
  against trunk revision 9a37247.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

  org.apache.hadoop.yarn.client.api.impl.TestYarnClient

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6712//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6712//console

This message is automatically generated.

 YarnClientImpl should check FAILED and KILLED state in submitApplication
 

 Key: YARN-3131
 URL: https://issues.apache.org/jira/browse/YARN-3131
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, 
 yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch


 Just ran into an issue where submitting a job to a non-existent queue causes 
 YarnClient to raise no exception. Though the job does get submitted 
 successfully and simply fails immediately afterwards, it would be better if 
 YarnClient handled the immediate-failure situation the way YarnRunner does.
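
A hedged sketch of the proposed client-side check (the YarnClient and ApplicationReport APIs used here are real, but the logic is simplified and is not the attached patch): poll the report after submission and fail fast when the application ends up FAILED or KILLED instead of leaving the caller unaware.

{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

class SubmitAndCheck {
  static ApplicationId submitAndCheck(YarnClient client,
      ApplicationSubmissionContext context)
      throws IOException, YarnException, InterruptedException {
    ApplicationId appId = client.submitApplication(context);
    while (true) {
      ApplicationReport report = client.getApplicationReport(appId);
      YarnApplicationState state = report.getYarnApplicationState();
      if (state == YarnApplicationState.FAILED
          || state == YarnApplicationState.KILLED) {
        // Surface the immediate failure to the caller instead of returning
        // as if the submission had gone through cleanly.
        throw new YarnException("Application " + appId + " was " + state
            + " during submission: " + report.getDiagnostics());
      }
      if (state == YarnApplicationState.ACCEPTED
          || state == YarnApplicationState.RUNNING
          || state == YarnApplicationState.FINISHED) {
        return appId;
      }
      // Still NEW, NEW_SAVING or SUBMITTED: keep polling.
      Thread.sleep(200);
    }
  }
}
{code}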



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-24 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-3131:
---
Attachment: yarn_3131_v5.patch

 YarnClientImpl should check FAILED and KILLED state in submitApplication
 

 Key: YARN-3131
 URL: https://issues.apache.org/jira/browse/YARN-3131
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, 
 yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch


 Just ran into an issue where submitting a job to a non-existent queue causes 
 YarnClient to raise no exception. Though the job does get submitted 
 successfully and simply fails immediately afterwards, it would be better if 
 YarnClient handled the immediate-failure situation the way YarnRunner does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1226) Inconsistent hostname leads to low data locality on IPv6 hosts

2015-02-24 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-1226:
---
Labels: ipv6  (was: )

 Inconsistent hostname leads to low data locality on IPv6 hosts
 --

 Key: YARN-1226
 URL: https://issues.apache.org/jira/browse/YARN-1226
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 0.23.3, 2.0.0-alpha, 2.1.0-beta
 Environment: Linux, IPv6
Reporter: Kaibo Zhou
  Labels: ipv6

 When I run a MapReduce job that uses TableInputFormat to scan an HBase table 
 on a YARN cluster with 140+ nodes, I consistently get very low data locality, 
 around 0~10%. 
 The scheduler is the Capacity Scheduler. HBase and Hadoop are integrated in the 
 cluster, with the NodeManager, DataNode and HRegionServer running on the same 
 node.
 The reason for the low data locality is that most machines in the cluster use 
 IPv6 and a few use IPv4. The NodeManager uses 
 InetAddress.getLocalHost().getHostName() to get the host name, but the return 
 value of this call depends on whether IPv4 or IPv6 is in use; see 
 [InetAddress.getLocalHost().getHostName() returns 
 FQDN|http://bugs.sun.com/view_bug.do?bug_id=7166687]. 
 On machines with IPv4, the NodeManager gets the hostname 
 search042097.sqa.cm4.site.net, but on machines with IPv6 it gets 
 search042097.sqa.cm4.
 If run with IPv6 disabled (-Djava.net.preferIPv4Stack=true), it returns 
 search042097.sqa.cm4.site.net.
 
 For the MapReduce job that scans the HBase table, the InputSplit contains node 
 locations as [FQDNs|http://en.wikipedia.org/wiki/FQDN], e.g. 
 search042097.sqa.cm4.site.net, because in HBase the RegionServers' hostnames 
 are assigned by the HMaster. The HMaster communicates with the RegionServers 
 and obtains each region server's hostname via Java NIO: 
 clientChannel.socket().getInetAddress().getHostName().
 See also the region server's startup log:
 13:06:21,200 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master 
 passed us hostname to use. Was=search042024.sqa.cm4, 
 Now=search042024.sqa.cm4.site.net
 
 As you can see, most machines in the YARN cluster with IPv6 get the short 
 hostname, while HBase always gets the full hostname, so the hosts cannot be 
 matched (see RMContainerAllocator::assignToMap). This leads to poor locality.
 After I used java.net.preferIPv4Stack to force IPv4 in YARN, I got 70+% data 
 locality in the cluster.
 Thanks,
 Kaibo
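
A small, self-contained check using only standard JDK APIs that illustrates the hostname inconsistency described above (the class name is made up for this example):

{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostnameCheck {
  public static void main(String[] args) throws UnknownHostException {
    InetAddress local = InetAddress.getLocalHost();
    // The value the NodeManager ends up using, per the report above.
    System.out.println("getHostName():          " + local.getHostName());
    // Forces a reverse lookup and usually yields the FQDN.
    System.out.println("getCanonicalHostName(): " + local.getCanonicalHostName());
  }
}
{code}

Running it with and without -Djava.net.preferIPv4Stack=true on one of the IPv6 machines should reproduce the short-name versus FQDN difference described above.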



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params

2015-02-24 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335298#comment-14335298
 ] 

Hitesh Shah commented on YARN-3239:
---

Tested manually by applying this patch. It works fine with the kind of URLs Tez 
is using.

 WebAppProxy does not support a final tracking url which has query fragments 
 and params 
 ---

 Key: YARN-3239
 URL: https://issues.apache.org/jira/browse/YARN-3239
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Jian He
 Attachments: YARN-3239.1.patch


 Examples of failures:
 Expected: 
 {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}}
 Actual: {{http://uihost:8080}}
 Tried with a minor change to remove the #. Saw a different issue:
 Expected: 
 {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}}
 Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}}
 yarn application -status appId returns the expected value correctly. However, 
 invoking an HTTP GET on http://rm:8088/proxy/appId/ returns the wrong value.
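
Illustrative only (plain JDK, not the WebAppProxy code or the attached patch): the expected/actual URLs above show that the proxy has to preserve the path, query and fragment of the final tracking URL, not just the scheme and authority.

{code}
import java.net.URI;
import java.net.URISyntaxException;

public class TrackingUrlSketch {
  public static void main(String[] args) throws URISyntaxException {
    URI tracking = new URI("http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez"
        + "?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005");

    // Keeping only scheme + authority reproduces the reported "Actual" value.
    System.out.println(tracking.getScheme() + "://" + tracking.getRawAuthority());

    // For this Tez UI link the query actually lives inside the fragment (it
    // follows the "#"), so getRawQuery() is null and the whole tail sits in
    // the raw fragment; re-assembling all components restores the full URL.
    String rebuilt = tracking.getScheme() + "://" + tracking.getRawAuthority()
        + tracking.getRawPath()
        + (tracking.getRawQuery() == null ? "" : "?" + tracking.getRawQuery())
        + (tracking.getRawFragment() == null ? "" : "#" + tracking.getRawFragment());
    System.out.println(rebuilt);  // matches the reported "Expected" URL
  }
}
{code}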



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335300#comment-14335300
 ] 

Hudson commented on YARN-2980:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7190 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7190/])
YARN-2980. Move health check script related functionality to hadoop-common 
(Varun Saxena via aw) (aw: rev d4ac6822e1c5dfac504ced48f10ab57a55b49e93)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesContainers.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java


 Move health check script related functionality to hadoop-common
 ---

 Key: YARN-2980
 URL: https://issues.apache.org/jira/browse/YARN-2980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Varun Saxena
 Fix For: 3.0.0

 Attachments: YARN-2980.001.patch, YARN-2980.002.patch, 
 YARN-2980.003.patch, YARN-2980.004.patch


 HDFS might want to leverage health check functionality available in YARN in 
 both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
 https://issues.apache.org/jira/browse/HDFS-7441.
 We can move health check functionality including the protocol between hadoop 
 daemons and health check script to hadoop-common. That will simplify the 
 development and maintenance for both hadoop source code and health check 
 script.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled

2015-02-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335321#comment-14335321
 ] 

Jian He commented on YARN-3202:
---

This piece of code is legacy code used only for non-work-preserving restart; 
the existing code path for work-preserving restart already covers this. 
Given that we only support work-preserving restart, I think we can get rid of 
all the conditional code for non-work-preserving restart, and the tests may 
need to be changed too.

 Improve master container resource release time ICO work preserving restart 
 enabled
 --

 Key: YARN-3202
 URL: https://issues.apache.org/jira/browse/YARN-3202
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: 0001-YARN-3202.patch


 While the NM is registering with the RM, if the NM sends a completed_container 
 for the masterContainer, then the master container's resources are released 
 immediately by triggering the CONTAINER_FINISHED event. This frees all the 
 resources held by the master container so they can be allocated to other 
 applications' pending resource requests.
 But in case of (ICO) RM work-preserving restart being enabled, if the master 
 container state is completed, the attempt is not moved to FINISHING until 
 container expiry is triggered by the container liveliness monitor. I think the 
 code below need not check whether work-preserving restart is enabled, so that 
 the master container's resources are released immediately and allocated to 
 other applications' pending resource requests.
 {code}
 // Handle received container status, this should be processed after new
 // RMNode inserted
 if (!rmContext.isWorkPreservingRecoveryEnabled()) {
   if (!request.getNMContainerStatuses().isEmpty()) {
     LOG.info("received container statuses on node manager register :"
         + request.getNMContainerStatuses());
     for (NMContainerStatus status : request.getNMContainerStatuses()) {
       handleNMContainerStatus(status, nodeId);
     }
   }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3125) [Event producers] Change distributed shell to use new timeline service

2015-02-24 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3125:
-
Attachment: YARN-3125.patch

Based on the latest timelineservice put API provided in YARN-3240.

 [Event producers] Change distributed shell to use new timeline service
 --

 Key: YARN-3125
 URL: https://issues.apache.org/jira/browse/YARN-3125
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3125.patch


 We can start by changing the distributed shell to use the new timeline service 
 once the framework is completed; that way we can quickly verify that the 
 next-gen service works end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3125) [Event producers] Change distributed shell to use new timeline service

2015-02-24 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-3125:


Assignee: Junping Du  (was: Zhijie Shen)

 [Event producers] Change distributed shell to use new timeline service
 --

 Key: YARN-3125
 URL: https://issues.apache.org/jira/browse/YARN-3125
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Junping Du
 Attachments: YARN-3125.patch


 We can start by changing the distributed shell to use the new timeline service 
 once the framework is completed; that way we can quickly verify that the 
 next-gen service works end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3084) YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes to run

2015-02-24 Thread Sean Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Roberts updated YARN-3084:
---
Attachment: yarn-yarn-resourcemanager-sandbox.hortonworks.com.log

 YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes 
 to run
 

 Key: YARN-3084
 URL: https://issues.apache.org/jira/browse/YARN-3084
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.6.0
 Environment: Using Eclipse on Windows 7 (client) to run the MapReduce 
 job on the host of Hortonworks HDP 2.2 (Hortonworks is on VMware version 
 6.0.2 build-1744117)
Reporter: Michael Br
Priority: Minor
 Attachments: yarn-yarn-resourcemanager-sandbox.hortonworks.com.log


 Hello,
 1. I want to run the simple MapReduce example job (with the REST API 2.6 
 for YARN applications) to calculate PI… for now it doesn’t work.
 When I use the command in the Hortonworks terminal it works: “hadoop jar 
 /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0.2.2.0.0-2041.jar
  pi 10 10”.
 But I want to submit the job with the REST API and not in the terminal as a 
 command line: 
 [http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application]
 2. I do succeed with other REST API requests: get state, get a new 
 application id and even kill (change state), but when I try to submit my 
 example, the response is:
 --
 --
 The Response Header:
 Key : null ,Value : [HTTP/1.1 202 Accepted]
 Key : Date ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 07:47:24 
 GMT]
 Key : Content-Length ,Value : [0]
 Key : Expires ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 
 07:47:24 GMT]
 Key : Location ,Value : [http://[my 
 port]:8088/ws/v1/cluster/apps/application_1421661392788_0038]
 Key : Content-Type ,Value : [application/json]
 Key : Server ,Value : [Jetty(6.1.26.hwx)]
 Key : Pragma ,Value : [no-cache, no-cache]
 Key : Cache-Control ,Value : [no-cache]
 The Response Body:
 Null (No Response)
 --
 --
 3. I need help filling in the HTTP request body. I am doing a POST HTTP 
 request and I know that I am doing it right (in Java).
 4. I think the problem is in the request body.
 5. I used this guy’s answer to help me build my MapReduce example XML, but 
 it does not work: 
 [http://hadoop-forum.org/forum/general-hadoop-discussion/miscellaneous/2136-how-can-i-run-mapreduce-job-by-rest-api].
 6. What am I missing? (The description in the submit section of the REST 
 API 2.6 is not clear to me.)
 7. Does someone have an XML example for a simple MR job?
 8. Thanks! Here is the XML file I am using for the request body:
 --
 --
 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 <application-submission-context>
   <application-id>application_1421661392788_0038</application-id>
   <application-name>test_21_1</application-name>
   <queue>default</queue>
   <priority>3</priority>
   <am-container-spec>
     <environment>
       <entry>
         <key>CLASSPATH</key>
         <value>/usr/hdp/2.2.0.0-2041/hadoop/conf&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/./&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/.//*&lt;CPS&gt;&lt;CPS&gt;/usr/share/java/mysql-connector-java-5.1.17.jar&lt;CPS&gt;/usr/share/java/mysql-connector-java.jar&lt;CPS&gt;/usr/hdp/current/hadoop-mapreduce-client/*&lt;CPS&gt;/usr/hdp/current/tez-client/*&lt;CPS&gt;/usr/hdp/current/tez-client/lib/*&lt;CPS&gt;/etc/tez/conf/&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/tez/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/tez/lib/*&lt;CPS&gt;/etc/tez/conf</value>
       </entry>
     </environment>
     <commands>
       <command>hadoop jar /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0.2.2.0.0-2041.jar pi 10 10</command>
     </commands>
   </am-container-spec>
   <unmanaged-AM>false</unmanaged-AM>
   <max-app-attempts>2</max-app-attempts>
   <resource>
     <memory>1024</memory>
   

[jira] [Commented] (YARN-3084) YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes to run

2015-02-24 Thread Sean Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335311#comment-14335311
 ] 

Sean Roberts commented on YARN-3084:


Apologies, didn't mean to hit submit.

I submitted with that job. Interestingly, the 'pi' runs and is successful but 
the parent job reports a failure.

Application application_1424804952495_0004 failed 2 times due to AM Container 
for appattempt_1424804952495_0004_02 exited with exitCode: 0

Attaching resource manager logs as 
yarn-yarn-resourcemanager-sandbox.hortonworks.com.log
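
For reference, a minimal sketch of the kind of REST submission this issue describes, using only the JDK. It assumes the standard Cluster Applications API endpoint on the RM (http://<rm-host>:8088/ws/v1/cluster/apps); the host name and XML file path are placeholders, not taken from the reporter's actual client code.

{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SubmitAppSketch {
  public static void main(String[] args) throws Exception {
    // The application-submission-context XML, e.g. the one quoted below.
    byte[] body = Files.readAllBytes(Paths.get("app-submission.xml"));

    URL url = new URL("http://sandbox.hortonworks.com:8088/ws/v1/cluster/apps");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/xml");
    conn.setDoOutput(true);
    OutputStream out = conn.getOutputStream();
    out.write(body);
    out.close();

    // 202 Accepted with an empty body (as in the report) only means the RM
    // accepted the submission; the application's outcome must be checked
    // separately, e.g. via GET /ws/v1/cluster/apps/{appid}.
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}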

 YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes 
 to run
 

 Key: YARN-3084
 URL: https://issues.apache.org/jira/browse/YARN-3084
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.6.0
 Environment: Using Eclipse on Windows 7 (client) to run the MapReduce 
 job on the host of Hortonworks HDP 2.2 (Hortonworks is on VMware version 
 6.0.2 build-1744117)
Reporter: Michael Br
Priority: Minor
 Attachments: yarn-yarn-resourcemanager-sandbox.hortonworks.com.log


 Hello,
 1. I want to run the simple MapReduce example job (with the REST API 2.6 
 for YARN applications) to calculate PI… for now it doesn’t work.
 When I use the command in the Hortonworks terminal it works: “hadoop jar 
 /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0.2.2.0.0-2041.jar
  pi 10 10”.
 But I want to submit the job with the REST API and not in the terminal as a 
 command line: 
 [http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application]
 2. I do succeed with other REST API requests: get state, get a new 
 application id and even kill (change state), but when I try to submit my 
 example, the response is:
 --
 --
 The Response Header:
 Key : null ,Value : [HTTP/1.1 202 Accepted]
 Key : Date ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 07:47:24 
 GMT]
 Key : Content-Length ,Value : [0]
 Key : Expires ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 
 07:47:24 GMT]
 Key : Location ,Value : [http://[my 
 port]:8088/ws/v1/cluster/apps/application_1421661392788_0038]
 Key : Content-Type ,Value : [application/json]
 Key : Server ,Value : [Jetty(6.1.26.hwx)]
 Key : Pragma ,Value : [no-cache, no-cache]
 Key : Cache-Control ,Value : [no-cache]
 The Response Body:
 Null (No Response)
 --
 --
 3. I need help filling in the HTTP request body. I am doing a POST HTTP 
 request and I know that I am doing it right (in Java).
 4. I think the problem is in the request body.
 5. I used this guy’s answer to help me build my MapReduce example XML, but 
 it does not work: 
 [http://hadoop-forum.org/forum/general-hadoop-discussion/miscellaneous/2136-how-can-i-run-mapreduce-job-by-rest-api].
 6. What am I missing? (The description in the submit section of the REST 
 API 2.6 is not clear to me.)
 7. Does someone have an XML example for a simple MR job?
 8. Thanks! Here is the XML file I am using for the request body:
 --
 --
 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 <application-submission-context>
   <application-id>application_1421661392788_0038</application-id>
   <application-name>test_21_1</application-name>
   <queue>default</queue>
   <priority>3</priority>
   <am-container-spec>
     <environment>
       <entry>
         <key>CLASSPATH</key>
         <value>/usr/hdp/2.2.0.0-2041/hadoop/conf&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/./&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/.//*&lt;CPS&gt;&lt;CPS&gt;/usr/share/java/mysql-connector-java-5.1.17.jar&lt;CPS&gt;/usr/share/java/mysql-connector-java.jar&lt;CPS&gt;/usr/hdp/current/hadoop-mapreduce-client/*&lt;CPS&gt;/usr/hdp/current/tez-client/*&lt;CPS&gt;/usr/hdp/current/tez-client/lib/*&lt;CPS&gt;/etc/tez/conf/&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/tez/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/tez/lib/*&lt;CPS&gt;/etc/tez/conf</value>
       </entry>
   

[jira] [Updated] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-24 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-3131:
---
Attachment: yarn_3131_v6.patch

 YarnClientImpl should check FAILED and KILLED state in submitApplication
 

 Key: YARN-3131
 URL: https://issues.apache.org/jira/browse/YARN-3131
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, 
 yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, yarn_3131_v6.patch


 Just ran into an issue where submitting a job to a non-existent queue causes 
 YarnClient to raise no exception. Though the job does get submitted 
 successfully and simply fails immediately afterwards, it would be better if 
 YarnClient handled the immediate-failure situation the way YarnRunner does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-02-24 Thread Ryu Kobayashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi updated YARN-3249:

Attachment: YARN-3249.patch

 Add the kill application to the Resource Manager Web UI
 ---

 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3249.patch


 We want to be able to kill an application from the ResourceManager web UI, similarly to the JobTracker web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334716#comment-14334716
 ] 

Hadoop QA commented on YARN-3217:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700461/YARN-3217-002.patch
  against trunk revision b610c68.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6708//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6708//console

This message is automatically generated.

 Remove httpclient dependency from hadoop-yarn-server-web-proxy
 --

 Key: YARN-3217
 URL: https://issues.apache.org/jira/browse/YARN-3217
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Attachments: YARN-3217-002.patch, YARN-3217.patch


 Sub-task of HADOOP-10105. Remove httpclient dependency from 
 WebAppProxyServlet.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-02-24 Thread Ryu Kobayashi (JIRA)
Ryu Kobayashi created YARN-3249:
---

 Summary: Add the kill application to the Resource Manager Web UI
 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Priority: Minor


We want to be able to kill an application from the ResourceManager web UI, similarly to the JobTracker web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-02-24 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-3248:
---

 Summary: Display count of nodes blacklisted by apps in the web UI
 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev


It would be really useful when debugging app performance and failure issues to 
get a count of the nodes blacklisted by individual apps displayed in the web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-02-24 Thread Ryu Kobayashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi updated YARN-3249:

Attachment: screenshot.png

 Add the kill application to the Resource Manager Web UI
 ---

 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3249.patch, screenshot.png


 We want to be able to kill an application from the ResourceManager web UI, similarly to the JobTracker web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-02-24 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-3249:
-
Assignee: Ryu Kobayashi

 Add the kill application to the Resource Manager Web UI
 ---

 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3249.patch, screenshot.png


 We want to be able to kill an application from the ResourceManager web UI, similarly to the JobTracker web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3247) TestQueueMappings failure for FairScheduler

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334689#comment-14334689
 ] 

Hadoop QA commented on YARN-3247:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700368/YARN-3247.000.patch
  against trunk revision b610c68.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6707//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6707//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6707//console

This message is automatically generated.

 TestQueueMappings failure for FairScheduler
 ---

 Key: YARN-3247
 URL: https://issues.apache.org/jira/browse/YARN-3247
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial
 Attachments: YARN-3247.000.patch


 TestQueueMappings is only supported by the CapacityScheduler.
 We should configure the CapacityScheduler for this test; otherwise, if the 
 default scheduler is set to FairScheduler, the test will fail with the 
 following error (a small configuration sketch follows the stack trace below):
 {code}
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings)
   Time elapsed: 2.202 sec   ERROR!
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot 
 be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143)
 {code}
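
A short sketch of the fix being described (these Hadoop/YARN configuration APIs are real, but the snippet is simplified and not necessarily identical to the attached patch): pin the test to the CapacityScheduler so it no longer depends on the cluster-wide default scheduler.

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration;

class TestSetupSketch {
  static YarnConfiguration createConfForQueueMappingTest() {
    CapacitySchedulerConfiguration csConf = new CapacitySchedulerConfiguration();
    YarnConfiguration conf = new YarnConfiguration(csConf);
    // Force the CapacityScheduler regardless of the default in yarn-site.xml,
    // so the test no longer breaks when FairScheduler is the default.
    conf.setClass(YarnConfiguration.RM_SCHEDULER,
        CapacityScheduler.class, ResourceScheduler.class);
    return conf;
  }
}
{code}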



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334789#comment-14334789
 ] 

Hudson commented on YARN-2797:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #114 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/114/])
YARN-2797. Add -help to yarn logs and nodes CLI command. Contributed by 
(devaraj: rev b610c68d4423a5a1ab342dc490cd0064f8983c07)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/NodeCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java


 TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
 

 Key: YARN-2797
 URL: https://issues.apache.org/jira/browse/YARN-2797
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Minor
 Fix For: 2.7.0

 Attachments: yarn-2797-1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown

2015-02-24 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334800#comment-14334800
 ] 

Naganarasimha G R commented on YARN-3168:
-

Hi [~gururaj],
Thanks for uploading the patch. In the patch you have attached, it seems the 
changes to YarnCommands.apt.vm from HADOOP-11575 were not considered. Please 
check.

 Convert site documentation from apt to markdown
 ---

 Key: YARN-3168
 URL: https://issues.apache.org/jira/browse/YARN-3168
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: Gururaj Shetty
 Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch


 YARN analog to HADOOP-11495



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-02-24 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-3248:

Component/s: capacityscheduler

 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev

 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334748#comment-14334748
 ] 

Hadoop QA commented on YARN-2820:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700337/YARN-2820.004.patch
  against trunk revision b610c68.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6709//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6709//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6709//console

This message is automatically generated.

 Do retry in FileSystemRMStateStore for better error recovery when 
 update/store failure due to IOException.
 --

 Key: YARN-2820
 URL: https://issues.apache.org/jira/browse/YARN-2820
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0, 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
 YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch


 Do retry in FileSystemRMStateStore for better error recovery when 
 update/store failure due to IOException.
 When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we 
 saw the following IOException cause the RM to shut down:
 {code}
 2014-10-29 23:49:12,202 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
 Updating info for attempt: appattempt_1409135750325_109118_01 at: 
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01
 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:46,283 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
 Error updating info for attempt: appattempt_1409135750325_109118_01
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas.
 2014-10-29 23:49:46,284 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
 Error storing/updating appAttempt: appattempt_1409135750325_109118_01
 2014-10-29 23:49:46,916 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
 Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause: 
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas. 
 at 
 org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
  
 at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) 
 at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
  
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
  
 at 
 

[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334807#comment-14334807
 ] 

Hudson commented on YARN-2797:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #848 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/848/])
YARN-2797. Add -help to yarn logs and nodes CLI command. Contributed by 
(devaraj: rev b610c68d4423a5a1ab342dc490cd0064f8983c07)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/NodeCLI.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java


 TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
 

 Key: YARN-2797
 URL: https://issues.apache.org/jira/browse/YARN-2797
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Minor
 Fix For: 2.7.0

 Attachments: yarn-2797-1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3168) Convert site documentation from apt to markdown

2015-02-24 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3168:
---
Attachment: YARN-3168.20150224.1.patch

 Convert site documentation from apt to markdown
 ---

 Key: YARN-3168
 URL: https://issues.apache.org/jira/browse/YARN-3168
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: Gururaj Shetty
 Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch


 YARN analog to HADOOP-11495



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown

2015-02-24 Thread Gururaj Shetty (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334778#comment-14334778
 ] 

Gururaj Shetty commented on YARN-3168:
--

[~aw] Attached the updated patch. Please review.

 Convert site documentation from apt to markdown
 ---

 Key: YARN-3168
 URL: https://issues.apache.org/jira/browse/YARN-3168
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: Gururaj Shetty
 Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch


 YARN analog to HADOOP-11495



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334986#comment-14334986
 ] 

Jason Lowe commented on YARN-3251:
--

Sample stack trace:
{noformat}
Found one Java-level deadlock:
=
IPC Server handler 71 on 8032:
  waiting to lock monitor 0x037f9120 (object 0x00023b060ad8, a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue),
  which is held by ResourceManager Event Processor
ResourceManager Event Processor:
  waiting to lock monitor 0x02c4b7d0 (object 0x00023aecf620, a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue),
  which is held by IPC Server handler 71 on 8032

Java stack information for the threads listed above:
===
IPC Server handler 71 on 8032:
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueueInfo(LeafQueue.java:451)
- waiting to lock 0x00023b060ad8 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueInfo(ParentQueue.java:214)
- locked 0x00023aecf620 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueInfo(ParentQueue.java:214)
- locked 0x00023af36e70 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueInfo(ParentQueue.java:214)
- locked 0x00023b0d9478 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueInfo(CapacityScheduler.java:910)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:832)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:259)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:413)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2079)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2075)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2073)
ResourceManager Event Processor:
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent(AbstractCSQueue.java:185)
- waiting to lock 0x00023aecf620 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.getAbsoluteMaxAvailCapacity(CSQueueUtils.java:177)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.getAbsoluteMaxAvailCapacity(CSQueueUtils.java:183)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.computeUserLimitAndSetHeadroom(LeafQueue.java:1033)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.checkLimitsToReserve(LeafQueue.java:1341)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1611)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1399)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1278)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:893)
- locked 0x00023b060ad8 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:758)
- locked 0x00023ceb53e0 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
- locked 0x00023b060ad8 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
at 

[jira] [Created] (YARN-3250) Support admin cli interface in Application Priority Manager (server side)

2015-02-24 Thread Sunil G (JIRA)
Sunil G created YARN-3250:
-

 Summary: Support admin cli interface in Application Priority 
Manager (server side)
 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Sunil G


The current Application Priority Manager supports configuration only via a file. 
To support runtime configuration through the admin CLI and REST, a common 
management interface has to be added that can be shared with NodeLabelsManager. 





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-3251:


 Summary: CapacityScheduler deadlock when computing absolute max 
avail capacity
 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker


The ResourceManager can deadlock in the CapacityScheduler when computing the 
absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.

2015-02-24 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335005#comment-14335005
 ] 

Tsuyoshi OZAWA commented on YARN-2820:
--

[~zxu] Great job! We are almost there. To avoid repeating the retry code, I 
think it's better to have an FSAction, like ZKAction in ZKRMStateStore. What do 
you think?

Minor nit: I prefer to have a line break after '=' for readability.
{code}
+  public static final String FS_RM_STATE_STORE_NUM_RETRIES = RM_PREFIX
+      + "fs.state-store.num-retries";
+  public static final String FS_RM_STATE_STORE_RETRY_INTERVAL_MS = RM_PREFIX
+      + "fs.state-store.retry-interval-ms";
{code}

{code}
  public static final String FS_RM_STATE_STORE_NUM_RETRIES =
      RM_PREFIX + "fs.state-store.num-retries";
  public static final String FS_RM_STATE_STORE_RETRY_INTERVAL_MS =
      RM_PREFIX + "fs.state-store.retry-interval-ms";
{code}
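
For what it's worth, a minimal sketch of what such an FSAction-style wrapper could look like (hypothetical names, loosely modeled on the ZKAction idea; not the actual patch):
{code}
import java.io.IOException;

// Hypothetical FSAction retry wrapper, analogous to ZKAction in
// ZKRMStateStore; names and structure are illustrative only.
abstract class FSAction<T> {
  abstract T run() throws Exception;

  T runWithRetries(int numRetries, long retryIntervalMs) throws Exception {
    int attempt = 0;
    while (true) {
      try {
        return run();
      } catch (IOException e) {
        if (++attempt > numRetries) {
          throw e;                      // out of retries, surface the failure
        }
        Thread.sleep(retryIntervalMs);  // back off before the next attempt
      }
    }
  }
}
{code}
Each store/update operation would then be wrapped in one FSAction instead of duplicating the retry loop.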

 Do retry in FileSystemRMStateStore for better error recovery when 
 update/store failure due to IOException.
 --

 Key: YARN-2820
 URL: https://issues.apache.org/jira/browse/YARN-2820
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0, 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
 YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch


 Do retry in FileSystemRMStateStore for better error recovery when 
 update/store failure due to IOException.
 When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We 
 saw the following IOexception cause the RM shutdown.
 {code}
 2014-10-29 23:49:12,202 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
 Updating info for attempt: appattempt_1409135750325_109118_01 at: 
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01
 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:46,283 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
 Error updating info for attempt: appattempt_1409135750325_109118_01
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas.
 2014-10-29 23:49:46,284 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
 Error storing/updating appAttempt: appattempt_1409135750325_109118_01
 2014-10-29 23:49:46,916 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
 Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause: 
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas. 
 at 
 org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
  
 at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) 
 at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
  
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
  
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
  
 at 
 

[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334996#comment-14334996
 ] 

Hudson commented on YARN-2797:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2064 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2064/])
YARN-2797. Add -help to yarn logs and nodes CLI command. Contributed by 
(devaraj: rev b610c68d4423a5a1ab342dc490cd0064f8983c07)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/NodeCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* hadoop-yarn-project/CHANGES.txt


 TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
 

 Key: YARN-2797
 URL: https://issues.apache.org/jira/browse/YARN-2797
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Minor
 Fix For: 2.7.0

 Attachments: yarn-2797-1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery

2015-02-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335010#comment-14335010
 ] 

Junping Du commented on YARN-3039:
--

Thanks [~zjshen] for review and comments!
bq. I think so, too. RM has its own builtin aggregator, and RM directly writes 
through it.
I have a very basic question here: don't we want a single per-app aggregator 
for all app-related events, logs, etc.? Ideally, only this singleton aggregator 
has the logic to sort out app info during aggregation. If not, we could even 
give up the current flow NM(s) -> app aggregator (deployed on one NM) -> 
backend and let each NM talk to the backend directly, saving a hop of traffic. 
Can you clarify more on this?

bq.  in the heartbeat, instead of always sending the snapshot of the aggregator 
address info, can we send the incremental information upon any change happens 
to the aggregator address table. Usually, the aggregator will not change it 
place often, such that we can avoid unnecessary additional traffic in most 
heartbeats.
That's a very good point for discussion. 
The tricky part is that we can only tell what has changed on the server (RM) 
side since the last heartbeat by comparing against information sent by the 
client (NM). Take token updates for example (populateKeys() in 
ResourceTrackerService): in our current implementation we encode the master 
keys (ContainerTokenMasterKey and NMTokenMasterKey) known by the NM in the 
request, and in the response we filter out the old keys already known by the 
NM. IMO, this scheme (put everything in the request, put something/nothing in 
the response) is not really an optimization over putting nothing in the request 
and everything in the response; it only turns outbound traffic into inbound 
traffic and moves the comparison logic to the server side. Isn't it? 
Another optimization we could consider is to let the client express the app 
aggregators it is interested in on the request (by adding them to a new 
optional field, e.g. InterestedApps) when it finds that info missing or stale, 
and have the server loop in only the related app aggregator info. The NM can 
maintain an interested-app-aggregator list, which gets updated when an app's 
first container is launched or when the app's aggregator info becomes stale 
(possibly reported by the writer/reader retry logic), and entries are removed 
when they arrive in a heartbeat response. Thoughts?
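
A rough sketch of the NM-side bookkeeping described above (all class and field names here are hypothetical, just to make the idea concrete):
{code}
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical NM-side tracker for the "InterestedApps" idea above;
// none of these types exist in YARN, they only illustrate the flow.
class AppAggregatorTracker {
  // apps whose aggregator address is unknown or suspected stale
  private final Set<String> interestedApps = ConcurrentHashMap.newKeySet();
  // last known aggregator address per app
  private final Map<String, String> aggregatorAddrs = new ConcurrentHashMap<>();

  // called when an app's first container is launched on this NM, or when
  // the writer/reader retry logic reports the cached address as stale
  void markInterested(String appId) {
    interestedApps.add(appId);
  }

  // included in the next heartbeat request so the RM only returns
  // aggregator info for these apps
  Set<String> interestedAppsForHeartbeat() {
    return new HashSet<>(interestedApps);
  }

  // applied to whatever the heartbeat response returns
  void onHeartbeatResponse(Map<String, String> updatedAddrs) {
    aggregatorAddrs.putAll(updatedAddrs);
    interestedApps.removeAll(updatedAddrs.keySet());
  }
}
{code}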

bq. One addition issue related the rm state store: calling it in the update 
transition may break the app recovery. The current state instead of the final 
state will be written into the store. If RM stops and restarts at this moment, 
this app can't be recovered properly.
Thanks for the reminder. This is something I am not 100% sure about. However, 
from recoverApplication() in RMAppManager, I didn't see why we cannot recover 
an app in RUNNING or another non-final state (final states like KILLED or 
FINISHED excepted). Am I missing anything here? One piece of code that is 
indeed missing is repopulating aggregatorAddr from the store in 
RMAppImpl.recover(); I will add it back in the next patch.


 [Aggregator wireup] Implement ATS writer service discovery
 --

 Key: YARN-3039
 URL: https://issues.apache.org/jira/browse/YARN-3039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Junping Du
 Attachments: Service Binding for applicationaggregator of ATS 
 (draft).pdf, YARN-3039-no-test.patch


 Per design in YARN-2928, implement ATS writer service discovery. This is 
 essential for off-node clients to send writes to the right ATS writer. This 
 should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335021#comment-14335021
 ] 

Sunil G commented on YARN-3251:
---

[~jlowe]  
Recent getAbsoluteMaxAvailCapacity changes cause this.

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker

 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334975#comment-14334975
 ] 

Hudson commented on YARN-2797:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #114 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/114/])
YARN-2797. Add -help to yarn logs and nodes CLI command. Contributed by 
(devaraj: rev b610c68d4423a5a1ab342dc490cd0064f8983c07)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/NodeCLI.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java


 TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
 

 Key: YARN-2797
 URL: https://issues.apache.org/jira/browse/YARN-2797
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Minor
 Fix For: 2.7.0

 Attachments: yarn-2797-1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335002#comment-14335002
 ] 

Jason Lowe commented on YARN-3251:
--

It looks like this is fallout from YARN-2008.  
CSQueueUtils.getAbsoluteMaxAvailCapacity is called with the lock held on the 
LeafQueue and walks up the tree, attempting to grab locks on parents as it 
goes.  That's contrary to the conventional order of locking while walking down 
the tree, and thus we can deadlock.
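
A stripped-down illustration of that inconsistent lock ordering (hypothetical classes, not the real queue implementation):
{code}
// Thread A locks leaf then parent; thread B locks parent then leaf.
// If A holds leaf.lock while B holds parent.lock, each waits forever.
class Parent {
  final Object lock = new Object();
  Leaf child;
}

class Leaf {
  final Object lock = new Object();
  Parent parent;

  // scheduling path: holds the leaf lock, then walks UP the tree
  void computeHeadroom() {
    synchronized (lock) {
      synchronized (parent.lock) { /* getAbsoluteMaxAvailCapacity(...) */ }
    }
  }
}

class QueueInfoWalker {
  // getQueueInfo path: holds the parent lock, then walks DOWN the tree
  void getQueueInfo(Parent p) {
    synchronized (p.lock) {
      synchronized (p.child.lock) { /* leaf.getQueueInfo(...) */ }
    }
  }
}
{code}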

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker

 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

2015-02-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335009#comment-14335009
 ] 

Jason Lowe commented on YARN-2008:
--

Note that this change appears to lead to a deadlock, as 
getAbsoluteMaxAvailCapacity is called with a lock held on the leaf queue and 
then walks up the hierarchy attempting to grab parent locks as it goes.  See 
YARN-3251.

 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Craig Welch
 Fix For: 2.6.0

 Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, 
 YARN-2008.4.patch, YARN-2008.5.patch, YARN-2008.6.patch, YARN-2008.7.patch, 
 YARN-2008.8.patch, YARN-2008.9.patch


 If there are two queues, both allowed to use 100% of the actual resources in 
 the cluster, and Q1 and Q2 each currently use 50% of the cluster's resources, 
 then there is no actual space available. If we use the current method to get 
 headroom, the CapacityScheduler thinks there are still resources available for 
 users in Q1, but they have already been used by Q2. 
 If the CapacityScheduler has a hierarchical queue structure, it may report an 
 incorrect queueMaxCap. Here is an example:
                          rootQueue
                         /         \
          L1ParentQueue1             L1ParentQueue2
          (allowed to use up to      (allowed to use 20% in
           80% of its parent)         minimum of its parent)
            /            \
     L2LeafQueue1      L2LeafQueue2
     (50% of its       (50% of its parent
      parent)           in minimum)
 When we calculate the headroom of a user in L2LeafQueue2, the current method 
 thinks L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. 
 However, without checking L1ParentQueue1, we cannot be sure. It is possible 
 that L1ParentQueue2 has already used 40% of the rootQueue resources; in that 
 case, L2LeafQueue2 can actually use only 30% (60% * 50%). 
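
To make the arithmetic above concrete, a small self-contained sketch (simplified and hypothetical; not the actual CSQueueUtils code) of why the naive 80% * 50% figure overestimates the headroom once the sibling's usage is taken into account:
{code}
// Simplified illustration only; capacities are fractions of the cluster.
public final class HeadroomExample {

  // a child can use at most the smaller of its own max and what is
  // actually left under its parent
  static float availableTo(float childMax, float parentAvailable) {
    return Math.min(childMax, parentAvailable);
  }

  public static void main(String[] args) {
    float cluster = 1.0f;          // rootQueue: 100%
    float usedByL1Parent2 = 0.4f;  // sibling branch already uses 40%
    float l1Parent1Max = 0.8f;     // L1ParentQueue1: up to 80% of root
    float l2Leaf2Share = 0.5f;     // L2LeafQueue2: 50% of L1ParentQueue1

    // naive calculation ignores the sibling's usage: 80% * 50% = 40%
    float naive = l1Parent1Max * l2Leaf2Share;

    // capping L1ParentQueue1 by what the root actually has left gives 60%,
    // so L2LeafQueue2 can really use only 60% * 50% = 30%
    float l1Parent1Avail = availableTo(l1Parent1Max, cluster - usedByL1Parent2);
    float actual = l1Parent1Avail * l2Leaf2Share;

    System.out.printf("naive = %.0f%%, actual = %.0f%%%n",
        naive * 100, actual * 100);
  }
}
{code}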



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334903#comment-14334903
 ] 

Hudson commented on YARN-2797:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #105 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/105/])
YARN-2797. Add -help to yarn logs and nodes CLI command. Contributed by 
(devaraj: rev b610c68d4423a5a1ab342dc490cd0064f8983c07)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/NodeCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java
* hadoop-yarn-project/CHANGES.txt


 TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
 

 Key: YARN-2797
 URL: https://issues.apache.org/jira/browse/YARN-2797
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Minor
 Fix For: 2.7.0

 Attachments: yarn-2797-1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration

2015-02-24 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2693:
--
Description: 
Focus of this JIRA is to have a centralized service to handle priority labels.

Support operations such as
* Add/Delete priority label to a specified queue
* Manage integer mapping associated with each priority label
* Support managing default priority label of a given queue
* Expose interface to RM to validate priority label

To keep the interface simple, the Priority Manager will support only a 
configuration file, in contrast with the admin CLI and REST. 
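
A hypothetical shape for such a management interface (the names below are illustrative only and are not from the attached patches):
{code}
import java.util.Map;

// Illustrative only: one possible shape of a priority label manager
// shared between file-based configuration and, later, admin CLI/REST.
interface PriorityLabelManager {
  void addPriorityLabel(String queue, String label, int priority);
  void removePriorityLabel(String queue, String label);
  Map<String, Integer> getPriorityLabels(String queue); // label -> integer mapping
  void setDefaultPriorityLabel(String queue, String label);
  String getDefaultPriorityLabel(String queue);
  boolean isValidPriorityLabel(String queue, String label); // used by the RM to validate
}
{code}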


  was:
Focus of this JIRA is to have a centralized service to handle priority labels.

Support operations such as
* Add/Delete priority label to a specified queue
* Manage integer mapping associated with each priority label
* Support managing default priority label of a given queue
* ACL support in queue level for priority label
* Expose interface to RM to validate priority label

Storage for this labels will be done in FileSystem and in Memory similar to 
NodeLabel

* FileSystem Based : persistent across RM restart
* Memory Based: non-persistent across RM restart



 Priority Label Manager in RM to manage application priority based on 
 configuration
 --

 Key: YARN-2693
 URL: https://issues.apache.org/jira/browse/YARN-2693
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 
 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch


 Focus of this JIRA is to have a centralized service to handle priority labels.
 Support operations such as
 * Add/Delete priority label to a specified queue
 * Manage integer mapping associated with each priority label
 * Support managing default priority label of a given queue
 * Expose interface to RM to validate priority label
 To keep the interface simple, the Priority Manager will support only a 
 configuration file, in contrast with the admin CLI and REST. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334925#comment-14334925
 ] 

Hudson commented on YARN-2797:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2046 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2046/])
YARN-2797. Add -help to yarn logs and nodes CLI command. Contributed by 
(devaraj: rev b610c68d4423a5a1ab342dc490cd0064f8983c07)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/NodeCLI.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java


 TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
 

 Key: YARN-2797
 URL: https://issues.apache.org/jira/browse/YARN-2797
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Minor
 Fix For: 2.7.0

 Attachments: yarn-2797-1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration

2015-02-24 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2693:
--
Summary: Priority Label Manager in RM to manage application priority based 
on configuration  (was: Priority Label Manager in RM to manage priority labels)

 Priority Label Manager in RM to manage application priority based on 
 configuration
 --

 Key: YARN-2693
 URL: https://issues.apache.org/jira/browse/YARN-2693
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 
 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch


 Focus of this JIRA is to have a centralized service to handle priority labels.
 Support operations such as
 * Add/Delete priority label to a specified queue
 * Manage integer mapping associated with each priority label
 * Support managing default priority label of a given queue
 * ACL support in queue level for priority label
 * Expose interface to RM to validate priority label
 Storage for this labels will be done in FileSystem and in Memory similar to 
 NodeLabel
 * FileSystem Based : persistent across RM restart
 * Memory Based: non-persistent across RM restart



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335044#comment-14335044
 ] 

Jason Lowe commented on YARN-3251:
--

YARN-3243 could remove the need to climb up the hierarchy to compute max avail 
capacity.

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker

 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335043#comment-14335043
 ] 

Sunil G commented on YARN-3251:
---

It's better to compute the available capacity during the call to 
root.assignContainers. In that scenario, a simpler get will retrieve the 
available capacity. 

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker

 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled

2015-02-24 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335244#comment-14335244
 ] 

Anubhav Dhoot commented on YARN-3202:
-

This seems fair to me. [~jianhe], do you see any reason why handling completed 
master containers would interfere with work-preserving recovery?
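
For reference, a hedged sketch of the simplification the description below proposes (illustrative only, not the attached patch): handle the received container statuses regardless of whether work-preserving recovery is enabled, so a completed master container releases its resources immediately.
{code}
// Illustrative sketch, not the attached patch: drop the work-preserving
// recovery guard so received container statuses are always handled.
if (!request.getNMContainerStatuses().isEmpty()) {
  LOG.info("received container statuses on node manager register :"
      + request.getNMContainerStatuses());
  for (NMContainerStatus status : request.getNMContainerStatuses()) {
    handleNMContainerStatus(status, nodeId);
  }
}
{code}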

 Improve master container resource release time ICO work preserving restart 
 enabled
 --

 Key: YARN-3202
 URL: https://issues.apache.org/jira/browse/YARN-3202
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: 0001-YARN-3202.patch


 While the NM is registering with the RM, if the NM sends a completed_container 
 for the masterContainer, then the resources of the master container are 
 released immediately by triggering the CONTAINER_FINISHED event. This releases 
 all the resources held by the master container so they can be allocated to 
 other pending resource requests from applications.
 But in case of (ICO) RM work-preserving restart being enabled, if the master 
 container state is completed, the attempt does not move to FINISHING until 
 container expiry is triggered by the container liveliness monitor. I think the 
 code below need not check whether work-preserving restart is enabled, so that 
 the master container resources are released immediately and allocated to other 
 pending resource requests of different applications:
 {code}
 // Handle received container status, this should be processed after new
 // RMNode inserted
 if (!rmContext.isWorkPreservingRecoveryEnabled()) {
   if (!request.getNMContainerStatuses().isEmpty()) {
     LOG.info("received container statuses on node manager register :"
         + request.getNMContainerStatuses());
     for (NMContainerStatus status : request.getNMContainerStatuses()) {
       handleNMContainerStatus(status, nodeId);
     }
   }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3240) [Data Mode] Implement client API to put generic entities

2015-02-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335106#comment-14335106
 ] 

Junping Du commented on YARN-3240:
--

Thanks [~zjshen] for the patch! I am reviewing this patch now, and a couple of 
comments so far:
{code}
+  //TODO: It needs to be updated by the discovery service
   private URI resURI;
{code}
Looks like we are creating one TimelineClient for every application, so we 
have multiple TimelineClients within an NM. Did we consider the other way - one 
TimelineClient that can talk to different app URLs (pass the URL as a parameter 
in every call, so the client can be more stateless)? I don't have a strong 
preference here, and compatibility with the old client could be a good reason. 
Just curious about the decision.

In addition, I think we need a new constructor that takes resURI as a 
parameter, because it is no longer read from configuration but comes from the 
caller of TimelineClient, who knows the resource details (the address of the 
aggregator). 
A setter for resURI is also needed: when the caller (AM or NM) hits a failure 
on PUT/POST (an IOException so far), its retry logic will notify the RM to 
recover (through the heartbeat or allocate request, addressed in YARN-3039) and 
set the address back afterwards.
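
A self-contained illustration of that suggestion (hypothetical class, not the real TimelineClient API):
{code}
import java.net.URI;

// Hypothetical illustration only: let the caller supply and later update
// the aggregator address instead of reading it from configuration.
class AggregatorAwareClient {
  private volatile URI resURI;

  AggregatorAwareClient(URI resURI) {   // address known by the caller (AM/NM)
    this.resURI = resURI;
  }

  // invoked when the AM/NM learns a fresh aggregator address from the RM
  // (heartbeat / allocate response) after a PUT/POST failure
  void setResURI(URI resURI) {
    this.resURI = resURI;
  }

  URI currentTarget() {
    return resURI;
  }
}
{code}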

{code}
catch (RuntimeException re) {
+  // runtime exception is expected if the client cannot connect the server
+  String msg =
+      "Failed to get the response from the timeline server.";
+  LOG.error(msg, re);
+  throw new IOException(re);
+}
+if (resp == null ||
+    resp.getClientResponseStatus() != ClientResponse.Status.OK) {
+  String msg =
+      "Failed to get the response from the timeline server.";
+  LOG.error(msg);
+  if (LOG.isDebugEnabled() && resp != null) {
+    String output = resp.getEntity(String.class);
+    LOG.debug("HTTP error code: " + resp.getStatus()
+        + " Server response : \n" + output);
+  }
+  throw new YarnException(msg);
+}
{code}
Looks like we differentiate 404 and 500 here with IOException and 
YarnException, which looks fine to me. Do we plan to have different handling 
logic (on the caller side) for the two failure cases?

The rest looks good to me.
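
One way the caller-side handling could differ, as a purely illustrative fragment (scheduleAddressRefreshAndRetry is a hypothetical helper):
{code}
// Purely illustrative fragment of caller-side handling for the two
// failure modes discussed above.
try {
  timelineClient.putEntities(entities);
} catch (IOException e) {
  // could not reach the aggregator: refresh its address (e.g. ask the RM)
  // and retry later
  scheduleAddressRefreshAndRetry(entities);
} catch (YarnException e) {
  // the aggregator answered but rejected the request; retrying the same
  // payload is unlikely to help, so log it for inspection
  LOG.error("Timeline put rejected", e);
}
{code}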

 [Data Mode] Implement client API to put generic entities
 

 Key: YARN-3240
 URL: https://issues.apache.org/jira/browse/YARN-3240
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3240.1.patch, YARN-3240.2.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery

2015-02-24 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335118#comment-14335118
 ] 

Naganarasimha G R commented on YARN-3039:
-

Hi [~djp],
Thanks for the doc, which gives a better understanding of the flow now.
A few queries:
* I feel the AM should be informed of the AggregatorAddr as early as 
registration itself, rather than in ApplicationMasterService.allocate() as is 
currently done.
* For NMs too, would it be better to update the address during registration 
itself (maybe recovered during recovery, not sure though)? Thoughts?
* The source of RMAppEventType.AGGREGATOR_UPDATE was not clear to me. Based on 
YARN-3030 (aggregator collection through the NM's aux service), 
PerNodeAggregatorServer (the aux service) launches AppLevelAggregatorService, 
so will AppLevelAggregatorService inform the RM about the aggregator for the 
application, and will the RM then inform the NM about the appAggregatorAddr as 
part of the heartbeat response? If this is the flow, is there a chance of a 
race condition where the NM needs to post some AM container entities/events 
before it gets the appAggregatorAddr from the RM?

[~zjshen], 
* bq. Ideally, only this singleton aggregator has the logic to sort out app 
info during aggregation. If not, we could even give up the current flow NM(s) 
-> app aggregator (deployed on one NM) -> backend and let each NM talk to the 
backend directly, saving a hop of traffic. Can you clarify more on this?
I also want some clarification along similar lines: what is the goal of having 
one aggregator per app? Is it simple aggregation of metrics related to an 
application entity, or to any entity (flow, flow run, app-specific, etc.)? If 
so, do we need to aggregate for system entities? Maybe with that answered, it 
will be easier to get the complete picture.
* In one of your comments (not in this JIRA), you mentioned that we might 
start a per-app aggregator only if the app requests it. In that case, how will 
we capture container entities and their events if the app does not request a 
per-app aggregator?

 [Aggregator wireup] Implement ATS writer service discovery
 --

 Key: YARN-3039
 URL: https://issues.apache.org/jira/browse/YARN-3039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Junping Du
 Attachments: Service Binding for applicationaggregator of ATS 
 (draft).pdf, YARN-3039-no-test.patch


 Per design in YARN-2928, implement ATS writer service discovery. This is 
 essential for off-node clients to send writes to the right ATS writer. This 
 should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3214) Add non-exclusive node labels

2015-02-24 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3214:
-
Attachment: Non-exclusive-Node-Partition-Design.pdf

Attached design doc, please feel free to share your ideas. Thanks!

 Add non-exclusive node labels 
 --

 Key: YARN-3214
 URL: https://issues.apache.org/jira/browse/YARN-3214
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: Non-exclusive-Node-Partition-Design.pdf


 Currently node labels partition the cluster into sub-clusters, so resources 
 cannot be shared between the partitions. 
 With the current implementation of node labels we cannot use the cluster 
 optimally, and the throughput of the cluster will suffer.
 We are proposing adding non-exclusive node labels:
 1. Labeled apps get preference on labeled nodes. 
 2. If there is no ask for labeled resources, we can assign those nodes to 
 non-labeled apps.
 3. If there is any future ask for those resources, we will preempt the 
 non-labeled apps and give the nodes back to labeled apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common

2015-02-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335137#comment-14335137
 ] 

Varun Saxena commented on YARN-2980:


[~aw], kindly let me know if any further changes are required.

 Move health check script related functionality to hadoop-common
 ---

 Key: YARN-2980
 URL: https://issues.apache.org/jira/browse/YARN-2980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Varun Saxena
 Attachments: YARN-2980.001.patch, YARN-2980.002.patch, 
 YARN-2980.003.patch, YARN-2980.004.patch


 HDFS might want to leverage health check functionality available in YARN in 
 both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
 https://issues.apache.org/jira/browse/HDFS-7441.
 We can move health check functionality including the protocol between hadoop 
 daemons and health check script to hadoop-common. That will simplify the 
 development and maintenance for both hadoop source code and health check 
 script.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335124#comment-14335124
 ] 

Wangda Tan commented on YARN-3251:
--

Thanks for reporting this, [~jlowe]!

Since this is a blocker for 2.7, I will create a patch using the method 
described in YARN-3243 first, before working on other related refactorings. I 
have added this issue as a sub-task of YARN-3243.

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker

 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-3251:


Assignee: Wangda Tan

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker

 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3252) YARN LinuxContainerExecutor runs as nobody in Simple Security mode for all applications

2015-02-24 Thread Eric Yang (JIRA)
Eric Yang created YARN-3252:
---

 Summary: YARN LinuxContainerExecutor runs as nobody in Simple 
Security mode for all applications
 Key: YARN-3252
 URL: https://issues.apache.org/jira/browse/YARN-3252
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.2, 2.5.1, 2.6.0, 2.4.0, 2.3.0
 Environment: Linux
Reporter: Eric Yang
Priority: Critical


When using YARN + Slider + LinuxContainerExecutor, all Slider applications run 
as nobody.  This is because of the modification in YARN-1253 that restricts all 
containers to run as a single user.  This becomes an exploit for any 
application that runs inside YARN + Slider + LCE.  The original behavior is 
more correct.  The original statement indicated that users can impersonate any 
other users; this is supposed to be valid only for proxy users, who can proxy 
as other users.  It is designed as intended that the service user needs to be 
trusted by the framework to impersonate end users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3252) YARN LinuxContainerExecutor runs as nobody in Simple Security mode for all applications

2015-02-24 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335107#comment-14335107
 ] 

Allen Wittenauer commented on YARN-3252:


See YARN-2424.

 YARN LinuxContainerExecutor runs as nobody in Simple Security mode for all 
 applications
 ---

 Key: YARN-3252
 URL: https://issues.apache.org/jira/browse/YARN-3252
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0, 2.4.0, 2.6.0, 2.5.1, 2.5.2
 Environment: Linux
Reporter: Eric Yang
Priority: Critical

 When using YARN + Slider + LinuxContainerExecutor, all Slider applications 
 run as nobody.  This is because of the modification in YARN-1253 that 
 restricts all containers to run as a single user.  This becomes an exploit for 
 any application that runs inside YARN + Slider + LCE.  The original behavior 
 is more correct.  The original statement indicated that users can impersonate 
 any other users; this is supposed to be valid only for proxy users, who can 
 proxy as other users.  It is designed as intended that the service user needs 
 to be trusted by the framework to impersonate end users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3251:
-
Issue Type: Sub-task  (was: Bug)
Parent: YARN-3243

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker

 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile

2015-02-24 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated YARN-3080:
--
Attachment: YARN-3080.patch

 The DockerContainerExecutor could not write the right pid to container pidFile
 --

 Key: YARN-3080
 URL: https://issues.apache.org/jira/browse/YARN-3080
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Beckham007
Assignee: Abin Shahab
 Attachments: YARN-3080.patch, YARN-3080.patch


 The docker_container_executor_session.sh is like this:
 {quote}
 #!/usr/bin/env bash
 echo `/usr/bin/docker inspect --format {{.State.Pid}} 
 container_1421723685222_0008_01_02` > 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
 /bin/mv -f 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
  
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid
 /usr/bin/docker run --rm  --name container_1421723685222_0008_01_02 -e 
 GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e 
 GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e 
 GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e 
 GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M 
 --cpu-shares=1024 -v 
 /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02
  -v 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02
  -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh
 {quote}
 The DockerContainerExecutor uses docker inspect before docker run, so docker 
 inspect cannot get the right pid for the container; signalContainer() and NM 
 restart would then fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params

2015-02-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335280#comment-14335280
 ] 

Jason Lowe commented on YARN-3239:
--

Any other comments?  Otherwise I will commit this tomorrow.

 WebAppProxy does not support a final tracking url which has query fragments 
 and params 
 ---

 Key: YARN-3239
 URL: https://issues.apache.org/jira/browse/YARN-3239
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Jian He
 Attachments: YARN-3239.1.patch


 Examples of failures:
 Expected: 
 {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}}
 Actual: {{http://uihost:8080}}
 Tried with a minor change to remove the #. Saw a different issue:
 Expected: 
 {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}}
 Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}}
 yarn application -status appId returns the expected value correctly. However, 
 invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3084) YARN REST API 2.6 - can't submit simple job in hortonworks - always job fails to run

2015-02-24 Thread Sean Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335288#comment-14335288
 ] 

Sean Roberts commented on YARN-3084:


I ran the same but with a simplified job request:
{code}
{
  "application-id":"application_1424804952495_0004",
  "application-name":"seanpi2",
  "am-container-spec":
  {
    "commands":
    {
      "command":"hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 2 2"
    }
  },
  "application-type":"YARN"
}
{code}

 YARN REST API 2.6 - can't submit simple job in hortonworks - always job fails 
 to run
 

 Key: YARN-3084
 URL: https://issues.apache.org/jira/browse/YARN-3084
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.6.0
 Environment: Using eclipse on windows 7 (client)to run the map reduce 
 job on the host of Hortonworks HDP 2.2 (hortonworks is on vmware version 
 6.0.2 build-1744117)
Reporter: Michael Br
Priority: Minor

 Hello,
 1. I want to run the simple MapReduce job example (with the REST API 2.6 
 for YARN applications) to calculate PI… for now it doesn't work.
 When I use the command in the Hortonworks terminal it works: “hadoop jar 
 /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0.2.2.0.0-2041.jar
  pi 10 10”.
 But I want to submit the job with the REST API and not in the terminal as a 
 command line. 
 [http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application]
 2. I do succeed with other REST API requests: get state, get new 
 application id and even kill (change state), but when I try to submit my 
 example, the response is:
 --
 --
 The Response Header:
 Key : null ,Value : [HTTP/1.1 202 Accepted]
 Key : Date ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 07:47:24 
 GMT]
 Key : Content-Length ,Value : [0]
 Key : Expires ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 
 07:47:24 GMT]
 Key : Location ,Value : [http://[my 
 port]:8088/ws/v1/cluster/apps/application_1421661392788_0038]
 Key : Content-Type ,Value : [application/json]
 Key : Server ,Value : [Jetty(6.1.26.hwx)]
 Key : Pragma ,Value : [no-cache, no-cache]
 Key : Cache-Control ,Value : [no-cache]
 The Response Body:
 Null (No Response)
 --
 --
 3. I need help filling in the HTTP request body. I am doing a POST HTTP 
 request and I know that I am doing it right (in Java).
 4. I think the problem is in the request body.
 5. I used this guy's answer to help me build my MapReduce example XML, but 
 it does not work: 
 [http://hadoop-forum.org/forum/general-hadoop-discussion/miscellaneous/2136-how-can-i-run-mapreduce-job-by-rest-api].
 6. What am I missing? (The description in the submit section of the REST API 
 2.6 is not clear to me.)
 7. Does someone have an XML example for a simple MR job?
 8. Thanks! Here is the XML file I am using for the request body:
 --
 --
 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 <application-submission-context>
   <application-id>application_1421661392788_0038</application-id>
   <application-name>test_21_1</application-name>
   <queue>default</queue>
   <priority>3</priority>
   <am-container-spec>
     <environment>
       <entry>
         <key>CLASSPATH</key>
         <value>/usr/hdp/2.2.0.0-2041/hadoop/conf&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/./&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/.//*&lt;CPS&gt;&lt;CPS&gt;/usr/share/java/mysql-connector-java-5.1.17.jar&lt;CPS&gt;/usr/share/java/mysql-connector-java.jar&lt;CPS&gt;/usr/hdp/current/hadoop-mapreduce-client/*&lt;CPS&gt;/usr/hdp/current/tez-client/*&lt;CPS&gt;/usr/hdp/current/tez-client/lib/*&lt;CPS&gt;/etc/tez/conf/&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/tez/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/tez/lib/*&lt;CPS&gt;/etc/tez/conf/</value>
       </entry>
     </environment>
     <commands>
       <command>hadoop jar 
 

[jira] [Updated] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-24 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-3231:
--
Attachment: YARN-3231.v2.patch

 FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
 job stuck
 --

 Key: YARN-3231
 URL: https://issues.apache.org/jira/browse/YARN-3231
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch


 When a queue is piling up with a lot of pending jobs due to the 
 maxRunningApps limit. We want to increase this property on the fly to make 
 some of the pending job active. However, once we increase the limit, all 
 pending jobs were not assigned any resource, and were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-24 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335578#comment-14335578
 ] 

Chang Li commented on YARN-3131:


[~jianhe] Thanks for the review! I have updated my patch. Could you please 
review it again? If all is good, please help commit this. Thanks.

 YarnClientImpl should check FAILED and KILLED state in submitApplication
 

 Key: YARN-3131
 URL: https://issues.apache.org/jira/browse/YARN-3131
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, 
 yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, yarn_3131_v6.patch


 Just ran into an issue when submitting a job to a non-existent queue: 
 YarnClient raises no exception. Though the job indeed gets submitted 
 successfully and just fails immediately afterwards, it would be better if 
 YarnClient could handle the immediate-failure situation like YarnRunner does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled

2015-02-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336059#comment-14336059
 ] 

Jian He commented on YARN-3202:
---

To clarify: the ContainerRecoveredTransition in RMContainerImpl does that. 

 Improve master container resource release time ICO work preserving restart 
 enabled
 --

 Key: YARN-3202
 URL: https://issues.apache.org/jira/browse/YARN-3202
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: 0001-YARN-3202.patch


 While the NM is registering with the RM, if the NM sends a completed_container 
 for the masterContainer, then the resources of the master container are 
 released immediately by triggering the CONTAINER_FINISHED event. This releases 
 all the resources held by the master container so they can be allocated to 
 other pending resource requests from applications.
 But in case of (ICO) RM work-preserving restart being enabled, if the master 
 container state is completed, the attempt does not move to FINISHING until 
 container expiry is triggered by the container liveliness monitor. I think the 
 code below need not check whether work-preserving restart is enabled, so that 
 the master container resources are released immediately and allocated to other 
 pending resource requests of different applications:
 {code}
 // Handle received container status, this should be processed after new
 // RMNode inserted
 if (!rmContext.isWorkPreservingRecoveryEnabled()) {
   if (!request.getNMContainerStatuses().isEmpty()) {
     LOG.info("received container statuses on node manager register :"
         + request.getNMContainerStatuses());
     for (NMContainerStatus status : request.getNMContainerStatuses()) {
       handleNMContainerStatus(status, nodeId);
     }
   }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3001) RM dies because of divide by zero

2015-02-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336083#comment-14336083
 ] 

Rohith commented on YARN-3001:
--

bq. RM does not die unless yarn.dispatcher.exit-on-error is set to true
Ignore this. The RM sets this configuration to true regardless of the 
configured value.

In YARN-382 the ResourceRequest is validated via a normalization process. 
Normalization of resources ensures containers always get at least 
minimum-allocation-mb, even if users send 0 as the container memory.
I verified this in a real cluster by sending 0 as the container memory and AM 
memory. The schedulers normalize the requests and allocate the configured 
yarn.scheduler.minimum-allocation-mb.

[~hoelog] Could you give the scenario in which this happened? 
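
For illustration, a minimal sketch of the normalization behavior being described (simplified; not the actual scheduler code):
{code}
// Simplified illustration: a 0 MB ask is rounded up to the configured
// minimum, and every ask is rounded up to a multiple of the increment.
static int normalizeMemory(int requestedMb, int minAllocMb, int incrementMb) {
  int mb = Math.max(requestedMb, minAllocMb);
  return ((mb + incrementMb - 1) / incrementMb) * incrementMb;
}

// normalizeMemory(0, 1024, 1024)    -> 1024 (yarn.scheduler.minimum-allocation-mb)
// normalizeMemory(1500, 1024, 1024) -> 2048
{code}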

 RM dies because of divide by zero
 -

 Key: YARN-3001
 URL: https://issues.apache.org/jira/browse/YARN-3001
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: hoelog
Assignee: Rohith

 RM dies because of divide by zero exception.
 {code}
 2014-12-31 21:27:05,022 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.ArithmeticException: / by zero
 at 
 org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
 at java.lang.Thread.run(Thread.java:745)
 2014-12-31 21:27:05,023 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy

2015-02-24 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336090#comment-14336090
 ] 

Brahma Reddy Battula commented on YARN-3217:


Yes, you are correct. I removed it and updated the patch.

 Remove httpclient dependency from hadoop-yarn-server-web-proxy
 --

 Key: YARN-3217
 URL: https://issues.apache.org/jira/browse/YARN-3217
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Attachments: YARN-3217-002.patch, YARN-3217-003.patch, YARN-3217.patch


 Sub-task of HADOOP-10105. Remove httpclient dependency from 
 WebAppProxyServlet.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3255) RM and NM main() should support generic options

2015-02-24 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-3255:
--
Attachment: YARN-3255-01.patch

A simple patch, which particularly lets me run a Yarn cluster in Eclipse.

 RM and NM main() should support generic options
 ---

 Key: YARN-3255
 URL: https://issues.apache.org/jira/browse/YARN-3255
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Affects Versions: 2.5.0
Reporter: Konstantin Shvachko
 Attachments: YARN-3255-01.patch


 Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore 
 generic options, like {{-conf}} and {{-fs}}. It would be good to have the 
 ability to pass generic options in order to specify configuration files or 
 the NameNode location, when the services start through {{main()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-02-24 Thread Ryu Kobayashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi updated YARN-3249:

Attachment: YARN-3249.2.patch

 Add the kill application to the Resource Manager Web UI
 ---

 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.patch, 
 screenshot.png


 We want to be able to kill an application from the Web UI, similar to the JobTracker Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336136#comment-14336136
 ] 

Hadoop QA commented on YARN-1809:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700680/YARN-1809.11.patch
  against trunk revision 6cbd9f1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebAppFairScheduler

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6723//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6723//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6723//console

This message is automatically generated.

 Synchronize RM and Generic History Service Web-UIs
 --

 Key: YARN-1809
 URL: https://issues.apache.org/jira/browse/YARN-1809
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Xuan Gong
 Attachments: YARN-1809.1.patch, YARN-1809.10.patch, 
 YARN-1809.11.patch, YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, 
 YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, 
 YARN-1809.8.patch, YARN-1809.9.patch


 After YARN-953, the web UI of the generic history service provides more 
 information than that of the RM, namely the details about app attempts and 
 containers. It's good to provide similar web UIs, but retrieve the data from 
 separate sources, i.e., the RM cache and the history store respectively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336146#comment-14336146
 ] 

Hadoop QA commented on YARN-3249:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700684/YARN-3249.2.patch
  against trunk revision 6cbd9f1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6724//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6724//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6724//console

This message is automatically generated.

 Add the kill application to the Resource Manager Web UI
 ---

 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.patch, 
 screenshot.png


 We want to be able to kill an application from the Web UI, similar to the JobTracker Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-02-24 Thread Ryu Kobayashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi updated YARN-3249:

Attachment: YARN-3249.2.patch

 Add the kill application to the Resource Manager Web UI
 ---

 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3249.2.patch, YARN-3249.patch, screenshot.png


 We want to be able to kill an application from the Web UI, similar to the JobTracker Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled

2015-02-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336062#comment-14336062
 ] 

Rohith commented on YARN-3202:
--

bq. as for work-preserving restart, master container completed event will be 
sent too.
I agree it is sent after YARN-3194 and the issue is not occurring now. Before 
YARN-3194, since NMContainerStatus was not handled, RMAppAttempt always waited 
for container expiry to trigger for a master container in RUNNING state.

 Improve master container resource release time ICO work preserving restart 
 enabled
 --

 Key: YARN-3202
 URL: https://issues.apache.org/jira/browse/YARN-3202
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: 0001-YARN-3202.patch


 While the NM is registering with the RM, if the NM sends a completed_container 
 for the masterContainer, the master container's resources are released 
 immediately by triggering the CONTAINER_FINISHED event. This releases all the 
 resources held by the master container so they can be allocated to other 
 pending resource requests by applications.
 But in case of (ICO) RM work-preserving restart being enabled, if the master 
 container state is completed, the attempt does not move to FINISHING until 
 container expiry is triggered by the container liveliness monitor. I think the 
 code below need not check whether work-preserving restart is enabled, so that 
 the master container's resources get released immediately and allocated to 
 other pending resource requests of different applications.
 {code}
 // Handle received container status, this should be processed after new
 // RMNode inserted
 if (!rmContext.isWorkPreservingRecoveryEnabled()) {
   if (!request.getNMContainerStatuses().isEmpty()) {
     LOG.info("received container statuses on node manager register :"
         + request.getNMContainerStatuses());
     for (NMContainerStatus status : request.getNMContainerStatuses()) {
       handleNMContainerStatus(status, nodeId);
     }
   }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled

2015-02-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336071#comment-14336071
 ] 

Jian He commented on YARN-3202:
---

For RM work-preserving restart, even before YARN-3194, the 
ContainerRecoveredTransition handles this correctly. The patch will cause 
duplicate master container completed events to be sent. Did I miss something?

 Improve master container resource release time ICO work preserving restart 
 enabled
 --

 Key: YARN-3202
 URL: https://issues.apache.org/jira/browse/YARN-3202
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: 0001-YARN-3202.patch


 While the NM is registering with the RM, if the NM sends a completed_container 
 for the masterContainer, the master container's resources are released 
 immediately by triggering the CONTAINER_FINISHED event. This releases all the 
 resources held by the master container so they can be allocated to other 
 pending resource requests by applications.
 But in case of (ICO) RM work-preserving restart being enabled, if the master 
 container state is completed, the attempt does not move to FINISHING until 
 container expiry is triggered by the container liveliness monitor. I think the 
 code below need not check whether work-preserving restart is enabled, so that 
 the master container's resources get released immediately and allocated to 
 other pending resource requests of different applications.
 {code}
 // Handle received container status, this should be processed after new
 // RMNode inserted
 if (!rmContext.isWorkPreservingRecoveryEnabled()) {
   if (!request.getNMContainerStatuses().isEmpty()) {
     LOG.info("received container statuses on node manager register :"
         + request.getNMContainerStatuses());
     for (NMContainerStatus status : request.getNMContainerStatuses()) {
       handleNMContainerStatus(status, nodeId);
     }
   }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled

2015-02-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336082#comment-14336082
 ] 

Rohith commented on YARN-3202:
--

I mean the RM is enabled with work-preserving restart, but the RM is not 
restarted. Only the NM is restarted, which sends the recovered container 
statuses while registering. The NM restart scenario was causing a problem 
earlier if the master container status was COMPLETED.

 Improve master container resource release time ICO work preserving restart 
 enabled
 --

 Key: YARN-3202
 URL: https://issues.apache.org/jira/browse/YARN-3202
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: 0001-YARN-3202.patch


 While the NM is registering with the RM, if the NM sends a completed_container 
 for the masterContainer, the master container's resources are released 
 immediately by triggering the CONTAINER_FINISHED event. This releases all the 
 resources held by the master container so they can be allocated to other 
 pending resource requests by applications.
 But in case of (ICO) RM work-preserving restart being enabled, if the master 
 container state is completed, the attempt does not move to FINISHING until 
 container expiry is triggered by the container liveliness monitor. I think the 
 code below need not check whether work-preserving restart is enabled, so that 
 the master container's resources get released immediately and allocated to 
 other pending resource requests of different applications.
 {code}
 // Handle received container status, this should be processed after new
 // RMNode inserted
 if (!rmContext.isWorkPreservingRecoveryEnabled()) {
   if (!request.getNMContainerStatuses().isEmpty()) {
     LOG.info("received container statuses on node manager register :"
         + request.getNMContainerStatuses());
     for (NMContainerStatus status : request.getNMContainerStatuses()) {
       handleNMContainerStatus(status, nodeId);
     }
   }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy

2015-02-24 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3217:
---
Attachment: YARN-3217-003.patch

 Remove httpclient dependency from hadoop-yarn-server-web-proxy
 --

 Key: YARN-3217
 URL: https://issues.apache.org/jira/browse/YARN-3217
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Attachments: YARN-3217-002.patch, YARN-3217-003.patch, YARN-3217.patch


 Sub-task of HADOOP-10105. Remove httpclient dependency from 
 WebAppProxyServlet.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-02-24 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336054#comment-14336054
 ] 

Vrushali C commented on YARN-3031:
--

Hi [~zjshen],

Thanks for the prompt review, appreciate it! These are very good points you 
mention; let me add some more context about why these are coded this way right 
now.

1. The reasoning behind having two more APIs for writing metrics and events, in 
addition to the entity write, is that it would be good (efficient) to have the 
option to write a single metric or a single event. For example, say a job has 
many custom metrics and one particular metric is updated extremely frequently 
but the others are not. We may want to write out only that particular metric 
without having to look through/write all the other metrics and other 
information in that entity. Similarly for events. Perhaps we could do it 
differently than what is proposed in the patch, but I believe the ability to 
write them individually would help performance.

2. Having a separate write and aggregator API makes them independent of the 
order in which the entity details and aggregation are invoked/stored, and makes 
them independent of each other. For instance, we may choose to invoke the 
aggregation at a different (slower) frequency than the regular entity writes. 
Hence the two APIs.

3. The TimelineServiceWriteResponse has two error codes presently: 
NO_START_TIME and IO_EXCEPTION. We can of course add more error codes as we 
proceed. The reason I chose these two for now is that each flow is inherently 
associated with a submit timestamp (the run id of that flow). In case we don't 
find that timestamp, it would be difficult to write the flow information for 
that run to the store - I think an error should be thrown with an error code. 
The other one, IO_EXCEPTION, is what I thought would help account for write/put 
errors to the store - we should be able to indicate that the write did not go 
through. We can rename these if the names don't sound meaningful. A rough 
sketch of this shape is included below.

thanks
Vrushali
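To make the shape of these points concrete, here is a hedged sketch of such a 
write interface; the method and type names below are illustrative placeholders, 
not the actual API in YARN-3031.02.patch.
{code}
import java.io.IOException;

// Illustrative sketch only; Object stands in for the entity/metric/event
// classes defined by the patch.
public interface TimelineServiceWriter {
  // bulk write of a full entity
  TimelineServiceWriteResponse writeEntity(Object entity) throws IOException;

  // point 1: write a single hot metric or event without rewriting the
  // whole entity
  TimelineServiceWriteResponse writeMetric(String entityId, Object metric)
      throws IOException;
  TimelineServiceWriteResponse writeEvent(String entityId, Object event)
      throws IOException;

  // point 2: aggregation is a separate call, so it can run on its own
  // (typically slower) schedule, independent of the entity writes
  TimelineServiceWriteResponse aggregate(String flowId, long runId)
      throws IOException;
}

class TimelineServiceWriteResponse {
  // point 3: the two error codes mentioned above; more can be added later
  enum ErrorCode { NO_START_TIME, IO_EXCEPTION }
  // ... per-record errors, each tagged with an ErrorCode ...
}
{code}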

 [Storage abstraction] Create backing storage write interface for ATS writers
 

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.2.png, 
 Sequence_diagram_write_interaction.png, YARN-3031.01.patch, YARN-3031.02.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3255) RM and NM main() should support generic options

2015-02-24 Thread Konstantin Shvachko (JIRA)
Konstantin Shvachko created YARN-3255:
-

 Summary: RM and NM main() should support generic options
 Key: YARN-3255
 URL: https://issues.apache.org/jira/browse/YARN-3255
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Affects Versions: 2.5.0
Reporter: Konstantin Shvachko


Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore generic 
options, like {{-conf}} and {{-fs}}. It would be good to have the ability to 
pass generic options in order to specify configuration files or the NameNode 
location, when the services start through {{main()}}.
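For context, supporting generic options in {{main()}} would presumably look 
something like the sketch below; this is only an illustration, not necessarily 
what YARN-3255-01.patch does.
{code}
// Sketch of a ResourceManager.main() with generic-option support.
public static void main(String[] args) throws Exception {
  Configuration conf = new YarnConfiguration();
  // GenericOptionsParser applies -conf, -fs, -D, etc. onto conf before the
  // daemon is initialized, so a configuration file or the NameNode location
  // can be passed on the command line.
  new GenericOptionsParser(conf, args);
  ResourceManager resourceManager = new ResourceManager();
  resourceManager.init(conf);
  resourceManager.start();
}
{code}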



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2467) Add SpanReceiverHost to YARN daemons

2015-02-24 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335784#comment-14335784
 ] 

Masatake Iwasaki commented on YARN-2467:


Thanks, [~hitliuyi].

 Add SpanReceiverHost to YARN daemons 
 -

 Key: YARN-2467
 URL: https://issues.apache.org/jira/browse/YARN-2467
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, nodemanager, resourcemanager
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3247) TestQueueMappings should use CapacityScheduler explicitly

2015-02-24 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-3247:
-
Summary: TestQueueMappings should use CapacityScheduler explicitly  (was: 
TestQueueMappings failure for FairScheduler)

 TestQueueMappings should use CapacityScheduler explicitly
 -

 Key: YARN-3247
 URL: https://issues.apache.org/jira/browse/YARN-3247
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial
 Attachments: YARN-3247.000.patch


 TestQueueMappings is only supported by CapacityScheduler.
 We should configure CapacityScheduler for this test. Otherwise if the default 
 scheduler is set to FairScheduler, the test will fail with the following 
 message:
 {code}
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings)
   Time elapsed: 2.202 sec   ERROR!
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot 
 be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2015-02-24 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu updated YARN-2190:

Attachment: YARN-2190.8.patch

Uploaded a new patch. This patch is mostly based on version 6. We still have 
a separate Windows container executor. There is no CPU and memory support for 
{{WindowsSecureContainerExecutor}}. We can open a separate JIRA to add 
CPU/memory limit support to the secure Windows container executor.

 Provide a Windows container executor that can limit memory and CPU
 --

 Key: YARN-2190
 URL: https://issues.apache.org/jira/browse/YARN-2190
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Reporter: Chuan Liu
Assignee: Chuan Liu
 Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
 YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch, 
 YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch


 The default YARN container executor on Windows does not currently set resource 
 limits on the containers. The memory limit is enforced by a separate 
 monitoring thread. The container implementation on Windows uses a Job Object 
 right now. The latest Windows API (Windows 8 or later) allows CPU and memory 
 limits on job objects. We want to create a Windows container executor that 
 sets the limits on job objects and thus provides resource enforcement at the 
 OS level.
 http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335826#comment-14335826
 ] 

Hadoop QA commented on YARN-2190:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700646/YARN-2190.8.patch
  against trunk revision 1a625b8.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6718//console

This message is automatically generated.

 Provide a Windows container executor that can limit memory and CPU
 --

 Key: YARN-2190
 URL: https://issues.apache.org/jira/browse/YARN-2190
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Reporter: Chuan Liu
Assignee: Chuan Liu
 Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
 YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch, 
 YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch


 The default YARN container executor on Windows does not currently set resource 
 limits on the containers. The memory limit is enforced by a separate 
 monitoring thread. The container implementation on Windows uses a Job Object 
 right now. The latest Windows API (Windows 8 or later) allows CPU and memory 
 limits on job objects. We want to create a Windows container executor that 
 sets the limits on job objects and thus provides resource enforcement at the 
 OS level.
 http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2015-02-24 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1809:
--
Assignee: Xuan Gong  (was: Zhijie Shen)

 Synchronize RM and Generic History Service Web-UIs
 --

 Key: YARN-1809
 URL: https://issues.apache.org/jira/browse/YARN-1809
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Xuan Gong
 Attachments: YARN-1809.1.patch, YARN-1809.10.patch, 
 YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, 
 YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, YARN-1809.8.patch, 
 YARN-1809.9.patch


 After YARN-953, the web UI of the generic history service provides more 
 information than that of the RM, namely the details about app attempts and 
 containers. It's good to provide similar web UIs, but retrieve the data from 
 separate sources, i.e., the RM cache and the history store respectively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2015-02-24 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335836#comment-14335836
 ] 

Xuan Gong commented on YARN-1809:
-

Update a patch based on the latest trunk code.

 Synchronize RM and Generic History Service Web-UIs
 --

 Key: YARN-1809
 URL: https://issues.apache.org/jira/browse/YARN-1809
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1809.1.patch, YARN-1809.10.patch, 
 YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, 
 YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, YARN-1809.8.patch, 
 YARN-1809.9.patch


 After YARN-953, the web UI of the generic history service provides more 
 information than that of the RM, namely the details about app attempts and 
 containers. It's good to provide similar web UIs, but retrieve the data from 
 separate sources, i.e., the RM cache and the history store respectively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3254) HealthReport should include disk full information

2015-02-24 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-3254:

Attachment: Screen Shot 2015-02-24 at 17.57.39.png

Attaching a screenshot when the NodeManager's disk is almost full.

 HealthReport should include disk full information
 -

 Key: YARN-3254
 URL: https://issues.apache.org/jira/browse/YARN-3254
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Akira AJISAKA
 Attachments: Screen Shot 2015-02-24 at 17.57.39.png


 When a NodeManager's local disk gets almost full, the NodeManager sends a 
 health report to ResourceManager that local/log dir is bad and the message 
 is displayed on ResourceManager Web UI. It's difficult for users to detect 
 why the dir is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3254) HealthReport should include disk full information

2015-02-24 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-3254:

Description: When a NodeManager's local disk gets almost full, the 
NodeManager sends a health report to ResourceManager that local/log dir is 
bad and the message is displayed on ResourceManager Web UI. It's difficult for 
users to detect why the dir is bad.  (was: When a NodeManager's local disk get 
almost full, the NodeManager send a health report to ResourceManager that 
local/log dir is bad and the message is displayed on ResourceManager Web UI. 
It's difficult for users to detect why the dir is bad.)

 HealthReport should include disk full information
 -

 Key: YARN-3254
 URL: https://issues.apache.org/jira/browse/YARN-3254
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Akira AJISAKA

 When a NodeManager's local disk gets almost full, the NodeManager sends a 
 health report to ResourceManager that local/log dir is bad and the message 
 is displayed on ResourceManager Web UI. It's difficult for users to detect 
 why the dir is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3254) HealthReport should include disk full information

2015-02-24 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created YARN-3254:
---

 Summary: HealthReport should include disk full information
 Key: YARN-3254
 URL: https://issues.apache.org/jira/browse/YARN-3254
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Akira AJISAKA


When a NodeManager's local disk get almost full, the NodeManager send a health 
report to ResourceManager that local/log dir is bad and the message is 
displayed on ResourceManager Web UI. It's difficult for users to detect why the 
dir is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3240) [Data Mode] Implement client API to put generic entities

2015-02-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335901#comment-14335901
 ] 

Junping Du commented on YARN-3240:
--

Will commit it tomorrow if no more comments from others.

 [Data Mode] Implement client API to put generic entities
 

 Key: YARN-3240
 URL: https://issues.apache.org/jira/browse/YARN-3240
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3240.1.patch, YARN-3240.2.patch, YARN-3240.3.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335897#comment-14335897
 ] 

Hadoop QA commented on YARN-3249:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700467/screenshot.png
  against trunk revision 6cbd9f1.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6720//console

This message is automatically generated.

 Add the kill application to the Resource Manager Web UI
 ---

 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3249.patch, screenshot.png


 We want to be able to kill an application from the Web UI, similar to the JobTracker Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2467) Add SpanReceiverHost to YARN daemons

2015-02-24 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335779#comment-14335779
 ] 

Yi Liu commented on YARN-2467:
--

[~iwasakims], I assigned the JIRA to you; feel free to work on it.

 Add SpanReceiverHost to YARN daemons 
 -

 Key: YARN-2467
 URL: https://issues.apache.org/jira/browse/YARN-2467
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, nodemanager, resourcemanager
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3247) TestQueueMappings failure for FairScheduler

2015-02-24 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335802#comment-14335802
 ] 

Tsuyoshi OZAWA commented on YARN-3247:
--

+1, committing this shortly. The default value of RM_SCHEDULER is 
CapacityScheduler. However, the default can be overridden when a user has 
modified yarn-site.xml on the classpath. Also, other test cases for 
CapacityScheduler configure the scheduler explicitly; we should do so here as 
well.

{code}
  protected ResourceScheduler createScheduler() {
String schedulerClassName = conf.get(YarnConfiguration.RM_SCHEDULER,
YarnConfiguration.DEFAULT_RM_SCHEDULER);
{code}
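For reference, pinning the scheduler explicitly in the test would presumably 
look something like the following sketch (the exact lines in 
YARN-3247.000.patch may differ):
{code}
// Force CapacityScheduler regardless of any yarn-site.xml on the classpath,
// so a FairScheduler default cannot break the test.
CapacitySchedulerConfiguration conf = new CapacitySchedulerConfiguration();
conf.setClass(YarnConfiguration.RM_SCHEDULER,
    CapacityScheduler.class, ResourceScheduler.class);
MockRM rm = new MockRM(conf);
{code}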

 TestQueueMappings failure for FairScheduler
 ---

 Key: YARN-3247
 URL: https://issues.apache.org/jira/browse/YARN-3247
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial
 Attachments: YARN-3247.000.patch


 TestQueueMappings is only supported by CapacityScheduler.
 We should configure CapacityScheduler for this test. Otherwise if the default 
 scheduler is set to FairScheduler, the test will fail with the following 
 message:
 {code}
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings)
   Time elapsed: 2.202 sec   ERROR!
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot 
 be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3251:
-
Target Version/s: 2.7.0, 2.6.1  (was: 2.7.0)

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335803#comment-14335803
 ] 

Wangda Tan commented on YARN-3251:
--

[~cwelch],
Some comments,
1) Since the target of your patch is a quick fix for an old version, it's 
better to create the patch against branch-2.6, and the patch you create will 
be committed to branch-2.6 as well. I noticed some functionality and 
interfaces used in your patch are not part of 2.6. Also, the patch I'm working 
on now will remove CSQueueUtils.computeMaxAvailResource, so there is no need 
to add an intermediate fix in branch-2.
2) I think CSQueueUtils.getAbsoluteMaxAvailCapacity doesn't hold the child's 
and parent's locks together, so maybe we don't need to change that - could you 
confirm?
3) Maybe we don't need a getter/setter for absoluteMaxAvailCapacity in the 
queue; a volatile float is enough? (See the sketch below.)

Thanks,
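A minimal illustration of point 3, assuming the cached value lives on the 
queue object (names are hypothetical, not taken from the patch):
{code}
// A volatile float gives lock-free reads of the cached value, so callers
// never need to take the queue lock just to read it.
private volatile float absoluteMaxAvailCapacity;

void updateAbsoluteMaxAvailCapacity(float value) {
  absoluteMaxAvailCapacity = value;  // single writer publishes lock-free
}

float getAbsoluteMaxAvailCapacity() {
  return absoluteMaxAvailCapacity;   // lock-free read
}
{code}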

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3247) TestQueueMappings failure for FairScheduler

2015-02-24 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-3247:
-
Hadoop Flags: Reviewed

 TestQueueMappings failure for FairScheduler
 ---

 Key: YARN-3247
 URL: https://issues.apache.org/jira/browse/YARN-3247
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial
 Attachments: YARN-3247.000.patch


 TestQueueMappings is only supported by CapacityScheduler.
 We should configure CapacityScheduler for this test. Otherwise if the default 
 scheduler is set to FairScheduler, the test will fail with the following 
 message:
 {code}
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings)
   Time elapsed: 2.202 sec   ERROR!
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot 
 be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3247) TestQueueMappings should use CapacityScheduler explicitly

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335889#comment-14335889
 ] 

Hudson commented on YARN-3247:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7196 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7196/])
YARN-3247. TestQueueMappings should use CapacityScheduler explicitly. 
Contributed by Zhihai Xu. (ozawa: rev 6cbd9f1113fca9ff86fd6ffa783ecd54b147e0db)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueMappings.java
* hadoop-yarn-project/CHANGES.txt


 TestQueueMappings should use CapacityScheduler explicitly
 -

 Key: YARN-3247
 URL: https://issues.apache.org/jira/browse/YARN-3247
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial
 Attachments: YARN-3247.000.patch


 TestQueueMappings is only supported by CapacityScheduler.
 We should configure CapacityScheduler for this test. Otherwise if the default 
 scheduler is set to FairScheduler, the test will fail with the following 
 message:
 {code}
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings)
   Time elapsed: 2.202 sec   ERROR!
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot 
 be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3240) [Data Mode] Implement client API to put generic entities

2015-02-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335892#comment-14335892
 ] 

Junping Du commented on YARN-3240:
--

+1. Patch looks good.

 [Data Mode] Implement client API to put generic entities
 

 Key: YARN-3240
 URL: https://issues.apache.org/jira/browse/YARN-3240
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3240.1.patch, YARN-3240.2.patch, YARN-3240.3.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-02-24 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335895#comment-14335895
 ] 

Tsuyoshi OZAWA commented on YARN-3249:
--

Submitting a patch. Let me review.

 Add the kill application to the Resource Manager Web UI
 ---

 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3249.patch, screenshot.png


 We want to be able to kill an application from the Web UI, similar to the JobTracker Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-02-24 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335900#comment-14335900
 ] 

Tsuyoshi OZAWA commented on YARN-3249:
--

[~ryu_kobayashi] thank you for the contribution. Unfortunately, your changes 
conflict with YARN-3230. Could you rebase it? Personally, +1 for the change 
itself.

 Add the kill application to the Resource Manager Web UI
 ---

 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3249.patch, screenshot.png


 We want to be able to kill an application from the Web UI, similar to the JobTracker Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335566#comment-14335566
 ] 

Hadoop QA commented on YARN-3131:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700584/yarn_3131_v6.patch
  against trunk revision 9a37247.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6713//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6713//console

This message is automatically generated.

 YarnClientImpl should check FAILED and KILLED state in submitApplication
 

 Key: YARN-3131
 URL: https://issues.apache.org/jira/browse/YARN-3131
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, 
 yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, yarn_3131_v6.patch


 Just ran into an issue when submitting a job to a non-existent queue: 
 YarnClient raises no exception. Though the job does get submitted 
 successfully and just fails immediately after, it would be better if 
 YarnClient could handle the immediate-failure situation like YarnRunner does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-24 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-3231:
--
Attachment: (was: YARN-3231.v2.patch)

 FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
 job stuck
 --

 Key: YARN-3231
 URL: https://issues.apache.org/jira/browse/YARN-3231
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch


 When a queue is piling up with a lot of pending jobs due to the 
 maxRunningApps limit. We want to increase this property on the fly to make 
 some of the pending job active. However, once we increase the limit, all 
 pending jobs were not assigned any resource, and were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-02-24 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3031:
-
Attachment: YARN-3031.02.patch


Attaching a revised writer interface. 

 [Storage abstraction] Create backing storage write interface for ATS writers
 

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.2.png, 
 Sequence_diagram_write_interaction.png, YARN-3031.01.patch, YARN-3031.02.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2467) Add SpanReceiverHost to YARN daemons

2015-02-24 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-2467:
-
Assignee: Masatake Iwasaki  (was: Yi Liu)

 Add SpanReceiverHost to YARN daemons 
 -

 Key: YARN-2467
 URL: https://issues.apache.org/jira/browse/YARN-2467
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, nodemanager, resourcemanager
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3125) [Event producers] Change distributed shell to use new timeline service

2015-02-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335785#comment-14335785
 ] 

Vinod Kumar Vavilapalli commented on YARN-3125:
---

Quick comment: I think we should try not to disturb the old code much. Let's 
just add two separate independent code blocks without removing the old style 
event-push.

 [Event producers] Change distributed shell to use new timeline service
 --

 Key: YARN-3125
 URL: https://issues.apache.org/jira/browse/YARN-3125
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Junping Du
 Attachments: YARN-3125.patch


 We can start with changing distributed shell to use new timeline service once 
 the framework is completed, in which way we can quickly verify the next gen 
 is working fine end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335799#comment-14335799
 ] 

Vinod Kumar Vavilapalli commented on YARN-3131:
---

Nits:
{code}
+throw new YarnException("Failed to submit " + applicationId +
+    "to YARN : " + appReport.getDiagnostics());
{code}
You will see output like {{application_123456_0001to YARN}} - note the missing 
space.

Can we just check for failToSubmitStates? Why do we also need to check for 
waitingStates?
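The fix for the first nit would presumably just add the missing leading space, 
e.g. (illustrative, not the committed diff):
{code}
throw new YarnException("Failed to submit " + applicationId
    + " to YARN : " + appReport.getDiagnostics());
{code}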

 YarnClientImpl should check FAILED and KILLED state in submitApplication
 

 Key: YARN-3131
 URL: https://issues.apache.org/jira/browse/YARN-3131
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, 
 yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, yarn_3131_v6.patch


 Just ran into an issue when submitting a job to a non-existent queue: 
 YarnClient raises no exception. Though the job does get submitted 
 successfully and just fails immediately after, it would be better if 
 YarnClient could handle the immediate-failure situation like YarnRunner does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

