[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running

2015-04-15 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496103#comment-14496103
 ] 

Rohith commented on YARN-2268:
--

I propose the following way to disallow formatting the state store while an RM is running.
For both HA (Active and Standby) and non-HA deployments, the RM state can be obtained 
using the REST API getClusterInfo ('ws/v1/cluster/info'). This can be used to identify 
the RM state and is independent of any state store implementation.
In HA, the ACTIVE state is checked against all the RM-Ids sequentially. If no ACTIVE RM 
is found, the store is formatted; otherwise an 
exception *ActiveResourceManagerRunningException* is thrown.

Cons: formatting the state store when HA is enabled is *best effort*; there is a window 
where the RM state can change after one of the RMs has been checked.

Kindly share your thoughts on this approach.
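A minimal sketch of the proposed check, assuming the standard /ws/v1/cluster/info JSON 
layout; the class, timeouts, and the refusal message below are illustrative, not the 
actual patch:

{code}
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Scanner;

public class RMActiveCheck {

  // True only if the RM web endpoint is reachable and reports haState ACTIVE
  // in /ws/v1/cluster/info; an unreachable RM counts as not active.
  static boolean isActive(String rmWebAddress) {
    try {
      URL url = new URL("http://" + rmWebAddress + "/ws/v1/cluster/info");
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      conn.setConnectTimeout(5000);
      conn.setReadTimeout(5000);
      try (InputStream in = conn.getInputStream()) {
        Scanner sc = new Scanner(in, "UTF-8").useDelimiter("\\A");
        String body = sc.hasNext() ? sc.next() : "";
        return body.contains("\"haState\":\"ACTIVE\"");
      }
    } catch (IOException e) {
      return false;
    }
  }

  // Check all configured RM web addresses sequentially before formatting.
  static void failIfAnyActive(String[] rmWebAddresses) {
    for (String rm : rmWebAddresses) {
      if (isActive(rm)) {
        throw new IllegalStateException(
            "An active ResourceManager is running at " + rm + "; refusing to format");
      }
    }
  }
}
{code}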

 Disallow formatting the RMStateStore when there is an RM running
 

 Key: YARN-2268
 URL: https://issues.apache.org/jira/browse/YARN-2268
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Rohith

 YARN-2131 adds a way to format the RMStateStore. However, it can be a problem 
 if we format the store while an RM is actively using it. It would be nice to 
 fail the format if there is an RM running and using this store. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once

2015-04-15 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-3489:


 Summary: RMServerUtils.validateResourceRequests should only obtain 
queue info once
 Key: YARN-3489
 URL: https://issues.apache.org/jira/browse/YARN-3489
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe


Since the label support was added we now get the queue info for each request 
being validated in SchedulerUtils.validateResourceRequest.  If 
validateResourceRequests needs to validate a lot of requests at a time (e.g.: 
large cluster with lots of varied locality in the requests) then it will get 
the queue info for each request.  Since we build the queue info this generates 
a lot of unnecessary garbage, as the queue isn't changing between requests.  We 
should grab the queue info once and pass it down rather than building it again 
for each request.
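As a rough illustration of the suggested restructuring (the method signatures here are 
simplified, not the exact RMServerUtils/SchedulerUtils ones), the queue info would be 
fetched once before the loop:

{code}
// Sketch only: look up the queue info a single time, then validate every
// request against the cached object instead of rebuilding it per request.
QueueInfo queueInfo = scheduler.getQueueInfo(queueName, false, false);
for (ResourceRequest req : ask) {
  validateResourceRequest(req, maximumResource, queueInfo); // illustrative signature
}
{code}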



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496099#comment-14496099
 ] 

Hadoop QA commented on YARN-3476:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12724974/0001-YARN-3476.patch
  against trunk revision fddd552.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7346//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7346//console

This message is automatically generated.

 Nodemanager can fail to delete local logs if log aggregation fails
 --

 Key: YARN-3476
 URL: https://issues.apache.org/jira/browse/YARN-3476
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation, nodemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Rohith
 Attachments: 0001-YARN-3476.patch


 If log aggregation encounters an error trying to upload the file, the 
 underlying TFile can throw an IllegalStateException which bubbles up to the top 
 of the thread and prevents the application logs from being deleted.
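A hedged sketch of the kind of guard that could avoid this; uploadLogsForContainer and 
deleteLocalLogs are hypothetical helpers, not the actual AppLogAggregatorImpl code:

{code}
for (ContainerId containerId : containersToAggregate) {
  try {
    // TFile can throw IllegalStateException on upload errors
    uploadLogsForContainer(containerId);
  } catch (Exception e) {
    LOG.error("Log aggregation failed for " + containerId, e);
  }
}
// local log deletion still runs even if some uploads failed
deleteLocalLogs(containersToAggregate);
{code}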



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3477) TimelineClientImpl swallows root cause of retry failures

2015-04-15 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-3477:
-
 Target Version/s: 2.7.1
Affects Version/s: (was: 3.0.0)
   2.7.0

 TimelineClientImpl swallows root cause of retry failures
 

 Key: YARN-3477
 URL: https://issues.apache.org/jira/browse/YARN-3477
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.7.0
Reporter: Steve Loughran
Assignee: Steve Loughran

 If the timeline client fails more than the retry count, the original exception is 
 not thrown. Instead a generic runtime exception is raised saying the retries have 
 run out.
 # The failing exception should be rethrown, ideally via 
 NetUtils.wrapException, to include the URL of the failing endpoint.
 # Otherwise, the raised RTE should (a) state that URL and (b) set the 
 original fault as the inner cause, as sketched below.
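An illustration of option 2 (resURI, lastException, and retriesLeft are hypothetical 
names inside the retry loop):

{code}
// keep the original fault as the cause and name the failing endpoint
if (retriesLeft <= 0) {
  throw new RuntimeException("Failed to reach the timeline server at " + resURI
      + " after exhausting retries", lastException);
}
{code}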



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496385#comment-14496385
 ] 

Hudson commented on YARN-3266:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2114 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2114/])
YARN-3266. RMContext#inactiveNodes should have NodeId as map key. Contributed 
by Chengbing Liu (jianhe: rev b46ee1e7a31007985b88072d9af3d97c33a261a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java


 RMContext inactiveNodes should have NodeId as map key
 -

 Key: YARN-3266
 URL: https://issues.apache.org/jira/browse/YARN-3266
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Fix For: 2.8.0

 Attachments: YARN-3266.01.patch, YARN-3266.02.patch, 
 YARN-3266.03.patch


 Under the default NM port configuration, which is 0, we have observed in the 
 current version that the lost nodes count is greater than the length of the lost 
 node list. This happens when we consecutively restart the same NM twice:
 * NM started at port 10001
 * NM restarted at port 10002
 * NM restarted at port 10003
 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} has 1 element
 * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} still has 1 element
 Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
 {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}} (see the 
 sketch below). If this would break the current API, then the key string should 
 include the NM's port as well.
 Thoughts?
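A one-line sketch of the proposed change (illustrative; the actual patch may differ):

{code}
// host-only key collapses restarted NMs on the same host into one entry:
//   context.getInactiveRMNodes().put(rmNode.getNodeID().getHost(), rmNode);
// keying by the full NodeId (host + port) keeps each lost NM visible:
context.getInactiveRMNodes().put(rmNode.getNodeID(), rmNode);
{code}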



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3436) Fix URIs in documentation of YARN web service REST APIs

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496383#comment-14496383
 ] 

Hudson commented on YARN-3436:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2114 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2114/])
YARN-3436. Fix URIs in documentation of YARN web service REST APIs. Contributed 
by Bibin A Chundatt. (ozawa: rev 05007b45e58bd9052f503cfb8c17bcfd22a686e3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md
* hadoop-yarn-project/CHANGES.txt


 Fix URIs in documentation of YARN web service REST APIs
 -

 Key: YARN-3436
 URL: https://issues.apache.org/jira/browse/YARN-3436
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation, resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3436.001.patch


 /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
 {quote}
 Response Examples
 JSON response with single resource
 HTTP Request: GET 
 http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001
 Response Status Line: HTTP/1.1 200 OK
 {quote}
 The URL should be ws/v1/cluster/{color:red}apps{color}.
 Two examples on the same page are wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496384#comment-14496384
 ] 

Hudson commented on YARN-3361:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2114 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2114/])
YARN-3361. CapacityScheduler side changes to support non-exclusive node labels. 
Contributed by Wangda Tan (jianhe: rev 0fefda645bca935b87b6bb8ca63e6f18340d59f5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/SchedulingMode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* hadoop-yarn-project/CHANGES.txt


 CapacityScheduler side changes 

[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496239#comment-14496239
 ] 

Thomas Graves commented on YARN-3434:
-

So I had considered putting it in ResourceLimits, but ResourceLimits seems to me 
to be a queue-level thing (not a user-level one). For instance, ParentQueue passes 
this into LeafQueue, and ParentQueue cares nothing about user limits. If you stored 
it there you would need to either track which user it was for or track it for all 
users. ResourceLimits gets updated when nodes are added and removed, and we don't 
need to compute a particular user limit when that happens. So it would either be 
out of date, or we would have to change it to be updated when that happens, but 
that to me is a fairly large change and not really needed.

The user limit calculations are lower down and are regularly recomputed per user, 
per application, and per current request, so putting this into the global object, 
given how it is calculated and used, didn't make sense to me. All you would be 
using it for is passing it down to assignContainer, and then it would be out of 
date. If someone else started looking at that value assuming it was up to date, 
it would be wrong (unless of course we started updating it as stated above). And 
it would only be for a single user, not all users, unless again we changed to 
calculate it for every user whenever something changed. That seems a bit 
excessive.

You are correct that needToUnreserve could go away. I started out on 2.6, which 
didn't have our changes, and I could have removed it when I added 
amountNeededUnreserve. If we were to store it in the global ResourceLimits then 
yes, the entire LimitsInfo could go away, including shouldContinue, as you would 
fall back to using the boolean return from each function. But again, based on my 
comments above, I'm not sure ResourceLimits is the correct place to put this.

I just noticed that we are already keeping the userLimit in the User class; 
that would be another option. But again, I think we need to make clear what it 
is. This particular check is done per application, per user, based on the 
currently requested Resource, so the stored value wouldn't necessarily apply to 
all of the user's applications since the resource request size could be 
different.

Thoughts, or is there something I'm missing about ResourceLimits?

 Interaction between reservations and userlimit can result in significant ULF 
 violation
 --

 Key: YARN-3434
 URL: https://issues.apache.org/jira/browse/YARN-3434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-3434.patch


 ULF (user-limit factor) was set to 1.0, yet a user was able to consume 1.4x the 
 queue capacity.
 It looks like when this application launched, it reserved about 1000 
 containers, 8 GB each, within about 5 seconds. I think this allowed the 
 logic in assignToUser() to let the user limit be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once

2015-04-15 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-3489:
--

Assignee: Varun Saxena

 RMServerUtils.validateResourceRequests should only obtain queue info once
 -

 Key: YARN-3489
 URL: https://issues.apache.org/jira/browse/YARN-3489
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Varun Saxena

 Since the label support was added we now get the queue info for each request 
 being validated in SchedulerUtils.validateResourceRequest.  If 
 validateResourceRequests needs to validate a lot of requests at a time (e.g.: 
 large cluster with lots of varied locality in the requests) then it will get 
 the queue info for each request.  Since we build the queue info this 
 generates a lot of unnecessary garbage, as the queue isn't changing between 
 requests.  We should grab the queue info once and pass it down rather than 
 building it again for each request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3471) Fix timeline client retry

2015-04-15 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-3471:
-
Affects Version/s: 2.8.0

 Fix timeline client retry
 -

 Key: YARN-3471
 URL: https://issues.apache.org/jira/browse/YARN-3471
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.8.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3471.1.patch, YARN-3471.2.patch


 I found that the client retry has some problems:
 1. The new put methods retry on all exceptions, but they should only do 
 so upon ConnectException (see the sketch below).
 2. We can reuse TimelineClientConnectionRetry to simplify the retry logic.
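A hedged sketch of point 1; putWithRetry, doPost, and the parameters are hypothetical 
names, not the actual TimelineClientImpl code:

{code}
// Only ConnectException triggers another attempt; anything else fails fast.
TimelinePutResponse putWithRetry(int maxRetries, long retryIntervalMs)
    throws IOException, InterruptedException {
  int retries = maxRetries;
  while (true) {
    try {
      return doPost();                       // the actual put call (illustrative)
    } catch (java.net.ConnectException e) {  // connection failures only
      if (retries-- <= 0) {
        throw e;                             // retries exhausted: rethrow original
      }
      Thread.sleep(retryIntervalMs);
    }
  }
}
{code}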



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3436) Fix URIs in documentation of YARN web service REST APIs

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496304#comment-14496304
 ] 

Hudson commented on YARN-3436:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #165 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/165/])
YARN-3436. Fix URIs in documentation of YARN web service REST APIs. Contributed 
by Bibin A Chundatt. (ozawa: rev 05007b45e58bd9052f503cfb8c17bcfd22a686e3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md
* hadoop-yarn-project/CHANGES.txt


 Fix URIs in documentation of YARN web service REST APIs
 -

 Key: YARN-3436
 URL: https://issues.apache.org/jira/browse/YARN-3436
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation, resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3436.001.patch


 /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
 {quote}
 Response Examples
 JSON response with single resource
 HTTP Request: GET 
 http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001
 Response Status Line: HTTP/1.1 200 OK
 {quote}
 The URL should be ws/v1/cluster/{color:red}apps{color}.
 Two examples on the same page are wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496306#comment-14496306
 ] 

Hudson commented on YARN-3266:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #165 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/165/])
YARN-3266. RMContext#inactiveNodes should have NodeId as map key. Contributed 
by Chengbing Liu (jianhe: rev b46ee1e7a31007985b88072d9af3d97c33a261a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/CHANGES.txt


 RMContext inactiveNodes should have NodeId as map key
 -

 Key: YARN-3266
 URL: https://issues.apache.org/jira/browse/YARN-3266
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Fix For: 2.8.0

 Attachments: YARN-3266.01.patch, YARN-3266.02.patch, 
 YARN-3266.03.patch


 Under the default NM port configuration, which is 0, we have observed in the 
 current version that the lost nodes count is greater than the length of the lost 
 node list. This happens when we consecutively restart the same NM twice:
 * NM started at port 10001
 * NM restarted at port 10002
 * NM restarted at port 10003
 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} has 1 element
 * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} still has 1 element
 Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
 {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If 
 this would break the current API, then the key string should include the NM's 
 port as well.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496305#comment-14496305
 ] 

Hudson commented on YARN-3361:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #165 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/165/])
YARN-3361. CapacityScheduler side changes to support non-exclusive node labels. 
Contributed by Wangda Tan (jianhe: rev 0fefda645bca935b87b6bb8ca63e6f18340d59f5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/SchedulingMode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java


 CapacityScheduler 

[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities

2015-04-15 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-3448:
--
Attachment: YARN-3448.8.patch

 Add Rolling Time To Lives Level DB Plugin Capabilities
 --

 Key: YARN-3448
 URL: https://issues.apache.org/jira/browse/YARN-3448
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-3448.1.patch, YARN-3448.2.patch, YARN-3448.3.patch, 
 YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch


 For large applications, the majority of the time in LeveldbTimelineStore is 
 spent deleting old entities one record at a time. An exclusive write lock is held 
 during the entire deletion phase, which in practice can be hours. If we are willing 
 to relax some of the consistency constraints, other performance-enhancing 
 techniques can be employed to maximize throughput and minimize locking 
 time.
 Split the 5 sections of the leveldb database (domain, owner, start time, 
 entity, index) into 5 separate databases. This allows each database to 
 maximize its read cache effectiveness based on the unique usage patterns of 
 each database. With 5 separate databases each lookup is much faster. This can 
 also help with I/O by keeping the entity and index databases on separate disks.
 Rolling DBs for the entity and index DBs: 99.9% of the data is in these two 
 sections, with roughly a 4:1 ratio (index to entity), at least for Tez. We can 
 replace per-record DB removal with file system removal if we create a rolling set 
 of databases that age out and can be efficiently removed. To do this we must add a 
 constraint to always place an entity's events into its correct rolling DB instance 
 based on start time (see the sketch below). This allows us to stitch the data back 
 together while reading and provides artificial paging.
 Relax the synchronous write constraints. If we are willing to accept losing 
 some records that were not flushed by the operating system during a crash, we 
 can use async writes, which can be much faster.
 Prefer sequential writes. Sequential writes can be several times faster than 
 random writes. Spend some small effort arranging the writes in such a way 
 that they trend towards sequential write performance over random write 
 performance.
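A small illustration of the rolling-DB bucketing described above; the roll period, 
directory naming, and basePath are hypothetical:

{code}
// choose the rolling DB instance for an entity by bucketing its start time
long rollPeriodMs = 60L * 60 * 1000;                 // e.g. a one-hour roll
long bucketStart = startTime - (startTime % rollPeriodMs);
String dbDir = basePath + "/entity-" + bucketStart;  // one leveldb per bucket
// expiring old data becomes a directory delete instead of per-record deletes
{code}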



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496519#comment-14496519
 ] 

Hudson commented on YARN-3318:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7588 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7588/])
YARN-3318. Create Initial OrderingPolicy Framework and FifoOrderingPolicy. 
(Craig Welch via wangda) (wangda: rev 5004e753322084e42dfda4be1d2db66677f86a1e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/OrderingPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/MockSchedulableEntity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/SchedulableEntity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoOrderingPolicy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/TestFifoOrderingPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoComparator.java


 Create Initial OrderingPolicy Framework and FifoOrderingPolicy
 --

 Key: YARN-3318
 URL: https://issues.apache.org/jira/browse/YARN-3318
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.8.0

 Attachments: YARN-3318.13.patch, YARN-3318.14.patch, 
 YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, 
 YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, 
 YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch, 
 YARN-3318.53.patch, YARN-3318.56.patch, YARN-3318.57.patch, 
 YARN-3318.58.patch, YARN-3318.59.patch, YARN-3318.60.patch, YARN-3318.61.patch


 Create the initial framework required for using OrderingPolicies and an 
 initial FifoOrderingPolicy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496490#comment-14496490
 ] 

Zhijie Shen commented on YARN-3051:
---

Hence, regardless of the implementation details, we logically use:

1. entity type + entity id to identify entities that are generated on the same 
cluster.
2. cluster id + entity type + entity id to identify entities globally across 
clusters.

In terms of compatibility, {{getTimelineEntity(entity type, entity id)}} can 
assume the cluster ID is either the default one or the one configured in yarn-site.xml.

Does that sound good?
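Sketched as an interface (illustrative only, not the final YARN-3051 API), the two 
lookup forms would be:

{code}
public interface TimelineReaderSketch {
  // same-cluster lookup: the cluster id defaults to the value in yarn-site.xml
  TimelineEntity getTimelineEntity(String entityType, String entityId);

  // cross-cluster lookup: the cluster id is explicit
  TimelineEntity getTimelineEntity(String clusterId, String entityType, String entityId);
}
{code}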

 [Storage abstraction] Create backing storage read interface for ATS readers
 ---

 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3051_temp.patch


 Per design in YARN-2928, create backing storage read interface that can be 
 implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496534#comment-14496534
 ] 

Varun Saxena commented on YARN-3051:


Updated a WIP patch. I will update the javadoc once everyone is on the same page 
about the approach and the API. 
Working on unit tests.

 [Storage abstraction] Create backing storage read interface for ATS readers
 ---

 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3051.wip.patch, YARN-3051_temp.patch


 Per design in YARN-2928, create backing storage read interface that can be 
 implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496509#comment-14496509
 ] 

Varun Saxena commented on YARN-3051:


As per the patch I am currently working on, if the cluster id does not come in the 
query, it is taken from the config, so that's consistent. Although I was assuming 
the app id would be part of the PK.

 [Storage abstraction] Create backing storage read interface for ATS readers
 ---

 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3051_temp.patch


 Per design in YARN-2928, create backing storage read interface that can be 
 implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-15 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496538#comment-14496538
 ] 

Junping Du commented on YARN-3411:
--

Thanks [~vrushalic] for delivering the proposal and POC patch; excellent work!
Some quick comments from walking through the proposal:
bq. Entity Table - primary key components-putting the UserID first helps to 
distribute writes across the regions in the hbase cluster.  Pros:​ avoids 
single region hotspotting. Cons:​ connections would be open to several region 
servers during writes from per node ATS.
It looks like we are trying to get rid of region server hotspotting issues. I agree 
that this design could help. However, it is still possible that a specific 
user submits many more applications than anyone else; in that case, the 
region hotspot issue will still appear, won't it? I think the more general way 
to solve this problem is to salt the keys with a prefix (see the sketch at the end 
of this comment). Thoughts?

bq. Entity Table - column families​-config needs to be stored as key value, not 
as a blob to enable efficient key based querying based on config param name. 
storing it in a separate column family helps to avoid scanning over config  
while reading metrics and vice versa
+1. This leverages the strength of a columnar database. We should avoid storing 
any default value for a key. However, this sounds challenging if TimelineClient 
only has a Configuration object.

bq. Entity Table - metrics are written to with an hbase cell timestamp set to 
top of the minute or top of the 5 minute interval or whatever is decided. This 
helps in timeseries storage and retrieval in case of querying at the entity 
level.
Can we also let TimelineCollector do some aggregation of metrics over a similar 
time interval rather than sending every metric to HBase/Phoenix as it is 
received? This may help take some pressure off the backend.

bq. Flow by application id table
I still think we should figure out some way to store application attempt 
info. The typical use case here is: for some reason (e.g. a bug or a hardware 
capability issue), some flow/application's AM could consistently fail more 
times than other flows/applications. Keeping this info can help us track 
these issues, can't it?

bq. flow summary daily table (aggregation table managed by Phoenix) - could be 
triggered via co­processor with each put in flow table or a cron run once per 
day to aggregate for yesterday (with catchup functionality in case of backlog 
etc)
Triggering on each put in the flow table sounds a little expensive, especially 
when put activity is very frequent. Maybe we should do some batching here? In 
addition, I think we can leverage the per-node TimelineCollector to do some 
first-level aggregation, which can help relieve the workload on the backend.
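The salting idea from the first comment above, as a hedged sketch (the bucket count 
and key layout are hypothetical and not part of the proposal or POC patch):

{code}
// spread a hot user's rows across regions by prefixing a hash-derived salt byte
int saltBuckets = 16;
byte salt = (byte) ((userId.hashCode() & 0x7fffffff) % saltBuckets);
byte[] rowKey = org.apache.hadoop.hbase.util.Bytes.add(
    new byte[] { salt },
    org.apache.hadoop.hbase.util.Bytes.toBytes(userId + "!" + clusterId + "!" + flowId));
// readers fan out over all saltBuckets prefixes when scanning
{code}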

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-15 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3051:
---
Attachment: YARN-3051.wip.patch

 [Storage abstraction] Create backing storage read interface for ATS readers
 ---

 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3051.wip.patch, YARN-3051_temp.patch


 Per design in YARN-2928, create backing storage read interface that can be 
 implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496503#comment-14496503
 ] 

Sangjin Lee commented on YARN-3051:
---

Yep. That's perfect.

 [Storage abstraction] Create backing storage read interface for ATS readers
 ---

 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3051_temp.patch


 Per design in YARN-2928, create backing storage read interface that can be 
 implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3390) Reuse TimelineCollectorManager for RM

2015-04-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496511#comment-14496511
 ] 

Sangjin Lee commented on YARN-3390:
---

{quote}
For putIfAbsent and remove, I don't use template method pattern, but let the 
subclass override the super class method and invoke it inside the override 
implementation, because I'm not sure if we will need pre process or post 
process, and if we only invoke the process when adding a new collector. If 
we're sure about template, I'm okay with the template pattern too.
{quote}
I'm fine with either approach. The main reason I thought of that is I wanted to 
be clear that the base implementation of putIfAbsent() and remove() is 
mandatory (i.e. not optional). Since we control all of it (base and 
subclasses), it might not be such a big deal either way.
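For reference, a minimal sketch of the template-method shape being discussed; the class 
name, key/value types, and hook name are illustrative, not the actual 
TimelineCollectorManager code:

{code}
public class CollectorManagerSketch {
  private final java.util.concurrent.ConcurrentMap<String, Object> collectors =
      new java.util.concurrent.ConcurrentHashMap<String, Object>();

  // mandatory base behaviour; subclasses cannot skip the map update
  public final Object putIfAbsent(String appId, Object collector) {
    Object existing = collectors.putIfAbsent(appId, collector);
    if (existing == null) {
      postPut(appId, collector);  // optional hook, called only for new collectors
    }
    return existing == null ? collector : existing;
  }

  // no-op by default; an RM subclass can override to record per-app context
  protected void postPut(String appId, Object collector) {
  }
}
{code}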

 Reuse TimelineCollectorManager for RM
 -

 Key: YARN-3390
 URL: https://issues.apache.org/jira/browse/YARN-3390
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3390.1.patch


 RMTimelineCollector should have the context info of each app whose entity  
 has been put



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496528#comment-14496528
 ] 

Hadoop QA commented on YARN-3448:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725620/YARN-3448.8.patch
  against trunk revision fddd552.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 10 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7347//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7347//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-applicationhistoryservice.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7347//console

This message is automatically generated.

 Add Rolling Time To Lives Level DB Plugin Capabilities
 --

 Key: YARN-3448
 URL: https://issues.apache.org/jira/browse/YARN-3448
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-3448.1.patch, YARN-3448.2.patch, YARN-3448.3.patch, 
 YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch


 For large applications, the majority of the time in LeveldbTimelineStore is 
 spent deleting old entities one record at a time. An exclusive write lock is held 
 during the entire deletion phase, which in practice can be hours. If we are willing 
 to relax some of the consistency constraints, other performance-enhancing 
 techniques can be employed to maximize throughput and minimize locking 
 time.
 Split the 5 sections of the leveldb database (domain, owner, start time, 
 entity, index) into 5 separate databases. This allows each database to 
 maximize its read cache effectiveness based on the unique usage patterns of 
 each database. With 5 separate databases each lookup is much faster. This can 
 also help with I/O by keeping the entity and index databases on separate disks.
 Rolling DBs for the entity and index DBs: 99.9% of the data is in these two 
 sections, with roughly a 4:1 ratio (index to entity), at least for Tez. We can 
 replace per-record DB removal with file system removal if we create a rolling set 
 of databases that age out and can be efficiently removed. To do this we must add a 
 constraint to always place an entity's events into its correct rolling DB instance 
 based on start time. This allows us to stitch the data back together while 
 reading and provides artificial paging.
 Relax the synchronous write constraints. If we are willing to accept losing 
 some records that were not flushed by the operating system during a crash, we 
 can use async writes, which can be much faster.
 Prefer sequential writes. Sequential writes can be several times faster than 
 random writes. Spend some small effort arranging the writes in such a way 
 that they trend towards sequential write performance over random write 
 performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3490) Add an application decorator to ClientRMService

2015-04-15 Thread Jian Fang (JIRA)
Jian Fang created YARN-3490:
---

 Summary: Add an application decorator to ClientRMService
 Key: YARN-3490
 URL: https://issues.apache.org/jira/browse/YARN-3490
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Jian Fang


Per the discussion on MAPREDUCE-6304, a Hadoop cloud service provider wants to 
hook in some logic to control the allocation of an application on the 
ResourceManager side, because it is sometimes impractical to control the client 
side of a Hadoop cluster in the cloud. Hadoop service providers and Hadoop users 
usually have different privileges, control, and access on a Hadoop cluster in 
the cloud. 

One good example is that application masters should not be allocated to spot 
instances on Amazon EC2. To achieve that, an application decorator could be 
provided to adjust the ApplicationSubmissionContext, for example by specifying 
the AM label expression. 

Hadoop could provide a dummy decorator that does nothing by default, but it 
should allow users to replace this decorator with their own decorators to meet 
their specific needs.
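A hedged sketch of what such a decorator could look like; the interface name, the label 
value, and the wiring into ClientRMService are hypothetical, not an existing YARN API:

{code}
public interface ApplicationSubmissionDecorator {
  // invoked by ClientRMService before accepting the submission
  ApplicationSubmissionContext decorate(ApplicationSubmissionContext context);
}

// example: keep ApplicationMasters on a hypothetical "on-demand" partition
class OnDemandAmDecorator implements ApplicationSubmissionDecorator {
  @Override
  public ApplicationSubmissionContext decorate(ApplicationSubmissionContext context) {
    context.setNodeLabelExpression("on-demand");  // application/AM label expression
    return context;
  }
}
{code}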




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly

2015-04-15 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2605:

Issue Type: Sub-task  (was: Bug)
Parent: YARN-149

 [RM HA] Rest api endpoints doing redirect incorrectly
 -

 Key: YARN-2605
 URL: https://issues.apache.org/jira/browse/YARN-2605
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: bc Wong
Assignee: Anubhav Dhoot
  Labels: newbie

 The standby RM's web UI tries to do a redirect via meta-refresh. That is fine 
 for pages designed to be viewed by web browsers, but the API endpoints 
 shouldn't do that. Most programmatic HTTP clients do not follow meta-refresh. I'd 
 suggest HTTP 303, or returning a well-defined error message (JSON or XML) 
 stating the standby status and a link to the active RM.
 The standby RM is returning this today:
 {noformat}
 $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics
 HTTP/1.1 200 OK
 Cache-Control: no-cache
 Expires: Thu, 25 Sep 2014 18:34:53 GMT
 Date: Thu, 25 Sep 2014 18:34:53 GMT
 Pragma: no-cache
 Expires: Thu, 25 Sep 2014 18:34:53 GMT
 Date: Thu, 25 Sep 2014 18:34:53 GMT
 Pragma: no-cache
 Content-Type: text/plain; charset=UTF-8
 Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics
 Content-Length: 117
 Server: Jetty(6.1.26)
 This is standby RM. Redirecting to the current active RM: 
 http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)

2015-04-15 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496648#comment-14496648
 ] 

Jian Fang commented on YARN-2306:
-

Could someone please tell me which JIRA fixed this bug in trunk? I am 
working on the hadoop 2.6.0 branch and need to see whether I need to fix this 
issue. Thanks in advance.

 leak of reservation metrics (fair scheduler)
 

 Key: YARN-2306
 URL: https://issues.apache.org/jira/browse/YARN-2306
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2306-2.patch, YARN-2306.patch


 This only applies to the fair scheduler; the capacity scheduler is OK.
 When an appAttempt or node is removed, the reservation metrics 
 (reservedContainers, reservedMB, reservedVCores) are not reduced back.
 These are important metrics for administrators, and wrong values may 
 confuse them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).

2015-04-15 Thread zhihai xu (JIRA)
zhihai xu created YARN-3491:
---

 Summary: Improve the public resource localization to do both 
FSDownload submission to the thread pool and completed localization handling in 
one thread (PublicLocalizer).
 Key: YARN-3491
 URL: https://issues.apache.org/jira/browse/YARN-3491
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical


Improve the public resource localization to do both FSDownload submission to 
the thread pool and completed localization handling in one thread 
(PublicLocalizer).
Currently FSDownload submission to the thread pool is done in 
PublicLocalizer#addResource, which runs in the Dispatcher thread, while completed 
localization handling is done in PublicLocalizer#run, which runs in the 
PublicLocalizer thread.
Because FSDownload submission to the thread pool in the following code is time 
consuming, the thread pool can't be fully utilized. Instead of doing public 
resource localization in parallel (multi-threaded), public resource localization 
is serialized most of the time.
{code}
synchronized (pending) {
  pending.put(queue.submit(new FSDownload(lfs, null, conf,
      publicDirDestPath, resource, request.getContext().getStatCache())),
      request);
}
{code}

Also, there are two more benefits with this change:
1. The Dispatcher thread won't be blocked by the above FSDownload submission. 
The Dispatcher thread handles most of the time-critical events at the NodeManager.
2. No synchronization is needed on the HashMap (pending), 
because pending will only be accessed in the PublicLocalizer thread.
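
As an illustration only, a self-contained sketch (not the actual NodeManager classes; class, method and resource types are simplified assumptions) of the proposed split, where the dispatcher thread merely enqueues requests and a single localizer thread both submits to the pool and handles completions:
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.*;

// Sketch: the dispatcher thread only enqueues requests; one localizer thread
// submits work to the pool *and* handles completed downloads, so "pending"
// never needs external synchronization.
class PublicLocalizerSketch implements Runnable {
  private final BlockingQueue<String> requests = new LinkedBlockingQueue<>();
  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  private final CompletionService<String> queue = new ExecutorCompletionService<>(pool);
  private final Map<Future<String>, String> pending = new HashMap<>();

  /** Called from the dispatcher thread: cheap, never blocks on pool submission. */
  void addResource(String resource) {
    requests.offer(resource);
  }

  @Override
  public void run() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        // Submit any newly requested resources to the download pool.
        String next;
        while ((next = requests.poll()) != null) {
          final String resource = next;
          pending.put(queue.submit(() -> "localized " + resource), resource);
        }
        // Handle at most one completed download, waiting briefly if none is ready.
        Future<String> done = queue.poll(100, TimeUnit.MILLISECONDS);
        if (done != null) {
          System.out.println(done.get() + " (request: " + pending.remove(done) + ")");
        }
      }
    } catch (InterruptedException | ExecutionException e) {
      Thread.currentThread().interrupt();
    } finally {
      pool.shutdownNow();
    }
  }
}
{code}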



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2696) Queue sorting in CapacityScheduler should consider node label

2015-04-15 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2696:
-
Attachment: YARN-2696.2.patch

Attached ver.2 patch, which fixes the findbugs warning and the test failures 
(TestRMDelegationTokens is not related).

I've thought about Jian's comment:
bq. We can merge PartitionedQueueComparator and nonPartitionedQueueComparator 
into a single QueueComparator.
After thinking about this, I think we cannot: NonPartitionedQueueComparator is 
stateless while PartitionedQueueComparator is stateful (someone can modify 
partitionToLookAt for the partitioned one), but we should keep 
NonPartitionedQueueComparator sorting only and always by the default partition.


 Queue sorting in CapacityScheduler should consider node label
 -

 Key: YARN-2696
 URL: https://issues.apache.org/jira/browse/YARN-2696
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2696.1.patch, YARN-2696.2.patch


 In the past, when trying to allocate containers under a parent queue in 
 CapacityScheduler. The parent queue will choose child queues by the used 
 resource from smallest to largest. 
 Now we support node label in CapacityScheduler, we should also consider used 
 resource in child queues by node labels when allocating resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496617#comment-14496617
 ] 

Wangda Tan commented on YARN-3434:
--

[~tgraves],
I think your concern may not be a problem: ResourceLimits will be replaced 
(instead of updated) on node heartbeat. And the ResourceLimits object itself is 
there to decouple Parent and Child (e.g. ParentQueue to children, LeafQueue to apps): 
the Child doesn't need to understand how the Parent computes limits, it only needs to 
respect them. For example, an app doesn't need to understand how the queue computes 
queue capacity/user-limit/continuous-reservation-looking, it only needs to know 
what the limit is considering all factors, so it can decide to 
allocate/release-before-allocate/cannot-continue.

The usage of ResourceLimits I have in mind for the user-limit case is:
- ParentQueue computes/sets limits
- LeafQueue stores limits (for why it stores them, see 1. below)
- LeafQueue recomputes/sets the user-limit when trying to allocate for each 
app/priority
- LeafQueue checks the user-limit as well as the limits when trying to allocate/reserve 
a container
- The user-limit saved in ResourceLimits is only used in the normal 
allocation/reservation path; if it's a reserved allocation, we will reset the 
user-limit to unlimited.

1. Why store limits in LeafQueue instead of passing them down?
This is required by headroom computation: an app's headroom is affected by changes in the 
queue's parent as well as in its siblings. We cannot update every app's headroom when 
those change, but we need to recompute the headroom when the app heartbeats, so we have 
to store the latest ResourceLimits in the LeafQueue. See YARN-2008 for more information.

I'm not sure whether the above makes my suggestion clearer. 
Please let me know your thoughts.

 Interaction between reservations and userlimit can result in significant ULF 
 violation
 --

 Key: YARN-3434
 URL: https://issues.apache.org/jira/browse/YARN-3434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-3434.patch


 ULF was set to 1.0
 User was able to consume 1.4X queue capacity.
 It looks like when this application launched, it reserved about 1000 
 containers, each 8G each, within about 5 seconds. I think this allowed the 
 logic in assignToUser() to allow the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes

2015-04-15 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496693#comment-14496693
 ] 

Tsuyoshi Ozawa commented on YARN-3326:
--

+1, committing this shortly. Hey [~Naganarasimha], could you open a new JIRA to 
update the documentation for this feature?

 ReST support for getLabelsToNodes 
 --

 Key: YARN-3326
 URL: https://issues.apache.org/jira/browse/YARN-3326
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Minor
 Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, 
 YARN-3326.20150408-1.patch


 REST to support to retrieve LabelsToNodes Mapping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496644#comment-14496644
 ] 

Wangda Tan commented on YARN-3434:
--

bq. All you would be using it for is passing it down to assignContainer and 
then it would be out of date. If someone else started looking at that value 
assuming it was up to date then it would be wrong (unless of course we started 
updating it as stated above). But it would only be for a single user, not all 
users unless again we changed to calculate for every user whenever something 
changed. That seems a bit excessive.
To clarify, ResourceLimits is the bridge between parent and child: the parent 
tells the child "hey, this is the limit you can use", and LeafQueue does the same 
thing to apps. ParentQueue doesn't compute or pass down the user-limit to LeafQueue at 
all; LeafQueue does that and makes sure it gets updated for every allocation.

 Interaction between reservations and userlimit can result in significant ULF 
 violation
 --

 Key: YARN-3434
 URL: https://issues.apache.org/jira/browse/YARN-3434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-3434.patch


 ULF was set to 1.0
 User was able to consume 1.4X queue capacity.
 It looks like when this application launched, it reserved about 1000 
 containers, each 8G each, within about 5 seconds. I think this allowed the 
 logic in assignToUser() to allow the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).

2015-04-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496663#comment-14496663
 ] 

Jason Lowe commented on YARN-3491:
--

Could you elaborate a bit on why the submit is time consuming?  Unless I'm 
mistaken, the FSDownload constructor is very cheap and queueing should simply 
be tacking an entry onto a queue.

 Improve the public resource localization to do both FSDownload submission to 
 the thread pool and completed localization handling in one thread 
 (PublicLocalizer).
 -

 Key: YARN-3491
 URL: https://issues.apache.org/jira/browse/YARN-3491
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical

 Improve the public resource localization to do both FSDownload submission to 
 the thread pool and completed localization handling in one thread 
 (PublicLocalizer).
 Currently FSDownload submission to the thread pool is done in 
 PublicLocalizer#addResource which is running in Dispatcher thread and 
 completed localization handling is done in PublicLocalizer#run which is 
 running in PublicLocalizer thread.
 Because FSDownload submission to the thread pool at the following code is 
 time consuming, the thread pool can't be fully utilized. Instead of doing 
 public resource localization in parallel(multithreading), public resource 
 localization is serialized most of the time.
 {code}
 synchronized (pending) {
   pending.put(queue.submit(new FSDownload(lfs, null, conf,
       publicDirDestPath, resource, request.getContext().getStatCache())),
       request);
 }
 {code}
 Also there are two more benefits with this change:
 1. The Dispatcher thread won't be blocked by above FSDownload submission. 
 Dispatcher thread handles most of time critical events at Node manager.
 2. don't need synchronization on HashMap (pending).
 Because pending will be only accessed in PublicLocalizer thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3492) AM fails to come up because RM and NM can't connect to each other

2015-04-15 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-3492:
--

 Summary: AM fails to come up because RM and NM can't connect to 
each other
 Key: YARN-3492
 URL: https://issues.apache.org/jira/browse/YARN-3492
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: pseudo-distributed cluster on a mac
Reporter: Karthik Kambatla
Priority: Blocker


Stood up a pseudo-distributed cluster with 2.7.0 RC0. Submitted a pi job. The 
container gets allocated, but doesn't get launched. The NM can't talk to the 
RM. Logs to follow. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3492) AM fails to come up because RM and NM can't connect to each other

2015-04-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3492:
---
Attachment: yarn-kasha-resourcemanager-kasha-mbp.local.log
yarn-kasha-nodemanager-kasha-mbp.local.log

 AM fails to come up because RM and NM can't connect to each other
 -

 Key: YARN-3492
 URL: https://issues.apache.org/jira/browse/YARN-3492
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: pseudo-distributed cluster on a mac
Reporter: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-kasha-nodemanager-kasha-mbp.local.log, 
 yarn-kasha-resourcemanager-kasha-mbp.local.log


 Stood up a pseudo-distributed cluster with 2.7.0 RC0. Submitted a pi job. The 
 container gets allocated, but doesn't get launched. The NM can't talk to the 
 RM. Logs to follow. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2696) Queue sorting in CapacityScheduler should consider node label

2015-04-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496881#comment-14496881
 ] 

Jian He commented on YARN-2696:
---

A few minor comments:
- add a comment on why the no_label max resource is treated separately. 
{code}
if (nodePartition == null
    || nodePartition.equals(RMNodeLabelsManager.NO_LABEL))
{code}
- getChildrenAllocationIterator -> sortAndGetChildrenAllocationIterator

 Queue sorting in CapacityScheduler should consider node label
 -

 Key: YARN-2696
 URL: https://issues.apache.org/jira/browse/YARN-2696
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2696.1.patch, YARN-2696.2.patch


 In the past, when trying to allocate containers under a parent queue in 
 CapacityScheduler. The parent queue will choose child queues by the used 
 resource from smallest to largest. 
 Now we support node label in CapacityScheduler, we should also consider used 
 resource in child queues by node labels when allocating resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3354) Container should contains node-labels asked by original ResourceRequests

2015-04-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496888#comment-14496888
 ] 

Jian He commented on YARN-3354:
---

+1 

 Container should contains node-labels asked by original ResourceRequests
 

 Key: YARN-3354
 URL: https://issues.apache.org/jira/browse/YARN-3354
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, capacityscheduler, nodemanager, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3354.1.patch, YARN-3354.2.patch


 We proposed non-exclusive node labels in YARN-3214, makes non-labeled 
 resource requests can be allocated on labeled nodes which has idle resources.
 To make preemption work, we need know an allocated container's original node 
 label: when labeled resource requests comes back, we need kill non-labeled 
 containers running on labeled nodes.
 This requires add node-labels in Container, and also, NM need store this 
 information and send back to RM when RM restart to recover original container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2696) Queue sorting in CapacityScheduler should consider node label

2015-04-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496892#comment-14496892
 ] 

Jian He commented on YARN-2696:
---

- Does this overlap with the {{Resources.equals(queueGuranteedResource, 
Resources.none()) ? 0}} check below?
{code}
  // make queueGuranteed = minimum_allocation to avoid divided by 0.
  queueGuranteedResource =
      Resources.max(rc, totalPartitionResource, queueGuranteedResource,
          minimumAllocation);
{code}

 Queue sorting in CapacityScheduler should consider node label
 -

 Key: YARN-2696
 URL: https://issues.apache.org/jira/browse/YARN-2696
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2696.1.patch, YARN-2696.2.patch


 In the past, when trying to allocate containers under a parent queue in 
 CapacityScheduler. The parent queue will choose child queues by the used 
 resource from smallest to largest. 
 Now we support node label in CapacityScheduler, we should also consider used 
 resource in child queues by node labels when allocating resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496910#comment-14496910
 ] 

Wangda Tan commented on YARN-3434:
--

[~tgraves],
Makes sense to me, especially the {{local transient variable rather than a 
globally stored one}} point. So I think after the change, the flow to use/update 
ResourceLimits will be:
{code}
In LeafQueue:

Both updateClusterResource and assignContainers:
  update/store resource-limit        (only for computing headroom)

Only assignContainers:
  check queue limit
        |
        V
  check user limit
        |
        V
  set how-much-should-unreserve in ResourceLimits and pass it down
{code}

Is that also what you have in mind?

 Interaction between reservations and userlimit can result in significant ULF 
 violation
 --

 Key: YARN-3434
 URL: https://issues.apache.org/jira/browse/YARN-3434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-3434.patch


 ULF was set to 1.0
 User was able to consume 1.4X queue capacity.
 It looks like when this application launched, it reserved about 1000 
 containers, each 8G each, within about 5 seconds. I think this allowed the 
 logic in assignToUser() to allow the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3492) AM fails to come up because RM and NM can't connect to each other

2015-04-15 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496918#comment-14496918
 ] 

Tsuyoshi Ozawa commented on YARN-3492:
--

[~kasha], could you attach yarn-site.xml and mapred-site.xml for investigation?

 AM fails to come up because RM and NM can't connect to each other
 -

 Key: YARN-3492
 URL: https://issues.apache.org/jira/browse/YARN-3492
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: pseudo-distributed cluster on a mac
Reporter: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-kasha-nodemanager-kasha-mbp.local.log, 
 yarn-kasha-resourcemanager-kasha-mbp.local.log


 Stood up a pseudo-distributed cluster with 2.7.0 RC0. Submitted a pi job. The 
 container gets allocated, but doesn't get launched. The NM can't talk to the 
 RM. Logs to follow. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3404) View the queue name to YARN Application page

2015-04-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496920#comment-14496920
 ] 

Jian He commented on YARN-3404:
---

+1

 View the queue name to YARN Application page
 

 Key: YARN-3404
 URL: https://issues.apache.org/jira/browse/YARN-3404
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3404.1.patch, YARN-3404.2.patch, YARN-3404.3.patch, 
 YARN-3404.4.patch, screenshot.png


 It want to display the name of the queue that is used to YARN Application 
 page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3492) AM fails to come up because RM and NM can't connect to each other

2015-04-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3492:
---
Attachment: yarn-site.xml
mapred-site.xml

 AM fails to come up because RM and NM can't connect to each other
 -

 Key: YARN-3492
 URL: https://issues.apache.org/jira/browse/YARN-3492
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: pseudo-distributed cluster on a mac
Reporter: Karthik Kambatla
Priority: Blocker
 Attachments: mapred-site.xml, 
 yarn-kasha-nodemanager-kasha-mbp.local.log, 
 yarn-kasha-resourcemanager-kasha-mbp.local.log, yarn-site.xml


 Stood up a pseudo-distributed cluster with 2.7.0 RC0. Submitted a pi job. The 
 container gets allocated, but doesn't get launched. The NM can't talk to the 
 RM. Logs to follow. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java

2015-04-15 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-3005:

Assignee: Kengo Seki

 [JDK7] Use switch statement for String instead of if-else statement in 
 RegistrySecurity.java
 

 Key: YARN-3005
 URL: https://issues.apache.org/jira/browse/YARN-3005
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Assignee: Kengo Seki
Priority: Trivial
  Labels: newbie
 Fix For: 2.7.0

 Attachments: YARN-3005.001.patch, YARN-3005.002.patch


 Since we have moved to JDK7, we can refactor the below if-else statement for 
 String.
 {code}
 // TODO JDK7 SWITCH
 if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) {
   access = AccessPolicy.sasl;
 } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) {
   access = AccessPolicy.digest;
 } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) {
   access = AccessPolicy.anon;
 } else {
   throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
   + "\"" + auth + "\"");
 }
 {code}
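
For reference, a minimal sketch of the suggested String switch (assuming the REGISTRY_CLIENT_AUTH_* names above are compile-time String constants; note that, unlike the if-else chain, {{switch}} throws NullPointerException when {{auth}} is null):
{code}
// Sketch of the suggested refactoring, not the committed patch.
switch (auth) {
  case REGISTRY_CLIENT_AUTH_KERBEROS:
    access = AccessPolicy.sasl;
    break;
  case REGISTRY_CLIENT_AUTH_DIGEST:
    access = AccessPolicy.digest;
    break;
  case REGISTRY_CLIENT_AUTH_ANONYMOUS:
    access = AccessPolicy.anon;
    break;
  default:
    throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
        + "\"" + auth + "\"");
}
{code}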



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3326) Support RESTful API for getLabelsToNodes

2015-04-15 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3326:
-
Summary: Support RESTful API for getLabelsToNodes   (was: ReST support for 
getLabelsToNodes )

 Support RESTful API for getLabelsToNodes 
 -

 Key: YARN-3326
 URL: https://issues.apache.org/jira/browse/YARN-3326
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Minor
 Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, 
 YARN-3326.20150408-1.patch


 REST to support to retrieve LabelsToNodes Mapping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496735#comment-14496735
 ] 

Thomas Graves commented on YARN-3434:
-

I am not saying the child needs to know how the parent calculates the resource limit.  I am 
saying that the user limit, and whether it needs to unreserve to make another reservation, 
has nothing to do with the parent queue (i.e. it doesn't apply to the parent queue).  
Remember, I don't need to store the user limit; I need to store whether it needs to 
unreserve and, if it does, how much it needs to unreserve.

When a node heartbeats, it goes through the regular assignments and updates the 
leafQueue clusterResources based on what the parent passes in. When a node is 
removed or added, it updates the resource limits (none of these apply to the 
calculation of whether it needs to unreserve or not). 

Basically it comes down to: is this information useful outside of the small 
window between when it is calculated and when it is needed in assignContainer()? 
My thought is no.  And you said it yourself in the last bullet above.  Although 
we have been referring to the userLimit, and perhaps that is the problem: I 
don't need to store the userLimit, I need to store whether it needs to 
unreserve and, if so, how much.  Therefore it fits better as a local transient 
variable rather than a globally stored one.  If you store just the userLimit, 
then you need to recalculate stuff, which I'm trying to avoid.

I understand why we are storing the current information in ResourceLimits, 
because it has to do with headroom and parent limits and is recalculated at 
various points, but the current implementation in canAssignToUser doesn't use 
headroom at all, and whether we need to unreserve or not on the last call to 
assignContainers doesn't affect the headroom calculation.

Again, basically all we would be doing is placing an extra global variable(s) in 
the ResourceLimits class just to pass it down a couple of functions. That to 
me is a parameter.   Now, if we had multiple things needing this or updating it, 
then to me it fits better in ResourceLimits.  



 Interaction between reservations and userlimit can result in significant ULF 
 violation
 --

 Key: YARN-3434
 URL: https://issues.apache.org/jira/browse/YARN-3434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-3434.patch


 ULF was set to 1.0
 User was able to consume 1.4X queue capacity.
 It looks like when this application launched, it reserved about 1000 
 containers, each 8G each, within about 5 seconds. I think this allowed the 
 logic in assignToUser() to allow the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2

2015-04-15 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496757#comment-14496757
 ] 

Naganarasimha G R commented on YARN-3462:
-

Thanks for reviewing and committing, [~qwertymaniac] and [~sidharta-s].

 Patches applied for YARN-2424 are inconsistent between trunk and branch-2
 -

 Key: YARN-3462
 URL: https://issues.apache.org/jira/browse/YARN-3462
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Sidharta Seethana
Assignee: Naganarasimha G R
 Fix For: 2.7.1

 Attachments: YARN-3462.20150508-1.patch


 It looks like the changes for YARN-2424 are not the same for trunk (commit 
 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit 
 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning 
 and documentation is a bit different as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3326) Support RESTful API for getLabelsToNodes

2015-04-15 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496712#comment-14496712
 ] 

Tsuyoshi Ozawa commented on YARN-3326:
--

Committed this to trunk and branch-2. Thanks [~Naganarasimha] for your 
contribution and thanks [~vvasudev] for your review!

 Support RESTful API for getLabelsToNodes 
 -

 Key: YARN-3326
 URL: https://issues.apache.org/jira/browse/YARN-3326
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, 
 YARN-3326.20150408-1.patch


 REST to support to retrieve LabelsToNodes Mapping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3326) Support RESTful API for getLabelsToNodes

2015-04-15 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496733#comment-14496733
 ] 

Naganarasimha G R commented on YARN-3326:
-

Thanks for the review, [~ozawa]. I will check the scope of YARN-2801, and if it 
doesn't cover this feature then I will raise a new JIRA. 

 Support RESTful API for getLabelsToNodes 
 -

 Key: YARN-3326
 URL: https://issues.apache.org/jira/browse/YARN-3326
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, 
 YARN-3326.20150408-1.patch


 REST to support to retrieve LabelsToNodes Mapping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).

2015-04-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496702#comment-14496702
 ] 

zhihai xu commented on YARN-3491:
-

I saw the serialization of public resource localization in the following logs.
The first log shows two private localization requests and many public 
localization requests from container_e30_1426628374875_110892_01_000475:
{code}
2015-04-07 22:49:56,750 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_e30_1426628374875_110892_01_000475 transitioned from NEW to 
LOCALIZING
2015-04-07 22:49:56,751 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://nameservice1/user/databot/.staging/job_1426628374875_110892/job.xml 
transitioned from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://nameservice1/user/databot/.staging/job_1426628374875_110892/job.jar 
transitioned from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource hdfs://nameservice1/tmp/temp182237/tmp-1316042064/reflections.jar 
transitioned from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://nameservice1/tmp/temp182237/tmp-327542609/service-media-sdk.jar 
transitioned from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://nameservice1/tmp/temp182237/tmp1631960573/service-local-search-sdk.jar
 transitioned from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource hdfs://nameservice1/tmp/temp182237/tmp-1521315530/ace-geo.jar 
transitioned from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://nameservice1/tmp/temp182237/tmp1347512155/cortex-server.jar 
transitioned from INIT to DOWNLOADING
{code}

The following log shows how the public resource localizations are processed.
{code}
2015-04-07 22:49:56,758 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Created localizer for container_e30_1426628374875_110892_01_000475

2015-04-07 22:49:56,758 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp182237/tmp-1316042064/reflections.jar, 
1428446867531, FILE, null }

2015-04-07 22:49:56,882 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp182237/tmp-327542609/service-media-sdk.jar, 
1428446864128, FILE, null }

2015-04-07 22:49:56,902 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://nameservice1/tmp/temp182237/tmp-1316042064/reflections.jar(-/data2/yarn/nm/filecache/4877652/reflections.jar)
 transitioned from DOWNLOADING to LOCALIZED

2015-04-07 22:49:57,127 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp182237/tmp1631960573/service-local-search-sdk.jar,
 1428446858408, FILE, null }

2015-04-07 22:49:57,145 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://nameservice1/tmp/temp182237/tmp-327542609/service-media-sdk.jar(-/data11/yarn/nm/filecache/4877653/service-media-sdk.jar)
 transitioned from DOWNLOADING to LOCALIZED

2015-04-07 22:49:57,251 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp182237/tmp-1521315530/ace-geo.jar, 
1428446862857, FILE, null }

2015-04-07 22:49:57,270 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://nameservice1/tmp/temp182237/tmp1631960573/service-local-search-sdk.jar(-/data1/yarn/nm/filecache/4877654/service-local-search-sdk.jar)
 transitioned from DOWNLOADING to LOCALIZED

2015-04-07 22:49:57,383 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp182237/tmp1347512155/cortex-server.jar, 
1428446857069, FILE, null }
{code}

Based on the log, you can see the thread pool is not fully used; only one 
thread is used. The default thread 

[jira] [Commented] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java

2015-04-15 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496708#comment-14496708
 ] 

Akira AJISAKA commented on YARN-3005:
-

Assigned [~sekikn]. Thanks.

 [JDK7] Use switch statement for String instead of if-else statement in 
 RegistrySecurity.java
 

 Key: YARN-3005
 URL: https://issues.apache.org/jira/browse/YARN-3005
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Assignee: Kengo Seki
Priority: Trivial
  Labels: newbie
 Fix For: 2.7.0

 Attachments: YARN-3005.001.patch, YARN-3005.002.patch


 Since we have moved to JDK7, we can refactor the below if-else statement for 
 String.
 {code}
 // TODO JDK7 SWITCH
 if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) {
   access = AccessPolicy.sasl;
 } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) {
   access = AccessPolicy.digest;
 } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) {
   access = AccessPolicy.anon;
 } else {
   throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
    + "\"" + auth + "\"");
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3394) WebApplication proxy documentation is incomplete

2015-04-15 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496725#comment-14496725
 ] 

Tsuyoshi Ozawa commented on YARN-3394:
--

Thanks Naganarasimha for your contribution and thanks Jian for your commit!

 WebApplication  proxy documentation is incomplete
 -

 Key: YARN-3394
 URL: https://issues.apache.org/jira/browse/YARN-3394
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Bibin A Chundatt
Assignee: Naganarasimha G R
Priority: Minor
 Fix For: 2.8.0

 Attachments: WebApplicationProxy.html, YARN-3394.20150324-1.patch


 Webproxy documentation is incomplete
 hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html
 1.Configuration of service start/stop as separate server
 2.Steps to start as daemon service
 3.Secure mode for Web proxy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2696) Queue sorting in CapacityScheduler should consider node label

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496732#comment-14496732
 ] 

Hadoop QA commented on YARN-2696:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725637/YARN-2696.2.patch
  against trunk revision 9e8309a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7348//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7348//console

This message is automatically generated.

 Queue sorting in CapacityScheduler should consider node label
 -

 Key: YARN-2696
 URL: https://issues.apache.org/jira/browse/YARN-2696
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2696.1.patch, YARN-2696.2.patch


 In the past, when trying to allocate containers under a parent queue in 
 CapacityScheduler. The parent queue will choose child queues by the used 
 resource from smallest to largest. 
 Now we support node label in CapacityScheduler, we should also consider used 
 resource in child queues by node labels when allocating resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3326) Support RESTful API for getLabelsToNodes

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496731#comment-14496731
 ] 

Hudson commented on YARN-3326:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7590 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7590/])
YARN-3326. Support RESTful API for getLabelsToNodes. Contributed by 
Naganarasimha G R. (ozawa: rev e48cedc663b8a26fd62140c8e2907f9b4edd9785)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/LabelsToNodesInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodeIDsInfo.java


 Support RESTful API for getLabelsToNodes 
 -

 Key: YARN-3326
 URL: https://issues.apache.org/jira/browse/YARN-3326
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, 
 YARN-3326.20150408-1.patch


 REST to support to retrieve LabelsToNodes Mapping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497311#comment-14497311
 ] 

Hadoop QA commented on YARN-3463:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725702/YARN-3463.64.patch
  against trunk revision 1b89a3e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1149 javac 
compiler warnings (more than the trunk's current 1147 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7350//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7350//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7350//console

This message is automatically generated.

 Integrate OrderingPolicy Framework with CapacityScheduler
 -

 Key: YARN-3463
 URL: https://issues.apache.org/jira/browse/YARN-3463
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3463.50.patch, YARN-3463.61.patch, 
 YARN-3463.64.patch, YARN-3463.65.patch


 Integrate the OrderingPolicy Framework with the CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497087#comment-14497087
 ] 

Wangda Tan commented on YARN-3434:
--

bq. Or were you saying create a ResourceLimit and pass it as parameter to 
canAssignToUser and canAssignToThisQueue and modify that instance. That 
instance would then be passed down though to assignContainer()?
I prefer the above one, which is in line with your previous comment about a {{local 
transient variable rather than a globally stored one}}. Is this also what you 
prefer?

 Interaction between reservations and userlimit can result in significant ULF 
 violation
 --

 Key: YARN-3434
 URL: https://issues.apache.org/jira/browse/YARN-3434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-3434.patch


 ULF was set to 1.0
 User was able to consume 1.4X queue capacity.
 It looks like when this application launched, it reserved about 1000 
 containers, each 8G each, within about 5 seconds. I think this allowed the 
 logic in assignToUser() to allow the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-15 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3463:
--
Attachment: YARN-3463.64.patch

rebased to current trunk

 Integrate OrderingPolicy Framework with CapacityScheduler
 -

 Key: YARN-3463
 URL: https://issues.apache.org/jira/browse/YARN-3463
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3463.50.patch, YARN-3463.61.patch, 
 YARN-3463.64.patch


 Integrate the OrderingPolicy Framework with the CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2498) Respect labels in preemption policy of capacity scheduler

2015-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497261#comment-14497261
 ] 

Wangda Tan commented on YARN-2498:
--

Discussed with [~mayank_bansal]; taking this over and working on it, will post a 
patch and implementation notes soon.

 Respect labels in preemption policy of capacity scheduler
 -

 Key: YARN-2498
 URL: https://issues.apache.org/jira/browse/YARN-2498
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2498.patch, YARN-2498.patch, YARN-2498.patch, 
 yarn-2498-implementation-notes.pdf


 There are 3 stages in ProportionalCapacityPreemptionPolicy:
 # Recursively calculate {{ideal_assigned}} for each queue. This depends on 
 the available resource, the resource used/pending in each queue, and the guaranteed 
 capacity of each queue.
 # Mark to-be-preempted containers: for each over-satisfied queue, it will 
 mark some containers to be preempted.
 # Notify the scheduler about to-be-preempted containers.
 We need to respect labels in the cluster for both #1 and #2:
 For #1, when there is some resource available in the cluster, we shouldn't 
 assign it to a queue (by increasing {{ideal_assigned}}) if the queue cannot 
 access such labels.
 For #2, when we make a decision about whether we need to preempt a container, we 
 need to make sure the resource of this container is *possibly* usable by a queue which 
 is under-satisfied and has pending resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed

2015-04-15 Thread Sumana Sathish (JIRA)
Sumana Sathish created YARN-3493:


 Summary: RM fails to come up with error Failed to load/recover 
state when  mem settings are changed
 Key: YARN-3493
 URL: https://issues.apache.org/jira/browse/YARN-3493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Priority: Critical
 Fix For: 2.7.0


RM fails to come up in the following case:
1. Change yarn.nodemanager.resource.memory-mb and 
yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in the background 
and wait for the job to reach the running state
3. Restore yarn-site.xml so that yarn.scheduler.maximum-allocation-mb is 2048 
before the above job completes
4. Restart the RM
5. The RM fails to come up with the below error
{code:title= RM error for Mem settings changed}
 - RM app submission failed in validating AM resource request for application 
application_1429094976272_0008
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request, requested memory < 0, or requested memory > max configured, 
requestedMemory=3072, maxMemory=2048
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
(ResourceManager.java:serviceStart(579)) - Failed to load/recover state
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request, requested memory < 0, or requested memory > max configured, 
requestedMemory=3072, maxMemory=2048
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 

[jira] [Assigned] (YARN-2498) Respect labels in preemption policy of capacity scheduler

2015-04-15 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-2498:


Assignee: Wangda Tan  (was: Mayank Bansal)

 Respect labels in preemption policy of capacity scheduler
 -

 Key: YARN-2498
 URL: https://issues.apache.org/jira/browse/YARN-2498
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2498.patch, YARN-2498.patch, YARN-2498.patch, 
 yarn-2498-implementation-notes.pdf


 There are 3 stages in ProportionalCapacityPreemptionPolicy:
 # Recursively calculate {{ideal_assigned}} for each queue. This depends on 
 the available resource, the resource used/pending in each queue, and the guaranteed 
 capacity of each queue.
 # Mark to-be-preempted containers: for each over-satisfied queue, it will 
 mark some containers to be preempted.
 # Notify the scheduler about to-be-preempted containers.
 We need to respect labels in the cluster for both #1 and #2:
 For #1, when there is some resource available in the cluster, we shouldn't 
 assign it to a queue (by increasing {{ideal_assigned}}) if the queue cannot 
 access such labels.
 For #2, when we make a decision about whether we need to preempt a container, we 
 need to make sure the resource of this container is *possibly* usable by a queue which 
 is under-satisfied and has pending resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497065#comment-14497065
 ] 

Wangda Tan commented on YARN-3434:
--

bq. Are you suggesting we change the patch to modify ResourceLimits and pass 
down rather than using the LimitsInfo class? 
Yes, that's my suggestion.

bq. at least not without adding the shouldContinue flag to it
Kind of. What I'm thinking is that we can add amountNeededUnreserve to 
ResourceLimits. canAssignToThisQueue/User will return a boolean meaning 
shouldContinue, and set amountNeededUnreserve (instead of the limit; we don't 
need to change the limit). That is very similar to your original logic, and we don't 
need the extra LimitsInfo. After we get the updated ResourceLimits and pass it 
down, the problem should be resolved.

Did I miss anything?
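
To make that concrete, a rough illustrative sketch (field and accessor names are assumptions, not the actual patch):
{code}
import org.apache.hadoop.yarn.api.records.Resource;

// Illustrative only: carry the amount-to-unreserve on the limits object that is
// passed down, instead of introducing a separate LimitsInfo holder.
class ResourceLimitsSketch {
  private final Resource limit;           // computed by the parent, unchanged here
  private Resource amountNeededUnreserve; // set by canAssignToThisQueue/canAssignToUser

  ResourceLimitsSketch(Resource limit) {
    this.limit = limit;
  }

  Resource getLimit() {
    return limit;
  }

  Resource getAmountNeededUnreserve() {
    return amountNeededUnreserve;
  }

  void setAmountNeededUnreserve(Resource amount) {
    this.amountNeededUnreserve = amount;
  }
}
{code}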

 Interaction between reservations and userlimit can result in significant ULF 
 violation
 --

 Key: YARN-3434
 URL: https://issues.apache.org/jira/browse/YARN-3434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-3434.patch


 ULF was set to 1.0
 User was able to consume 1.4X queue capacity.
 It looks like when this application launched, it reserved about 1000 
 containers, each 8G each, within about 5 seconds. I think this allowed the 
 logic in assignToUser() to allow the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).

2015-04-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497085#comment-14497085
 ] 

zhihai xu commented on YARN-3491:
-

Hi [~jlowe], thanks for the comment. Queueing is fast, but it takes longer 
to hand the FSDownload to a worker thread.
If all threads in the thread pool are already in use, adding an 
entry to the queue via LinkedBlockingQueue#offer is very fast.
Based on the following code in ThreadPoolExecutor#execute, corePoolSize is the 
thread pool size, which is 4 in this case.
workQueue.offer(command) is fast but addWorker is slow, and the executor only queues 
the task when all threads in the thread pool are already running.
{code}
public void execute(Runnable command) {
    if (command == null)
        throw new NullPointerException();
    /*
     * Proceed in 3 steps:
     *
     * 1. If fewer than corePoolSize threads are running, try to
     * start a new thread with the given command as its first
     * task.  The call to addWorker atomically checks runState and
     * workerCount, and so prevents false alarms that would add
     * threads when it shouldn't, by returning false.
     *
     * 2. If a task can be successfully queued, then we still need
     * to double-check whether we should have added a thread
     * (because existing ones died since last checking) or that
     * the pool shut down since entry into this method. So we
     * recheck state and if necessary roll back the enqueuing if
     * stopped, or start a new thread if there are none.
     *
     * 3. If we cannot queue task, then we try to add a new
     * thread.  If it fails, we know we are shut down or saturated
     * and so reject the task.
     */
    int c = ctl.get();
    if (workerCountOf(c) < corePoolSize) {
        if (addWorker(command, true))
            return;
        c = ctl.get();
    }
    if (isRunning(c) && workQueue.offer(command)) {
        int recheck = ctl.get();
        if (! isRunning(recheck) && remove(command))
            reject(command);
        else if (workerCountOf(recheck) == 0)
            addWorker(null, false);
    }
    else if (!addWorker(command, false))
        reject(command);
}
{code}

The issue is:
if the time to run one FSDownload (resource localization) is close to the time 
to run the submit (adding the FSDownload to a worker thread), 
an oscillation will happen and there will be only one worker thread running. 
The Dispatcher thread will then be blocked for a longer time.
The above logs demonstrate this situation: LocalizerRunner#addResource, used by the 
private localizer, takes less than one millisecond to process one 
REQUEST_RESOURCE_LOCALIZATION event, but PublicLocalizer#addResource, used by the 
public localizer, takes 124 milliseconds to process one 
REQUEST_RESOURCE_LOCALIZATION event.
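
As a rough, self-contained illustration of that effect (not NodeManager code; timings are machine-dependent and only the shape matters), submit() latency drops once all core threads exist and tasks start going through workQueue.offer:
{code}
import java.util.concurrent.*;

public class SubmitLatencySketch {
  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4); // corePoolSize = 4
    for (int i = 0; i < 8; i++) {
      long start = System.nanoTime();
      // The first 4 submits go through addWorker (thread creation); the rest
      // are just queued because all core threads are still busy sleeping.
      pool.submit(() -> {
        try { Thread.sleep(200); } catch (InterruptedException ignored) { }
      });
      System.out.printf("submit #%d took %d us%n", i,
          (System.nanoTime() - start) / 1000);
    }
    pool.shutdown();
  }
}
{code}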


 Improve the public resource localization to do both FSDownload submission to 
 the thread pool and completed localization handling in one thread 
 (PublicLocalizer).
 -

 Key: YARN-3491
 URL: https://issues.apache.org/jira/browse/YARN-3491
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical

 Improve the public resource localization to do both FSDownload submission to 
 the thread pool and completed localization handling in one thread 
 (PublicLocalizer).
 Currently FSDownload submission to the thread pool is done in 
 PublicLocalizer#addResource which is running in Dispatcher thread and 
 completed localization handling is done in PublicLocalizer#run which is 
 running in PublicLocalizer thread.
 Because FSDownload submission to the thread pool at the following code is 
 time consuming, the thread pool can't be fully utilized. Instead of doing 
 public resource localization in parallel(multithreading), public resource 
 localization is serialized most of the time.
 {code}
 synchronized (pending) {
   pending.put(queue.submit(new FSDownload(lfs, null, conf,
   publicDirDestPath, resource, 
 request.getContext().getStatCache())),
   request);
 }
 {code}
 Also there are two more benefits with this change:
 1. The Dispatcher thread won't be blocked by the above FSDownload submission. 
 The Dispatcher thread handles most of the time-critical events at the Node Manager.
 2. No synchronization is needed on the HashMap (pending), because pending will 
 only be accessed in the PublicLocalizer thread.
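
For reference, a simplified, hypothetical sketch of the handoff proposed above: the Dispatcher-side addResource only enqueues the request, while the PublicLocalizer thread performs both the FSDownload submission and the completed-download handling. All class, field and method names below are illustrative stand-ins, not the actual NodeManager classes.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

class PublicLocalizerSketch implements Runnable {
  private final BlockingQueue<String> requests = new LinkedBlockingQueue<>();
  private final ExecutorService downloadPool = Executors.newFixedThreadPool(4);
  // Only touched by this thread, so no synchronization is needed on pending.
  private final Map<Future<String>, String> pending = new HashMap<>();

  // Called from the Dispatcher thread: only enqueues, never does path
  // resolution or FSDownload submission.
  void addResource(String request) {
    requests.offer(request);
  }

  @Override
  public void run() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        // Submit downloads for any newly queued requests from this thread.
        String req;
        while ((req = requests.poll()) != null) {
          final String r = req;
          Callable<String> download = () -> "downloaded " + r;
          pending.put(downloadPool.submit(download), r);
        }
        // Handle completed downloads (simplified to dropping finished futures).
        pending.entrySet().removeIf(e -> e.getKey().isDone());
        Thread.sleep(10);
      }
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    } finally {
      downloadPool.shutdownNow();
    }
  }
}
{code}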



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed

2015-04-15 Thread Sumana Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumana Sathish updated YARN-3493:
-
Attachment: yarn-yarn-resourcemanager.log.zip

 RM fails to come up with error Failed to load/recover state when  mem 
 settings are changed
 

 Key: YARN-3493
 URL: https://issues.apache.org/jira/browse/YARN-3493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Priority: Critical
 Fix For: 2.7.0

 Attachments: yarn-yarn-resourcemanager.log.zip


 RM fails to come up for the following case:
 1. Change yarn.nodemanager.resource.memory-mb and 
 yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
 background and wait for the job to reach running state
 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 
 before the above job completes
 4. Restart RM
 5. RM fails to come up with the below error
 {code:title= RM error for Mem settings changed}
  - RM app submission failed in validating AM resource request for application 
 application_1429094976272_0008
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
 (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 

[jira] [Updated] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed

2015-04-15 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3493:
--
Fix Version/s: (was: 2.7.0)

 RM fails to come up with error Failed to load/recover state when  mem 
 settings are changed
 

 Key: YARN-3493
 URL: https://issues.apache.org/jira/browse/YARN-3493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Assignee: Jian He
Priority: Critical
 Attachments: yarn-yarn-resourcemanager.log.zip


 RM fails to come up for the following case:
 1. Change yarn.nodemanager.resource.memory-mb and 
 yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
 background and wait for the job to reach running state
 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 
 before the above job completes
 4. Restart RM
 5. RM fails to come up with the below error
 {code:title= RM error for Mem settings changed}
  - RM app submission failed in validating AM resource request for application 
 application_1429094976272_0008
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
 (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 

[jira] [Assigned] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed

2015-04-15 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-3493:
-

Assignee: Jian He

 RM fails to come up with error Failed to load/recover state when  mem 
 settings are changed
 

 Key: YARN-3493
 URL: https://issues.apache.org/jira/browse/YARN-3493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Assignee: Jian He
Priority: Critical
 Attachments: yarn-yarn-resourcemanager.log.zip


 RM fails to come up for the following case:
 1. Change yarn.nodemanager.resource.memory-mb and 
 yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
 background and wait for the job to reach running state
 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 
 before the above job completes
 4. Restart RM
 5. RM fails to come up with the below error
 {code:title= RM error for Mem settings changed}
  - RM app submission failed in validating AM resource request for application 
 application_1429094976272_0008
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
 (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 

[jira] [Commented] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed

2015-04-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497129#comment-14497129
 ] 

Jian He commented on YARN-3493:
---

[~kasha], I think this happened on a different code path.

 RM fails to come up with error Failed to load/recover state when  mem 
 settings are changed
 

 Key: YARN-3493
 URL: https://issues.apache.org/jira/browse/YARN-3493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Assignee: Jian He
Priority: Critical
 Attachments: yarn-yarn-resourcemanager.log.zip


 RM fails to come up for the following case:
 1. Change yarn.nodemanager.resource.memory-mb and 
 yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
 background and wait for the job to reach running state
 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 
 before the above job completes
 4. Restart RM
 5. RM fails to come up with the below error
 {code:title= RM error for Mem settings changed}
  - RM app submission failed in validating AM resource request for application 
 application_1429094976272_0008
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
 (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 

[jira] [Commented] (YARN-2696) Queue sorting in CapacityScheduler should consider node label

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497139#comment-14497139
 ] 

Hadoop QA commented on YARN-2696:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725687/YARN-2696.3.patch
  against trunk revision b2e6cf6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7349//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7349//console

This message is automatically generated.

 Queue sorting in CapacityScheduler should consider node label
 -

 Key: YARN-2696
 URL: https://issues.apache.org/jira/browse/YARN-2696
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2696.1.patch, YARN-2696.2.patch, YARN-2696.3.patch


 In the past, when trying to allocate containers under a parent queue in 
 CapacityScheduler, the parent queue would choose child queues by their used 
 resource, from smallest to largest. 
 Now that we support node labels in CapacityScheduler, we should also consider 
 the used resource in child queues per node label when allocating resources.
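
As a rough illustration of the idea (not the attached patch), child queues could be ordered by their used capacity for the specific label being allocated; ChildQueue and getUsedCapacity(label) below are hypothetical stand-ins for the CapacityScheduler types.

{code}
import java.util.Comparator;
import java.util.List;

// Illustrative only: ChildQueue and getUsedCapacity(label) are hypothetical
// stand-ins, not the actual CapacityScheduler classes.
interface ChildQueue {
  float getUsedCapacity(String nodeLabel);
}

class NodeLabelAwareQueueSorter {
  // Order candidate child queues from least used to most used for the label
  // being allocated, instead of by overall used resource.
  static void sortForLabel(List<ChildQueue> queues, String nodeLabel) {
    queues.sort(Comparator.comparingDouble(q -> q.getUsedCapacity(nodeLabel)));
  }
}
{code}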



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-15 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3463:
--
Attachment: YARN-3463.65.patch

Suppress orderingpolicy from appearing in web service responses; it is still 
shown on the web UI.

 Integrate OrderingPolicy Framework with CapacityScheduler
 -

 Key: YARN-3463
 URL: https://issues.apache.org/jira/browse/YARN-3463
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3463.50.patch, YARN-3463.61.patch, 
 YARN-3463.64.patch, YARN-3463.65.patch


 Integrate the OrderingPolicy Framework with the CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed

2015-04-15 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3493:
--
Attachment: YARN-3493.1.patch

Uploaded a patch to ignore this exception on recovery.
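
For readers following along, a minimal, self-contained sketch of the general idea (ignore the validation failure on the recovery path while still rejecting it on submission). It is not the attached patch, and every name below is an illustrative stand-in.

{code}
// Hypothetical sketch only -- not the attached patch.
class RecoveryTolerantValidator {

  static class InvalidResourceRequestException extends Exception {
    InvalidResourceRequestException(String msg) { super(msg); }
  }

  // Rejects requests above the currently configured maximum.
  static void validate(int requestedMb, int maxMb)
      throws InvalidResourceRequestException {
    if (requestedMb < 0 || requestedMb > maxMb) {
      throw new InvalidResourceRequestException(
          "requestedMemory=" + requestedMb + ", maxMemory=" + maxMb);
    }
  }

  // On the recovery path, log and continue instead of failing RM startup;
  // on the submission path, keep rejecting the request.
  static void validateForSubmitOrRecover(int requestedMb, int maxMb,
      boolean isRecovery) throws InvalidResourceRequestException {
    try {
      validate(requestedMb, maxMb);
    } catch (InvalidResourceRequestException e) {
      if (!isRecovery) {
        throw e;
      }
      System.out.println("Ignoring invalid resource request during recovery: "
          + e.getMessage());
    }
  }
}
{code}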

 RM fails to come up with error Failed to load/recover state when  mem 
 settings are changed
 

 Key: YARN-3493
 URL: https://issues.apache.org/jira/browse/YARN-3493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Assignee: Jian He
Priority: Critical
 Attachments: YARN-3493.1.patch, yarn-yarn-resourcemanager.log.zip


 RM fails to come up for the following case:
 1. Change yarn.nodemanager.resource.memory-mb and 
 yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
 background and wait for the job to reach running state
 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 
 before the above job completes
 4. Restart RM
 5. RM fails to come up with the below error
 {code:title= RM error for Mem settings changed}
  - RM app submission failed in validating AM resource request for application 
 application_1429094976272_0008
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
 (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
   

[jira] [Commented] (YARN-3390) Reuse TimelineCollectorManager for RM

2015-04-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497341#comment-14497341
 ] 

Sangjin Lee commented on YARN-3390:
---

I took a pass at the patch, and it looks good for the most part. I would ask 
you to reconcile the TimelineCollectorManager changes with what I have over on 
YARN-3437. Again, I have a slight preference for the hook/template methods for 
the aforementioned reason, but it's not a strong preference one way or another.

However, I'm not sure why there is a change for RMContainerAllocator.java. It 
doesn't look like an intended change?
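
For context, a generic sketch of the hook/template-method shape referred to above; the class and method names are hypothetical and are not the YARN-3437 code.

{code}
// Generic illustration of the hook/template-method idea; names are
// hypothetical, not the actual TimelineCollectorManager changes.
abstract class BaseCollectorManager {

  // Template method: fixed registration flow with an overridable hook.
  final void register(String appId) {
    doRegister(appId);
    postRegister(appId);   // hook
  }

  private void doRegister(String appId) {
    System.out.println("registered collector for " + appId);
  }

  // Subclasses (e.g. an RM-specific manager) override only the hook.
  protected void postRegister(String appId) { }
}

class RMCollectorManagerSketch extends BaseCollectorManager {
  @Override
  protected void postRegister(String appId) {
    System.out.println("publishing collector address for " + appId);
  }
}
{code}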

 Reuse TimelineCollectorManager for RM
 -

 Key: YARN-3390
 URL: https://issues.apache.org/jira/browse/YARN-3390
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3390.1.patch


 RMTimelineCollector should have the context info of each app whose entity  
 has been put



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497320#comment-14497320
 ] 

Sangjin Lee commented on YARN-3051:
---

We chatted offline about the issue of what context is required for the reader 
API and the uniqueness requirement. I'm not sure if there is a complete 
agreement on this yet, but at least this is a proposal from us ([~vrushalic], 
[~jrottinghuis], and me).

- for reader calls that ask for sub-application entities, the application id 
must be specified
- uniqueness is similarly defined; (entity type, entity id) uniquely identifies 
an entity within the scope of a YARN application

We feel that this is the most natural way of supporting writes/reads. One 
scenario to consider is reducing impact on current users of ATS, as v.2 would 
require app id which v.1 did not require. For that, we would need to update the 
user library to have a compatibility layer (e.g. tez, etc.).

Thoughts?
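
To make the proposal concrete, here is a hypothetical reader-interface sketch that reflects the two points above (application id required for sub-application entity reads, and (entity type, entity id) unique within an application); the names are illustrative, not the actual YARN-3051 API.

{code}
import java.util.Set;

// Hypothetical sketch, not the actual YARN-3051 API. The String return types
// stand in for whatever entity representation the real API uses.
interface TimelineReaderSketch {

  // A sub-application entity read must name the owning YARN application.
  String getEntity(String clusterId, String appId,
      String entityType, String entityId);

  // Listing entities is likewise scoped to a single application.
  Set<String> getEntities(String clusterId, String appId,
      String entityType, long windowStartMs, long windowEndMs);
}
{code}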

 [Storage abstraction] Create backing storage read interface for ATS readers
 ---

 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3051.wip.patch, YARN-3051_temp.patch


 Per design in YARN-2928, create backing storage read interface that can be 
 implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497055#comment-14497055
 ] 

Thomas Graves commented on YARN-3434:
-

I agree with the Both section. I'm not sure I completely follow the Only 
section. Are you suggesting we change the patch to modify ResourceLimits and 
pass it down rather than using the LimitsInfo class? If so, that won't work, 
at least not without adding the shouldContinue flag to it. Unless you mean 
keep the LimitsInfo class for local use in assignContainers and then pass 
ResourceLimits down to assignContainer with the value of 
amountNeededUnreserve as the limit. That wouldn't really change much except 
the object we pass down through the functions.

 Interaction between reservations and userlimit can result in significant ULF 
 violation
 --

 Key: YARN-3434
 URL: https://issues.apache.org/jira/browse/YARN-3434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-3434.patch


 ULF was set to 1.0.
 The user was able to consume 1.4X the queue capacity.
 It looks like when this application launched, it reserved about 1000 
 containers of 8G each within about 5 seconds. I think this allowed the 
 logic in assignToUser() to let the user limit be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497076#comment-14497076
 ] 

Thomas Graves commented on YARN-3434:
-

So you are saying add amountNeededUnreserve to ResourceLimits and then set the 
global currentResourceLimits.amountNeededUnreserve inside of canAssignToUser? 
This is what I was not in favor of above, and there would be no need to pass 
it down as a parameter.

Or were you saying create a ResourceLimits instance, pass it as a parameter to 
canAssignToUser and canAssignToThisQueue, and modify that instance? That 
instance would then be passed down through to assignContainer()?

I don't see how else you would set the ResourceLimits.
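
To make the second option concrete, a minimal, self-contained sketch of passing one mutable limits object down from canAssignToUser to assignContainer; the field and method names are illustrative stand-ins, not the actual patch.

{code}
// Sketch of the "pass one mutable limits object down" option; names are
// illustrative only, not the actual CapacityScheduler classes.
class ResourceLimitsSketch {
  long limitMb;                 // headroom for this allocation attempt
  long amountNeededUnreserveMb; // how much must be unreserved to stay in limits

  ResourceLimitsSketch(long limitMb) { this.limitMb = limitMb; }
}

class LeafQueueSketch {
  boolean canAssignToUser(long userUsedMb, long userLimitMb, long requestMb,
      ResourceLimitsSketch limits) {
    if (userUsedMb + requestMb <= userLimitMb) {
      return true;
    }
    // Over the user limit: record how much would need to be unreserved so the
    // same instance can be consulted later in assignContainer().
    limits.amountNeededUnreserveMb = (userUsedMb + requestMb) - userLimitMb;
    return false;
  }

  void assignContainer(ResourceLimitsSketch limits) {
    if (limits.amountNeededUnreserveMb > 0) {
      System.out.println("must unreserve " + limits.amountNeededUnreserveMb
          + " MB before allocating");
    }
  }
}
{code}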

 Interaction between reservations and userlimit can result in significant ULF 
 violation
 --

 Key: YARN-3434
 URL: https://issues.apache.org/jira/browse/YARN-3434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-3434.patch


 ULF was set to 1.0.
 The user was able to consume 1.4X the queue capacity.
 It looks like when this application launched, it reserved about 1000 
 containers of 8G each within about 5 seconds. I think this allowed the 
 logic in assignToUser() to let the user limit be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed

2015-04-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497122#comment-14497122
 ] 

Karthik Kambatla commented on YARN-3493:


[~jianhe] - YARN-2010 should have fixed this right? 

 RM fails to come up with error Failed to load/recover state when  mem 
 settings are changed
 

 Key: YARN-3493
 URL: https://issues.apache.org/jira/browse/YARN-3493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Priority: Critical
 Fix For: 2.7.0

 Attachments: yarn-yarn-resourcemanager.log.zip


 RM fails to come up for the following case:
 1. Change yarn.nodemanager.resource.memory-mb and 
 yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
 background and wait for the job to reach running state
 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 
 before the above job completes
 4. Restart RM
 5. RM fails to come up with the below error
 {code:title= RM error for Mem settings changed}
  - RM app submission failed in validating AM resource request for application 
 application_1429094976272_0008
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
 (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 

[jira] [Commented] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).

2015-04-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497304#comment-14497304
 ] 

Sangjin Lee commented on YARN-3491:
---

I have the same question as [~jlowe]. The actual call

{code}
synchronized (pending) {
  pending.put(queue.submit(new FSDownload(lfs, null, conf,
  publicDirDestPath, resource, 
request.getContext().getStatCache())),
  request);
}
{code}
should be completely non-blocking and there is nothing that's expensive about 
it with the possible exception of the synchronization. Could you describe the 
root cause of the slowness you're seeing in some more detail?

 Improve the public resource localization to do both FSDownload submission to 
 the thread pool and completed localization handling in one thread 
 (PublicLocalizer).
 -

 Key: YARN-3491
 URL: https://issues.apache.org/jira/browse/YARN-3491
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical

 Improve the public resource localization to do both FSDownload submission to 
 the thread pool and completed localization handling in one thread 
 (PublicLocalizer).
 Currently FSDownload submission to the thread pool is done in 
 PublicLocalizer#addResource which is running in Dispatcher thread and 
 completed localization handling is done in PublicLocalizer#run which is 
 running in PublicLocalizer thread.
 Because FSDownload submission to the thread pool at the following code is 
 time consuming, the thread pool can't be fully utilized. Instead of doing 
 public resource localization in parallel(multithreading), public resource 
 localization is serialized most of the time.
 {code}
 synchronized (pending) {
   pending.put(queue.submit(new FSDownload(lfs, null, conf,
   publicDirDestPath, resource, 
 request.getContext().getStatCache())),
   request);
 }
 {code}
 Also there are two more benefits with this change:
 1. The Dispatcher thread won't be blocked by the above FSDownload submission. 
 The Dispatcher thread handles most of the time-critical events at the Node Manager.
 2. No synchronization is needed on the HashMap (pending), because pending will 
 only be accessed in the PublicLocalizer thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-15 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497315#comment-14497315
 ] 

Vrushali C commented on YARN-3051:
--

Hi [~varun_saxena]
As per the discussion in the call today, here is the query document about flow 
(and user and queue) based queries that I had mentioned (put up on jira 
YARN-3050) 
https://issues.apache.org/jira/secure/attachment/12695071/Flow%20based%20queries.docx

Also, some points that I think may be helpful:
- the reader API is not going to be limited to one or two API calls
- different queries will need different core read APIs. For instance, flow 
based queries may not need the application id or entity id, but rather the 
flow id. For example: for a given user, return the flows that were run during 
this time frame. Such a query requires only the cluster and user info; no 
entity id, application id, or flow name is needed for the reader API to serve 
it, and it cannot be boiled down to an entity-level query.
- So the reader API should allow for entity level, application level, flow 
level, user level, queue level and cluster level queries (a rough sketch of 
such a flow-level call follows below).
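
A rough, hypothetical sketch of the kind of flow-level call described above (the names are illustrative, not the YARN-3050/YARN-3051 design):

{code}
import java.util.Set;

// Hypothetical flow-level read call; not the actual YARN-3050/YARN-3051 API.
interface FlowReaderSketch {

  // "For a given user, return the flows run during this time frame": only
  // cluster + user + time range are required, no application or entity ids.
  Set<String> getFlowsForUser(String clusterId, String userId,
      long windowStartMs, long windowEndMs);
}
{code}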



 [Storage abstraction] Create backing storage read interface for ATS readers
 ---

 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3051.wip.patch, YARN-3051_temp.patch


 Per design in YARN-2928, create backing storage read interface that can be 
 implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed

2015-04-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497317#comment-14497317
 ] 

Jian He commented on YARN-3493:
---

Cancelling the patch; uploading a newer version.

 RM fails to come up with error Failed to load/recover state when  mem 
 settings are changed
 

 Key: YARN-3493
 URL: https://issues.apache.org/jira/browse/YARN-3493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Assignee: Jian He
Priority: Critical
 Attachments: YARN-3493.1.patch, yarn-yarn-resourcemanager.log.zip


 RM fails to come up for the following case:
 1. Change yarn.nodemanager.resource.memory-mb and 
 yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
 background and wait for the job to reach running state
 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 
 before the above job completes
 4. Restart RM
 5. RM fails to come up with the below error
 {code:title= RM error for Mem settings changed}
  - RM app submission failed in validating AM resource request for application 
 application_1429094976272_0008
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
 (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 

[jira] [Assigned] (YARN-3494) Expose AM resource limit and user limit in QueueMetrics

2015-04-15 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith reassigned YARN-3494:


Assignee: Rohith

 Expose AM resource limit and user limit in QueueMetrics 
 

 Key: YARN-3494
 URL: https://issues.apache.org/jira/browse/YARN-3494
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Rohith

 Now we have the AM resource limit and user limit shown on the web UI, it 
 would be useful to expose them in the QueueMetrics as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed

2015-04-15 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497378#comment-14497378
 ] 

Rohith commented on YARN-3493:
--

+1(non-binding)

 RM fails to come up with error Failed to load/recover state when  mem 
 settings are changed
 

 Key: YARN-3493
 URL: https://issues.apache.org/jira/browse/YARN-3493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Assignee: Jian He
Priority: Critical
 Attachments: YARN-3493.1.patch, YARN-3493.2.patch, 
 yarn-yarn-resourcemanager.log.zip


 RM fails to come up for the following case:
 1. Change yarn.nodemanager.resource.memory-mb and 
 yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
 background and wait for the job to reach running state
 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 
 before the above job completes
 4. Restart RM
 5. RM fails to come up with the below error
 {code:title= RM error for Mem settings changed}
  - RM app submission failed in validating AM resource request for application 
 application_1429094976272_0008
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
 (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
  

[jira] [Commented] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).

2015-04-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497411#comment-14497411
 ] 

zhihai xu commented on YARN-3491:
-

Hi [~sjlee0], that is a good point. I assumed queue.submit was the bottleneck, 
but queue.submit is only part of the code in PublicLocalizer#addResource; the 
bottleneck may instead come from publicRsrc.getPathForLocalization, since we 
do a lot of work in LocalResourcesTrackerImpl#getPathForLocalization such as 
{{stateStore.startResourceLocalization(user, appId,
  ((LocalResourcePBImpl) lr).getProto(), localPath); }}

To describe it more clearly: based on the log, the issue is that 
PublicLocalizer#addResource is very slow, which blocks the Dispatcher thread. 
Looking at the code of PublicLocalizer#addResource below, I initially felt 
queue.submit might take most of the CPU cycles; based on [~jlowe]'s and your 
comments, the slowness may instead come from other code such as 
publicRsrc.getPathForLocalization or dirsHandler.getLocalPathForWrite. Either 
way, I think moving all of this code in PublicLocalizer#addResource from the 
Dispatcher thread to the PublicLocalizer thread would be a good optimization. 
We can use a synchronized list of LocalizerResourceRequestEvent to store these 
events for public resource localization, similar to what LocalizerRunner does 
for private resource localization.
I will do some more profiling to see what the bottleneck in 
PublicLocalizer#addResource is:
{code}
public void addResource(LocalizerResourceRequestEvent request) {
  // TODO handle failures, cancellation, requests by other containers
  LocalizedResource rsrc = request.getResource();
  LocalResourceRequest key = rsrc.getRequest();
  LOG.info("Downloading public rsrc:" + key);
  /*
   * Here multiple containers may request the same resource. So we need
   * to start downloading only when
   * 1) ResourceState == DOWNLOADING
   * 2) We are able to acquire non blocking semaphore lock.
   * If not we will skip this resource as either it is getting downloaded
   * or it FAILED / LOCALIZED.
   */

  if (rsrc.tryAcquire()) {
    if (rsrc.getState() == ResourceState.DOWNLOADING) {
      LocalResource resource = request.getResource().getRequest();
      try {
        Path publicRootPath =
            dirsHandler.getLocalPathForWrite("." + Path.SEPARATOR
                + ContainerLocalizer.FILECACHE,
                ContainerLocalizer.getEstimatedSize(resource), true);
        Path publicDirDestPath =
            publicRsrc.getPathForLocalization(key, publicRootPath);
        if (!publicDirDestPath.getParent().equals(publicRootPath)) {
          DiskChecker.checkDir(new File(publicDirDestPath.toUri().getPath()));
        }

        // In case this is not a newly initialized nm state, ensure
        // initialized local/log dirs similar to LocalizerRunner
        getInitializedLocalDirs();
        getInitializedLogDirs();

        // explicitly synchronize pending here to avoid future task
        // completing and being dequeued before pending updated
        synchronized (pending) {
          pending.put(queue.submit(new FSDownload(lfs, null, conf,
              publicDirDestPath, resource,
              request.getContext().getStatCache())),
              request);
        }
      } catch (IOException e) {
        rsrc.unlock();
        publicRsrc.handle(new ResourceFailedLocalizationEvent(request
            .getResource().getRequest(), e.getMessage()));
        LOG.error("Local path for public localization is not found. "
            + " May be disks failed.", e);
      } catch (IllegalArgumentException ie) {
        rsrc.unlock();
        publicRsrc.handle(new ResourceFailedLocalizationEvent(request
            .getResource().getRequest(), ie.getMessage()));
        LOG.error("Local path for public localization is not found. "
            + " Incorrect path. " + request.getResource().getRequest()
            .getPath(), ie);
      } catch (RejectedExecutionException re) {
        rsrc.unlock();
        publicRsrc.handle(new ResourceFailedLocalizationEvent(request
            .getResource().getRequest(), re.getMessage()));
        LOG.error("Failed to submit rsrc " + rsrc + " for download."
            + " Either queue is full or threadpool is shutdown.", re);
      }
    } else {
      rsrc.unlock();
    }
  }
}
{code}
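
To make this concrete, here is a minimal sketch of the hand-off I have in mind (the class and method names are made up for illustration; this is not the actual patch): the Dispatcher thread only enqueues the request, and the PublicLocalizer thread drains the queue and runs the expensive work (path resolution, state-store write, FSDownload submission).
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only, not the actual patch. The Dispatcher thread calls addResource,
// which just enqueues; the PublicLocalizer thread calls drainPendingResources
// in its run() loop and performs getLocalPathForWrite, getPathForLocalization
// (including the state-store write) and queue.submit(new FSDownload(...)).
class PublicResourceQueueSketch<E> {
  private final List<E> pendingResources =
      Collections.synchronizedList(new ArrayList<E>());

  // Called on the Dispatcher thread: cheap and non-blocking.
  void addResource(E request) {
    pendingResources.add(request);
  }

  // Called on the PublicLocalizer thread: take a snapshot and clear the queue.
  List<E> drainPendingResources() {
    synchronized (pendingResources) {
      List<E> batch = new ArrayList<E>(pendingResources);
      pendingResources.clear();
      return batch;
    }
  }
}
{code}
With this shape the pending map would only be touched from the PublicLocalizer thread, so the explicit synchronization on it could also go away.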



 Improve the public resource localization to do both FSDownload submission to 
 the thread pool and completed localization handling in one thread 
 (PublicLocalizer).
 -

 Key: YARN-3491
 

[jira] [Commented] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed

2015-04-15 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497413#comment-14497413
 ] 

Rohith commented on YARN-3493:
--

The same problem would occur with RM work-preserving restart enabled, where a 
running AM re-sends its ResourceRequests on the RESYNC command from the RM. 
This causes an InvalidResourceRequestException to be thrown to the AM, which 
the AM does not expect.
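
For readers following along, here is a simplified illustration of the check that trips during recovery or resync (a hedged paraphrase, not the exact SchedulerUtils code): the request is validated against the *current* maximum allocation, so lowering yarn.scheduler.maximum-allocation-mb below an already-granted request makes validation fail.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException;

// Simplified illustration only; the real validation also checks vcores, labels, etc.
public final class ValidateSketch {
  static void validateMemory(ResourceRequest req, Resource maxAllocation)
      throws InvalidResourceRequestException {
    int requested = req.getCapability().getMemory();
    if (requested < 0 || requested > maxAllocation.getMemory()) {
      throw new InvalidResourceRequestException(
          "Invalid resource request, requested memory < 0, or requested memory"
              + " > max configured, requestedMemory=" + requested
              + ", maxMemory=" + maxAllocation.getMemory());
    }
  }
}
{code}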

 RM fails to come up with error Failed to load/recover state when  mem 
 settings are changed
 

 Key: YARN-3493
 URL: https://issues.apache.org/jira/browse/YARN-3493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Assignee: Jian He
Priority: Critical
 Attachments: YARN-3493.1.patch, YARN-3493.2.patch, 
 yarn-yarn-resourcemanager.log.zip


 RM fails to come up for the following case:
 1. Change yarn.nodemanager.resource.memory-mb and 
 yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
 background and wait for the job to reach running state
 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 
 before the above job completes
 4. Restart RM
 5. RM fails to come up with the below error
 {code:title= RM error for Mem settings changed}
  - RM app submission failed in validating AM resource request for application 
 application_1429094976272_0008
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
 (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
  

[jira] [Updated] (YARN-3495) Confusing log generated by FairScheduler

2015-04-15 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3495:
---
Attachment: YARN-3495.patch

 Confusing log generated by FairScheduler
 

 Key: YARN-3495
 URL: https://issues.apache.org/jira/browse/YARN-3495
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
 Attachments: YARN-3495.patch


 2015-04-16 12:03:48,531 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3495) Confusing log generated by FairScheduler

2015-04-15 Thread Brahma Reddy Battula (JIRA)
Brahma Reddy Battula created YARN-3495:
--

 Summary: Confusing log generated by FairScheduler
 Key: YARN-3495
 URL: https://issues.apache.org/jira/browse/YARN-3495
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula


2015-04-16 12:03:48,531 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Null container completed...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3495) Confusing log generated by FairScheduler

2015-04-15 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497516#comment-14497516
 ] 

Brahma Reddy Battula commented on YARN-3495:


Attached the patch. Kindly review. YARN-3197 fixed the same issue for the CapacityScheduler.
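
For context, a hedged sketch of the kind of message change involved (the attached patch is authoritative; this just mirrors the spirit of the YARN-3197 CapacityScheduler fix, and the helper name is made up):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;

// Sketch only: instead of the bare "Null container completed..." line, say
// which container completed and why no RMContainer was found for it.
final class NullContainerLogSketch {
  private static final Log LOG = LogFactory.getLog(NullContainerLogSketch.class);

  // Returns true when the caller should skip further processing.
  static boolean skipUnknownContainer(RMContainer rmContainer,
      ContainerStatus status, Object event) {
    if (rmContainer == null) {
      LOG.info("Container " + status.getContainerId() + " completed with event "
          + event + ", but corresponding RMContainer doesn't exist.");
      return true;
    }
    return false;
  }
}
{code}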

 Confusing log generated by FairScheduler
 

 Key: YARN-3495
 URL: https://issues.apache.org/jira/browse/YARN-3495
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
 Attachments: YARN-3495.patch


 2015-04-16 12:03:48,531 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).

2015-04-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497565#comment-14497565
 ] 

zhihai xu commented on YARN-3491:
-

Hi [~jlowe] and [~sjlee0], I think I know what the bottleneck in 
PublicLocalizer#addResource is.
I checked the old NM logs from the 2.3.0 release code; 
PublicLocalizer#addResource took less than one millisecond in the 2.3.0 release.
{code}
2014-10-21 18:11:10,956 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp-1620691366/tmp-602532977/asm.jar, 1413914982330, 
FILE, null }
2014-10-21 18:11:10,956 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp-1620691366/tmp-983952127/start.jar, 1413914978818, 
FILE, null }
2014-10-21 18:11:10,957 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp-1620691366/tmp-700474448/jsch.jar, 1413914981670, 
FILE, null }
2014-10-21 18:11:10,957 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp-1620691366/tmp-295789958/kfs.jar, 1413914974035, 
FILE, null }
2014-10-21 18:11:10,957 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp-1620691366/tmp1832142372/datasvc-search.jar, 
1413914970738, FILE, null }
2014-10-21 18:11:10,957 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp-1620691366/tmp-1244404847/args4j.jar, 
1413914982044, FILE, null }
2014-10-21 18:11:10,957 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp-1620691366/tmp729860031/slf4j-log4j12.jar, 
1413914980407, FILE, null }
2014-10-21 18:11:10,957 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp-1620691366/tmp-1748521227/jackson-mapper-asl.jar, 
1413914983142, FILE, null }
2014-10-21 18:11:10,957 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp-1620691366/tmp-246818030/jasper-compiler.jar, 
1413914979243, FILE, null }
2014-10-21 18:11:10,958 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp-1620691366/tmp-1703279108/spiffy.jar, 
1413914974080, FILE, null }
{code}

Then I compared the public localization code; the difference is in 
LocalResourcesTrackerImpl#getPathForLocalization.
The following code was added after the 2.3.0 release:
{code}
rPath = new Path(rPath,
Long.toString(uniqueNumberGenerator.incrementAndGet()));
Path localPath = new Path(rPath, req.getPath().getName());
LocalizedResource rsrc = localrsrc.get(req);
rsrc.setLocalPath(localPath);
LocalResource lr = LocalResource.newInstance(req.getResource(),
req.getType(), req.getVisibility(), req.getSize(),
req.getTimestamp());
try {
  stateStore.startResourceLocalization(user, appId,
  ((LocalResourcePBImpl) lr).getProto(), localPath);
} catch (IOException e) {
  LOG.error("Unable to record localization start for " + rsrc, e);
}
{code}

I think stateStore.startResourceLocalization is most likely the bottleneck.
startResourceLocalization stores the state in levelDB, and the levelDB 
operation is time consuming: it needs to go through the JNI interface.
{code}
  public void startResourceLocalization(String user, ApplicationId appId,
  LocalResourceProto proto, Path localPath) throws IOException {
String key = getResourceStartedKey(user, appId, localPath.toString());
try {
  db.put(bytes(key), proto.toByteArray());
} catch (DBException e) {
  throw new IOException(e);
}
  }
{code}
I think it would be better to do these levelDB operations in a separate thread 
using AsyncDispatcher in NMLeveldbStateStoreService.
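
For illustration, a minimal sketch of pushing the blocking levelDB put onto a separate writer thread (this uses a plain single-threaded executor rather than the AsyncDispatcher mentioned above, and the class name is hypothetical):
{code}
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch only: the caller (e.g. getPathForLocalization) hands the write off
// and returns immediately instead of waiting for the levelDB/JNI call.
class AsyncStateStoreWriter {
  interface Put {
    void run() throws IOException; // e.g. db.put(bytes(key), proto.toByteArray())
  }

  private final ExecutorService writer = Executors.newSingleThreadExecutor();

  void submit(final Put put) {
    writer.execute(new Runnable() {
      @Override
      public void run() {
        try {
          put.run();
        } catch (IOException e) {
          // The real service would log and surface this failure properly.
          e.printStackTrace();
        }
      }
    });
  }

  void shutdown() {
    writer.shutdown();
  }
}
{code}
The trade-off is that an in-flight write could be lost if the NM dies before the queue drains, which matters for work-preserving NM restart, so the durability requirements would need to be checked first.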

 Improve the public resource localization to do both FSDownload submission to 
 the thread pool and completed localization handling in one thread 
 (PublicLocalizer).
 -

 Key: YARN-3491
 URL: https://issues.apache.org/jira/browse/YARN-3491
 

[jira] [Updated] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).

2015-04-15 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3491:

Description: 
Improve the public resource localization to do both FSDownload submission to 
the thread pool and completed localization handling in one thread 
(PublicLocalizer).
Currently FSDownload submission to the thread pool is done in 
PublicLocalizer#addResource which is running in Dispatcher thread and completed 
localization handling is done in PublicLocalizer#run which is running in 
PublicLocalizer thread.
Because PublicLocalizer#addResource is time consuming, the thread pool can't be 
fully utilized. Instead of doing public resource localization in 
parallel(multithreading), public resource localization is serialized most of 
the time.

Also there are two more benefits with this change:
1. The Dispatcher thread won't be blocked by PublicLocalizer#addResource . 
Dispatcher thread handles most of time critical events at Node manager.
2. don't need synchronization on HashMap (pending).
Because pending will be only accessed in PublicLocalizer thread.

  was:
Improve the public resource localization to do both FSDownload submission to 
the thread pool and completed localization handling in one thread 
(PublicLocalizer).
Currently FSDownload submission to the thread pool is done in 
PublicLocalizer#addResource which is running in Dispatcher thread and completed 
localization handling is done in PublicLocalizer#run which is running in 
PublicLocalizer thread.
Because PublicLocalizer#addResource is time consuming, the thread pool can't be 
fully utilized. Instead of doing public resource localization in 
parallel(multithreading), public resource localization is serialized most of 
the time.

Also there are two more benefits with this change:
1. The Dispatcher thread won't be blocked by above FSDownload submission. 
Dispatcher thread handles most of time critical events at Node manager.
2. don't need synchronization on HashMap (pending).
Because pending will be only accessed in PublicLocalizer thread.


 Improve the public resource localization to do both FSDownload submission to 
 the thread pool and completed localization handling in one thread 
 (PublicLocalizer).
 -

 Key: YARN-3491
 URL: https://issues.apache.org/jira/browse/YARN-3491
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical

 Improve the public resource localization to do both FSDownload submission to 
 the thread pool and completed localization handling in one thread 
 (PublicLocalizer).
 Currently FSDownload submission to the thread pool is done in 
 PublicLocalizer#addResource which is running in Dispatcher thread and 
 completed localization handling is done in PublicLocalizer#run which is 
 running in PublicLocalizer thread.
 Because PublicLocalizer#addResource is time consuming, the thread pool can't 
 be fully utilized. Instead of doing public resource localization in 
 parallel(multithreading), public resource localization is serialized most of 
 the time.
 Also there are two more benefits with this change:
 1. The Dispatcher thread won't be blocked by PublicLocalizer#addResource . 
 Dispatcher thread handles most of time critical events at Node manager.
 2. don't need synchronization on HashMap (pending).
 Because pending will be only accessed in PublicLocalizer thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497439#comment-14497439
 ] 

Hadoop QA commented on YARN-3493:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725743/YARN-3493.2.patch
  against trunk revision 1b89a3e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7352//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7352//console

This message is automatically generated.

 RM fails to come up with error Failed to load/recover state when  mem 
 settings are changed
 

 Key: YARN-3493
 URL: https://issues.apache.org/jira/browse/YARN-3493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Assignee: Jian He
Priority: Critical
 Attachments: YARN-3493.1.patch, YARN-3493.2.patch, 
 yarn-yarn-resourcemanager.log.zip


 RM fails to come up for the following case:
 1. Change yarn.nodemanager.resource.memory-mb and 
 yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
 background and wait for the job to reach running state
 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 
 before the above job completes
 4. Restart RM
 5. RM fails to come up with the below error
 {code:title= RM error for Mem settings changed}
  - RM app submission failed in validating AM resource request for application 
 application_1429094976272_0008
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)

[jira] [Updated] (YARN-3437) convert load test driver to timeline service v.2

2015-04-15 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3437:
--
Attachment: YARN-3437.002.patch

Rebased the patch with the latest from the YARN-2928 branch.

 convert load test driver to timeline service v.2
 

 Key: YARN-3437
 URL: https://issues.apache.org/jira/browse/YARN-3437
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: YARN-3437.001.patch, YARN-3437.002.patch


 This subtask covers the work for converting the proposed patch for the load 
 test driver (YARN-2556) to work with the timeline service v.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497451#comment-14497451
 ] 

Hadoop QA commented on YARN-3463:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725744/YARN-3463.66.patch
  against trunk revision 1b89a3e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1148 javac 
compiler warnings (more than the trunk's current 1147 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7353//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7353//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7353//console

This message is automatically generated.

 Integrate OrderingPolicy Framework with CapacityScheduler
 -

 Key: YARN-3463
 URL: https://issues.apache.org/jira/browse/YARN-3463
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3463.50.patch, YARN-3463.61.patch, 
 YARN-3463.64.patch, YARN-3463.65.patch, YARN-3463.66.patch


 Integrate the OrderingPolicy Framework with the CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3491) PublicLocalizer#addResource is too slow.

2015-04-15 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3491:

Summary: PublicLocalizer#addResource is too slow.  (was: Improve the public 
resource localization to do both FSDownload submission to the thread pool and 
completed localization handling in one thread (PublicLocalizer).)

 PublicLocalizer#addResource is too slow.
 

 Key: YARN-3491
 URL: https://issues.apache.org/jira/browse/YARN-3491
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical

 Improve the public resource localization to do both FSDownload submission to 
 the thread pool and completed localization handling in one thread 
 (PublicLocalizer).
 Currently FSDownload submission to the thread pool is done in 
 PublicLocalizer#addResource which is running in Dispatcher thread and 
 completed localization handling is done in PublicLocalizer#run which is 
 running in PublicLocalizer thread.
 Because PublicLocalizer#addResource is time consuming, the thread pool can't 
 be fully utilized. Instead of doing public resource localization in 
 parallel(multithreading), public resource localization is serialized most of 
 the time.
 Also there are two more benefits with this change:
 1. The Dispatcher thread won't be blocked by PublicLocalizer#addResource . 
 Dispatcher thread handles most of time critical events at Node manager.
 2. don't need synchronization on HashMap (pending).
 Because pending will be only accessed in PublicLocalizer thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497354#comment-14497354
 ] 

Hadoop QA commented on YARN-3463:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725714/YARN-3463.65.patch
  against trunk revision 1b89a3e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1149 javac 
compiler warnings (more than the trunk's current 1147 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7351//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7351//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7351//console

This message is automatically generated.

 Integrate OrderingPolicy Framework with CapacityScheduler
 -

 Key: YARN-3463
 URL: https://issues.apache.org/jira/browse/YARN-3463
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3463.50.patch, YARN-3463.61.patch, 
 YARN-3463.64.patch, YARN-3463.65.patch


 Integrate the OrderingPolicy Framework with the CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed

2015-04-15 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3493:
--
Attachment: YARN-3493.2.patch

uploaded a new patch

 RM fails to come up with error Failed to load/recover state when  mem 
 settings are changed
 

 Key: YARN-3493
 URL: https://issues.apache.org/jira/browse/YARN-3493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Assignee: Jian He
Priority: Critical
 Attachments: YARN-3493.1.patch, YARN-3493.2.patch, 
 yarn-yarn-resourcemanager.log.zip


 RM fails to come up for the following case:
 1. Change yarn.nodemanager.resource.memory-mb and 
 yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
 background and wait for the job to reach running state
 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 
 before the above job completes
 4. Restart RM
 5. RM fails to come up with the below error
 {code:title= RM error for Mem settings changed}
  - RM app submission failed in validating AM resource request for application 
 application_1429094976272_0008
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
 (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 

[jira] [Commented] (YARN-3495) Confusing log generated by FairScheduler

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497571#comment-14497571
 ] 

Hadoop QA commented on YARN-3495:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725774/YARN-3495.patch
  against trunk revision 1b89a3e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7355//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7355//console

This message is automatically generated.

 Confusing log generated by FairScheduler
 

 Key: YARN-3495
 URL: https://issues.apache.org/jira/browse/YARN-3495
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
 Attachments: YARN-3495.patch


 2015-04-16 12:03:48,531 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3496) Add a configuration to disable/enable storing localization state in NM StateStore

2015-04-15 Thread zhihai xu (JIRA)
zhihai xu created YARN-3496:
---

 Summary: Add a configuration to disable/enable storing 
localization state in NM StateStore
 Key: YARN-3496
 URL: https://issues.apache.org/jira/browse/YARN-3496
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu


Add a configuration to disable/enable storing localization state in the NM 
StateStore.
Storing localization state in levelDB may have some overhead, which may 
affect NM performance.
It would be better to have a configuration to disable/enable it.
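
A minimal sketch of what such a switch could look like (the property name here is hypothetical, not an agreed-upon key):
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: gate the stateStore.startResourceLocalization/finish calls on a
// boolean NM property; the name below is illustrative.
public final class LocalizationStateStoreConfig {
  public static final String STORE_LOCALIZATION_STATE =
      "yarn.nodemanager.recovery.store-localization-state";
  public static final boolean DEFAULT_STORE_LOCALIZATION_STATE = true;

  public static boolean shouldStoreLocalizationState(Configuration conf) {
    return conf.getBoolean(STORE_LOCALIZATION_STATE,
        DEFAULT_STORE_LOCALIZATION_STATE);
  }
}
{code}
The cost of turning it off would be that in-flight localizations cannot be recovered after an NM restart, so the default should stay on.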



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3494) Expose AM resource limit and user limit in QueueMetrics

2015-04-15 Thread Jian He (JIRA)
Jian He created YARN-3494:
-

 Summary: Expose AM resource limit and user limit in QueueMetrics 
 Key: YARN-3494
 URL: https://issues.apache.org/jira/browse/YARN-3494
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He


Now that the AM resource limit and user limit are shown on the web UI, it 
would be useful to expose them in QueueMetrics as well.
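
A hedged sketch of how such gauges could be registered (metric names and types here are illustrative, not a proposal for the final patch):
{code}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

// Sketch only: QueueMetrics already uses the metrics2 library, so the limits
// could be exposed as gauges that the scheduler updates when it recomputes them.
public class QueueLimitMetricsSketch {
  private final MetricsRegistry registry =
      new MetricsRegistry("QueueLimitMetricsSketch");
  private final MutableGaugeLong amResourceLimitMB =
      registry.newGauge("AMResourceLimitMB", "AM resource limit in MB", 0L);
  private final MutableGaugeLong userLimitMB =
      registry.newGauge("UserLimitMB", "User limit in MB", 0L);

  public void setAMResourceLimitMB(long mb) {
    amResourceLimitMB.set(mb);
  }

  public void setUserLimitMB(long mb) {
    userLimitMB.set(mb);
  }
}
{code}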



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-15 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3463:
--
Attachment: YARN-3463.66.patch

Fixed the build warnings; the tests all pass on my box.

 Integrate OrderingPolicy Framework with CapacityScheduler
 -

 Key: YARN-3463
 URL: https://issues.apache.org/jira/browse/YARN-3463
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3463.50.patch, YARN-3463.61.patch, 
 YARN-3463.64.patch, YARN-3463.65.patch, YARN-3463.66.patch


 Integrate the OrderingPolicy Framework with the CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).

2015-04-15 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3491:

Description: 
Improve the public resource localization to do both FSDownload submission to 
the thread pool and completed localization handling in one thread 
(PublicLocalizer).
Currently FSDownload submission to the thread pool is done in 
PublicLocalizer#addResource which is running in Dispatcher thread and completed 
localization handling is done in PublicLocalizer#run which is running in 
PublicLocalizer thread.
Because PublicLocalizer#addResource is time consuming, the thread pool can't be 
fully utilized. Instead of doing public resource localization in 
parallel(multithreading), public resource localization is serialized most of 
the time.

Also there are two more benefits with this change:
1. The Dispatcher thread won't be blocked by above FSDownload submission. 
Dispatcher thread handles most of time critical events at Node manager.
2. don't need synchronization on HashMap (pending).
Because pending will be only accessed in PublicLocalizer thread.

  was:
Improve the public resource localization to do both FSDownload submission to 
the thread pool and completed localization handling in one thread 
(PublicLocalizer).
Currently FSDownload submission to the thread pool is done in 
PublicLocalizer#addResource which is running in Dispatcher thread and completed 
localization handling is done in PublicLocalizer#run which is running in 
PublicLocalizer thread.
Because FSDownload submission to the thread pool at the following code is time 
consuming, the thread pool can't be fully utilized. Instead of doing public 
resource localization in parallel(multithreading), public resource localization 
is serialized most of the time.
{code}
synchronized (pending) {
  pending.put(queue.submit(new FSDownload(lfs, null, conf,
  publicDirDestPath, resource, 
request.getContext().getStatCache())),
  request);
}
{code}

Also there are two more benefits with this change:
1. The Dispatcher thread won't be blocked by above FSDownload submission. 
Dispatcher thread handles most of time critical events at Node manager.
2. don't need synchronization on HashMap (pending).
Because pending will be only accessed in PublicLocalizer thread.


 Improve the public resource localization to do both FSDownload submission to 
 the thread pool and completed localization handling in one thread 
 (PublicLocalizer).
 -

 Key: YARN-3491
 URL: https://issues.apache.org/jira/browse/YARN-3491
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical

 Improve the public resource localization to do both FSDownload submission to 
 the thread pool and completed localization handling in one thread 
 (PublicLocalizer).
 Currently FSDownload submission to the thread pool is done in 
 PublicLocalizer#addResource which is running in Dispatcher thread and 
 completed localization handling is done in PublicLocalizer#run which is 
 running in PublicLocalizer thread.
 Because PublicLocalizer#addResource is time consuming, the thread pool can't 
 be fully utilized. Instead of doing public resource localization in 
 parallel(multithreading), public resource localization is serialized most of 
 the time.
 Also there are two more benefits with this change:
 1. The Dispatcher thread won't be blocked by above FSDownload submission. 
 Dispatcher thread handles most of time critical events at Node manager.
 2. don't need synchronization on HashMap (pending).
 Because pending will be only accessed in PublicLocalizer thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497432#comment-14497432
 ] 

Hadoop QA commented on YARN-3437:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725758/YARN-3437.002.patch
  against trunk revision 1b89a3e.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7354//console

This message is automatically generated.

 convert load test driver to timeline service v.2
 

 Key: YARN-3437
 URL: https://issues.apache.org/jira/browse/YARN-3437
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: YARN-3437.001.patch, YARN-3437.002.patch


 This subtask covers the work for converting the proposed patch for the load 
 test driver (YARN-2556) to work with the timeline service v.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3492) AM fails to come up because RM and NM can't connect to each other

2015-04-15 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497500#comment-14497500
 ] 

Brahma Reddy Battula commented on YARN-3492:


[~kasha] Thanks for reporting this issue.

I took your mapred-site.xml and yarn-site.xml and started a pseudo-distributed 
cluster. Containers are getting allocated and the NM is able to connect to the 
RM.

 *Please correct me if I am wrong.* 

 *{color:blue} Nodemanager Log{color}* 

{noformat}
2015-04-16 09:06:54,130 INFO 
org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager:
 Rolling master-key for container-tokens, got key with id -1430616116
2015-04-16 09:06:54,132 INFO 
org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: 
Rolling master-key for container-tokens, got key with id -751280008
2015-04-16 09:06:54,133 INFO 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered 
with ResourceManager as host132:42289 with total resource of memory:8192, 
vCores:8
2015-04-16 09:06:54,133 INFO 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying 
ContainerManager to unblock new container-requests
2015-04-16 09:07:57,684 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth 
successful for appattempt_1429155383347_0001_01 (auth:SIMPLE)
2015-04-16 09:07:57,772 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Start request for container_1429155383347_0001_01_01 by user hdfs
2015-04-16 09:07:57,797 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Creating a new application reference for app application_1429155383347_0001
2015-04-16 09:07:57,803 INFO 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hdfs IP=.132 
OPERATION=Start Container Request   TARGET=ContainerManageImpl  
RESULT=SUCCESS  APPID=application_1429155383347_0001
CONTAINERID=container_1429155383347_0001_01_01
{noformat}

Did you enable any firewall, or did the host change?

 AM fails to come up because RM and NM can't connect to each other
 -

 Key: YARN-3492
 URL: https://issues.apache.org/jira/browse/YARN-3492
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: pseudo-distributed cluster on a mac
Reporter: Karthik Kambatla
Priority: Blocker
 Attachments: mapred-site.xml, 
 yarn-kasha-nodemanager-kasha-mbp.local.log, 
 yarn-kasha-resourcemanager-kasha-mbp.local.log, yarn-site.xml


 Stood up a pseudo-distributed cluster with 2.7.0 RC0. Submitted a pi job. The 
 container gets allocated, but doesn't get launched. The NM can't talk to the 
 RM. Logs to follow. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed

2015-04-15 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497416#comment-14497416
 ] 

Rohith commented on YARN-3493:
--

bq. The same problem would occur
Just to clarify, I am referring to the InvalidResourceRequestException, not 
the RM start failure.

 RM fails to come up with error Failed to load/recover state when  mem 
 settings are changed
 

 Key: YARN-3493
 URL: https://issues.apache.org/jira/browse/YARN-3493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Assignee: Jian He
Priority: Critical
 Attachments: YARN-3493.1.patch, YARN-3493.2.patch, 
 yarn-yarn-resourcemanager.log.zip


 RM fails to come up for the following case:
 1. Change yarn.nodemanager.resource.memory-mb and 
 yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
 background and wait for the job to reach running state
 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 
 before the above job completes
 4. Restart RM
 5. RM fails to come up with the below error
 {code:title= RM error for Mem settings changed}
  - RM app submission failed in validating AM resource request for application 
 application_1429094976272_0008
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
 (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 

[jira] [Updated] (YARN-2696) Queue sorting in CapacityScheduler should consider node label

2015-04-15 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2696:
-
Attachment: YARN-2696.3.patch

Addressed all comments from [~jianhe] and fixed the test failure in 
TestFifoScheduler; uploaded the ver.3 patch.

 Queue sorting in CapacityScheduler should consider node label
 -

 Key: YARN-2696
 URL: https://issues.apache.org/jira/browse/YARN-2696
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2696.1.patch, YARN-2696.2.patch, YARN-2696.3.patch


 In the past, when trying to allocate containers under a parent queue in 
 CapacityScheduler, the parent queue chose child queues ordered by used 
 resource, from smallest to largest. 
 Now that we support node labels in CapacityScheduler, we should also consider 
 the used resource in child queues per node label when allocating resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3354) Container should contains node-labels asked by original ResourceRequests

2015-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496981#comment-14496981
 ] 

Wangda Tan commented on YARN-3354:
--

Test failure is not related to the patch.

 Container should contains node-labels asked by original ResourceRequests
 

 Key: YARN-3354
 URL: https://issues.apache.org/jira/browse/YARN-3354
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, capacityscheduler, nodemanager, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3354.1.patch, YARN-3354.2.patch


 We proposed non-exclusive node labels in YARN-3214, which lets non-labeled 
 resource requests be allocated on labeled nodes that have idle resources.
 To make preemption work, we need to know an allocated container's original node 
 label: when labeled resource requests come back, we need to kill non-labeled 
 containers running on labeled nodes.
 This requires adding node labels to Container; also, the NM needs to store this 
 information and send it back to the RM on RM restart to recover the original 
 container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

