[jira] [Updated] (YARN-2458) Add file handling features to the Windows Secure Container Executor LRPC service
[ https://issues.apache.org/jira/browse/YARN-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2458: --- Attachment: YARN-2458.2.patch

A complete implementation that delegates critical file handling (mkdirs) to the privileged service.

Add file handling features to the Windows Secure Container Executor LRPC service
Key: YARN-2458 URL: https://issues.apache.org/jira/browse/YARN-2458 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2458.1.patch, YARN-2458.2.patch

In the WSCE design the nodemanager needs to do certain privileged operations, like changing file ownership to arbitrary users or deleting files owned by the task container user after completion of the task. As we want to remove the Administrator privilege requirement from the nodemanager service, we have to move these operations into the privileged LRPC helper service. Extend the RPC interface to contain methods for changing file ownership and manipulating files, add the JNI client side, and implement the server side. This will piggyback on the existing LRPC service, so there is not much infrastructure to add (run as service, RPC init, authentication and authorization are already solved); it just needs to be implemented.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
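As a rough illustration of the kind of file-handling operations being delegated, the extended helper interface could look something like the sketch below. This is only an illustration with hypothetical names; it is not the actual WSCE RPC definition from the patch.

{code}
import java.io.IOException;

// Illustrative only: a Java-side view of the privileged file operations the
// LRPC helper service would expose on behalf of the (now unprivileged) nodemanager.
public interface PrivilegedFileOperations {
  // Create a directory tree for localization on behalf of the nodemanager.
  void mkdirs(String path) throws IOException;

  // Change ownership of a file or directory to an arbitrary user/group.
  void chown(String path, String user, String group) throws IOException;

  // Delete a path owned by the task container user after the task completes.
  void delete(String path, boolean recursive) throws IOException;
}
{code}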
[jira] [Commented] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127018#comment-14127018 ] Eric Payne commented on YARN-2056: -- [~leftnoteasy], thank you very much for your review comments. I appreciate it.

Regarding the test for appA running on queueB: the following assertion tests that appA is preempted, but the preemption calculations are done per queue.
{code}
+verify(mDisp, times(10)).handle(argThat(new IsPreemptionRequestFor(appA)));
{code}
So, I added the following assertion to make the visual connection between appA and queueB:
{code}
+assertTrue("appA should be running on queueB",
+    mCS.getAppsInQueue(queueB).contains(expectedAttemptOnQueueB));
{code}
So, the purpose is not really to test that the mockQueue/mockApp worked correctly, but to make it obvious that the preemption policy for queueB is being exercised. If you think it's not necessary, I will remove it, but I do like the link.

Disable preemption at Queue level
Key: YARN-2056 URL: https://issues.apache.org/jira/browse/YARN-2056 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal Assignee: Eric Payne Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, YARN-2056.201408310117.txt, YARN-2056.201409022208.txt

We need to be able to disable preemption at individual queue level.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2456) Possible deadlock in CapacityScheduler when RM is recovering apps
[ https://issues.apache.org/jira/browse/YARN-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127034#comment-14127034 ] Anubhav Dhoot commented on YARN-2456: - Can we sort the ApplicationStates based on ApplicationState's submitTime or startTime fields when we recover?

Possible deadlock in CapacityScheduler when RM is recovering apps
Key: YARN-2456 URL: https://issues.apache.org/jira/browse/YARN-2456 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2456.1.patch

Consider this scenario:
1. RM is configured with a single queue and only one application can be active at a time.
2. Submit App1, which uses up the queue's whole capacity.
3. Submit App2, which remains pending.
4. Restart RM.
5. App2 is recovered before App1, so App2 is added to the activeApplications list. Now App1 remains pending (because of the max-active-app limit).
6. All containers of App1 are now recovered when the NM registers, and use up the whole queue capacity again.
7. Since the queue is full, App2 cannot proceed to allocate its AM container.
8. In the meanwhile, App1 cannot proceed to become active because of the max-active-app limit.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
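If the ordering idea suggested above were adopted, the change would be roughly the sketch below: sort the recovered ApplicationState entries by submission time before replaying them into the scheduler, so an earlier-submitted app such as App1 is re-added first. The collection name and the per-app recovery call are placeholders, and this assumes ApplicationState exposes a getSubmitTime() accessor.

{code}
// Illustrative only: recover applications in submission order.
List<ApplicationState> appStates =
    new ArrayList<ApplicationState>(recoveredApps.values());   // placeholder map of recovered state
Collections.sort(appStates, new Comparator<ApplicationState>() {
  @Override
  public int compare(ApplicationState a, ApplicationState b) {
    return Long.compare(a.getSubmitTime(), b.getSubmitTime());
  }
});
for (ApplicationState appState : appStates) {
  recoverApplication(appState);  // placeholder for the existing per-app recovery path
}
{code}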
[jira] [Resolved] (YARN-2451) Delete .orig files
[ https://issues.apache.org/jira/browse/YARN-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla resolved YARN-2451. Resolution: Invalid

This isn't the case anymore - it was either only on my local machine or got fixed in the move to git. HADOOP-10609 added .orig and .rej files to .gitignore.

Delete .orig files
Key: YARN-2451 URL: https://issues.apache.org/jira/browse/YARN-2451 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Karthik Kambatla Assignee: Karthik Kambatla

Looks like we checked in a few .orig files. We should delete them.
{noformat}
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java.orig
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java.orig
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java.orig
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java.orig
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2440) Cgroups should allow YARN containers to be limited to allocated cores
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127086#comment-14127086 ] Jason Lowe commented on YARN-2440: --

bq. Specifically in the context of heterogeneous clusters where uniform % configurations can go really bad where the only resort will then be to do per-node configuration - not ideal.

Yes, I could see the heterogeneous cluster being a case where specifying absolute instead of relative may be desirable. My biggest concern is that it's confusing when trying to combine the absolute and relative concepts -- it's not obvious if one overrides the other or if one is relative to the other. Part of my concern in keeping this as simple as possible and the configuration burden to an absolute minimum is that I'm missing the real-world use case. As I mentioned before, I think most users would rather not use the functionality proposed by this JIRA but instead set up peer cgroups for other systems and set their relative cgroup shares appropriately. With this JIRA the CPUs could sit idle despite demand from YARN containers, while a peer cgroup setup allows CPU guarantees without idle CPUs if the demand is there.

Cgroups should allow YARN containers to be limited to allocated cores
Key: YARN-2440 URL: https://issues.apache.org/jira/browse/YARN-2440 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, apache-yarn-2440.2.patch, apache-yarn-2440.3.patch, apache-yarn-2440.4.patch, screenshot-current-implementation.jpg

The current cgroups implementation does not limit YARN containers to the cores allocated in yarn-site.xml.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1477) Improve AM web UI to avoid confusion about AM restart
[ https://issues.apache.org/jira/browse/YARN-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1477: -- Description: Improve AM web UI, Add submitTime field to the AM's web services REST API, improve Elapsed: row time computation, etc. (was: Similar to MAPREDUCE-5052, This is a fix on AM side. Add submitTime field to the AM's web services REST API) Improve AM web UI to avoid confusion about AM restart - Key: YARN-1477 URL: https://issues.apache.org/jira/browse/YARN-1477 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Chen He Assignee: Chen He Labels: features Fix For: 2.6.0 Improve AM web UI, Add submitTime field to the AM's web services REST API, improve Elapsed: row time computation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2496: - Attachment: YARN-2496.patch

Attached patch. This patch is based on trunk, but it cannot be compiled by itself: it would be hard to separate YARN-2500 from YARN-2496 and make each of them compile on its own, so I split them just for easier reviewing. This patch is based on YARN-2493, YARN-2494 and YARN-2500; you need to apply those patches first.

[YARN-796] Changes for capacity scheduler to support allocate resource respect labels
Key: YARN-2496 URL: https://issues.apache.org/jira/browse/YARN-2496 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2496.patch

This JIRA includes:
- Add/parse labels option to {{capacity-scheduler.xml}} similar to other queue options like capacity/maximum-capacity, etc.
- Include a default-label-expression option in queue config; if an app doesn't specify a label-expression, the queue's default-label-expression will be used.
- Check if labels can be accessed by the queue when submitting an app with a label-expression to the queue or updating a ResourceRequest with a label-expression.
- Check labels on the NM when trying to allocate a ResourceRequest on the NM with a label-expression.
- Respect labels when calculating headroom/user-limit.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
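To make the first two bullets of the JIRA description concrete, the per-queue configuration would look roughly like the sketch below. The property keys are hypothetical placeholders meant to show the shape of the change; the actual key names are defined by the patch and may differ.

{code}
// Illustrative only: hypothetical capacity-scheduler keys for queue labels and a
// per-queue default label expression, set alongside the existing capacity options.
// (Configuration is org.apache.hadoop.conf.Configuration.)
Configuration csConf = new Configuration();
csConf.set("yarn.scheduler.capacity.root.queues", "a,b");
csConf.set("yarn.scheduler.capacity.root.a.capacity", "50");
csConf.set("yarn.scheduler.capacity.root.a.labels", "GPU,LARGE_MEM");
csConf.set("yarn.scheduler.capacity.root.a.default-label-expression", "GPU");
{code}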
[jira] [Updated] (YARN-2500) [YARN-796] Miscellaneous changes in ResourceManager to support labels
[ https://issues.apache.org/jira/browse/YARN-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2500: - Attachment: YARN-2500.patch

Attached patch. This patch is based on trunk, but it cannot be compiled by itself: it would be hard to separate YARN-2500 from YARN-2496 and make each of them compile on its own, so I split them just for easier reviewing. This patch is based on YARN-2493, YARN-2494 and YARN-2496; you need to apply those patches first.

[YARN-796] Miscellaneous changes in ResourceManager to support labels
Key: YARN-2500 URL: https://issues.apache.org/jira/browse/YARN-2500 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2500.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2492) (Clone of YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127141#comment-14127141 ] Wangda Tan commented on YARN-2492: -- Uploaded patches for YARN-2496 (changes to support node label in CapacityScheduler) and YARN-2500 (misc changes to make RM support labels) (Clone of YARN-796) Allow for (admin) labels on nodes and resource-requests Key: YARN-2492 URL: https://issues.apache.org/jira/browse/YARN-2492 Project: Hadoop YARN Issue Type: Task Components: api, client, resourcemanager Reporter: Wangda Tan Since YARN-796 is a sub JIRA of YARN-397, this JIRA is used to create and track sub tasks and attach split patches for YARN-796. *Let's still keep over-all discussions on YARN-796.* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2500) [YARN-796] Miscellaneous changes in ResourceManager to support labels
[ https://issues.apache.org/jira/browse/YARN-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127167#comment-14127167 ] Hadoop QA commented on YARN-2500: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667432/YARN-2500.patch against trunk revision 90c8ece. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4856//console This message is automatically generated. [YARN-796] Miscellaneous changes in ResourceManager to support labels - Key: YARN-2500 URL: https://issues.apache.org/jira/browse/YARN-2500 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2500.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127168#comment-14127168 ] Hadoop QA commented on YARN-2496: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667431/YARN-2496.patch against trunk revision 90c8ece. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4855//console This message is automatically generated. [YARN-796] Changes for capacity scheduler to support allocate resource respect labels - Key: YARN-2496 URL: https://issues.apache.org/jira/browse/YARN-2496 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2496.patch This JIRA Includes: - Add/parse labels option to {{capacity-scheduler.xml}} similar to other options of queue like capacity/maximum-capacity, etc. - Include a default-label-expression option in queue config, if an app doesn't specify label-expression, default-label-expression of queue will be used. - Check if labels can be accessed by the queue when submit an app with labels-expression to queue or update ResourceRequest with label-expression - Check labels on NM when trying to allocate ResourceRequest on the NM with label-expression - Respect labels when calculate headroom/user-limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2526) Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time
[ https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2526: -- Priority: Minor (was: Major)

Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time
Key: YARN-2526 URL: https://issues.apache.org/jira/browse/YARN-2526 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2526-1.patch

The simulation may enter deadlock if all application simulators hold all threads provided by the thread pool, and all wait for AM container allocation. In that case, all AM simulators wait for NM simulators to do heartbeat to allocate resource, and all NM simulators wait for AM simulators to release some threads. The simulator is deadlocked. To solve this deadlock, we need to remove the while() loop in the MRAMSimulator.
{code}
// waiting until the AM container is allocated
while (true) {
  if (response != null && !response.getAllocatedContainers().isEmpty()) {
    // get AM container
    ...
    break;
  }
  // this sleep time is different from HeartBeat
  Thread.sleep(1000);
  // send out empty request
  sendContainerRequest();
  response = responseQueue.take();
}
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
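One way to get rid of the blocking loop, sketched below, is to let the AM simulator check for its container on each scheduled invocation and simply return when it has not been allocated yet, so the thread goes back to the pool instead of sleeping inside it. This is only an illustration of the idea, not the actual YARN-2526 patch; the helper returning an AllocateResponse and the fields shown are placeholders.

{code}
// Illustrative only: re-check the allocation on every scheduled call and return
// early, instead of looping and sleeping while holding a thread-pool worker.
private boolean amContainerAllocated = false;

protected void requestAMContainer() throws Exception {
  if (amContainerAllocated) {
    return;
  }
  AllocateResponse response = sendContainerRequest();  // placeholder helper
  if (response != null && !response.getAllocatedContainers().isEmpty()) {
    amContainer = response.getAllocatedContainers().get(0);
    amContainerAllocated = true;
  }
  // otherwise just return; the next scheduled heartbeat will try again
}
{code}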
[jira] [Updated] (YARN-2526) Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time
[ https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2526: -- Attachment: YARN-2526-1.patch

Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time
Key: YARN-2526 URL: https://issues.apache.org/jira/browse/YARN-2526 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2526-1.patch

The simulation may enter deadlock if all application simulators hold all threads provided by the thread pool, and all wait for AM container allocation. In that case, all AM simulators wait for NM simulators to do heartbeat to allocate resource, and all NM simulators wait for AM simulators to release some threads. The simulator is deadlocked. To solve this deadlock, we need to remove the while() loop in the MRAMSimulator.
{code}
// waiting until the AM container is allocated
while (true) {
  if (response != null && !response.getAllocatedContainers().isEmpty()) {
    // get AM container
    ...
    break;
  }
  // this sleep time is different from HeartBeat
  Thread.sleep(1000);
  // send out empty request
  sendContainerRequest();
  response = responseQueue.take();
}
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127201#comment-14127201 ] Xuan Gong commented on YARN-2459: - +1 LGTM

RM crashes if App gets rejected for any reason and HA is enabled
Key: YARN-2459 URL: https://issues.apache.org/jira/browse/YARN-2459 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, YARN-2459.4.patch, YARN-2459.5.patch, YARN-2459.6.patch

If RM HA is enabled and the ZooKeeper store is used for the RM state store: if for any reason an app gets rejected and goes directly from NEW to FAILED, the final transition adds it to the RMApps and completed-apps in-memory structures, but it never makes it to the state store. When the RMApps default limit is reached, the RM starts deleting apps from memory and from the store. At that point it tries to delete this app from the store and fails, which causes the RM to crash.

Thanks,
Mayank

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
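One defensive shape such a fix could take is sketched below: treat a missing node as a no-op when removing an application from the ZooKeeper state store. This is only an illustration; the actual YARN-2459 patch may solve the problem differently (for example, by also recording rejected applications in the store), and the method and field names here should be read as placeholders.

{code}
// Illustrative only: ignore "node does not exist" when cleaning up an app that
// was never written to the store (e.g. rejected straight from NEW to FAILED).
try {
  removeApplicationStateInternal(appState);
} catch (KeeperException.NoNodeException e) {
  LOG.info("Application " + appState.getAppId()
      + " was not found in the state store, skipping removal", e);
}
{code}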
[jira] [Updated] (YARN-2526) Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time
[ https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2526: -- Attachment: YARN-2526-1.patch

Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time
Key: YARN-2526 URL: https://issues.apache.org/jira/browse/YARN-2526 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2526-1.patch, YARN-2526-1.patch

The simulation may enter deadlock if all application simulators hold all threads provided by the thread pool, and all wait for AM container allocation. In that case, all AM simulators wait for NM simulators to do heartbeat to allocate resource, and all NM simulators wait for AM simulators to release some threads. The simulator is deadlocked. To solve this deadlock, we need to remove the while() loop in the MRAMSimulator.
{code}
// waiting until the AM container is allocated
while (true) {
  if (response != null && !response.getAllocatedContainers().isEmpty()) {
    // get AM container
    ...
    break;
  }
  // this sleep time is different from HeartBeat
  Thread.sleep(1000);
  // send out empty request
  sendContainerRequest();
  response = responseQueue.take();
}
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2526) Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time
[ https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2526: -- Attachment: (was: YARN-2526-1.patch)

Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time
Key: YARN-2526 URL: https://issues.apache.org/jira/browse/YARN-2526 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2526-1.patch

The simulation may enter deadlock if all application simulators hold all threads provided by the thread pool, and all wait for AM container allocation. In that case, all AM simulators wait for NM simulators to do heartbeat to allocate resource, and all NM simulators wait for AM simulators to release some threads. The simulator is deadlocked. To solve this deadlock, we need to remove the while() loop in the MRAMSimulator.
{code}
// waiting until the AM container is allocated
while (true) {
  if (response != null && !response.getAllocatedContainers().isEmpty()) {
    // get AM container
    ...
    break;
  }
  // this sleep time is different from HeartBeat
  Thread.sleep(1000);
  // send out empty request
  sendContainerRequest();
  response = responseQueue.take();
}
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2033: -- Attachment: YARN-2033.10.patch

Created a new patch, which improves the check of backward compatibility and fixes a missing break in the switch block.

Investigate merging generic-history into the Timeline Store
Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.10.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, YARN-2033.8.patch, YARN-2033.9.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch

Having two different stores isn't amicable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try and retain most of the client side interfaces as close to what we have today.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2526) Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time
[ https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127280#comment-14127280 ] Hadoop QA commented on YARN-2526: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667440/YARN-2526-1.patch against trunk revision 90c8ece. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4857//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4857//console This message is automatically generated.

Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time
Key: YARN-2526 URL: https://issues.apache.org/jira/browse/YARN-2526 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2526-1.patch

The simulation may enter deadlock if all application simulators hold all threads provided by the thread pool, and all wait for AM container allocation. In that case, all AM simulators wait for NM simulators to do heartbeat to allocate resource, and all NM simulators wait for AM simulators to release some threads. The simulator is deadlocked. To solve this deadlock, we need to remove the while() loop in the MRAMSimulator.
{code}
// waiting until the AM container is allocated
while (true) {
  if (response != null && !response.getAllocatedContainers().isEmpty()) {
    // get AM container
    ...
    break;
  }
  // this sleep time is different from HeartBeat
  Thread.sleep(1000);
  // send out empty request
  sendContainerRequest();
  response = responseQueue.take();
}
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2525) yarn logs command gives error on trunk
[ https://issues.apache.org/jira/browse/YARN-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2525: --- Component/s: scripts

yarn logs command gives error on trunk
Key: YARN-2525 URL: https://issues.apache.org/jira/browse/YARN-2525 Project: Hadoop YARN Issue Type: Bug Components: scripts Reporter: Prakash Ramachandran Priority: Minor Labels: newbie

The yarn logs command (trunk branch) gives an error: "Error: Could not find or load main class org.apache.hadoop.yarn.logaggregation.LogDumper". Instead, the class should be org.apache.hadoop.yarn.client.cli.LogsCLI.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2525) yarn logs command gives error on trunk
[ https://issues.apache.org/jira/browse/YARN-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2525: --- Labels: newbie (was: )

yarn logs command gives error on trunk
Key: YARN-2525 URL: https://issues.apache.org/jira/browse/YARN-2525 Project: Hadoop YARN Issue Type: Bug Components: scripts Reporter: Prakash Ramachandran Priority: Minor Labels: newbie

The yarn logs command (trunk branch) gives an error: "Error: Could not find or load main class org.apache.hadoop.yarn.logaggregation.LogDumper". Instead, the class should be org.apache.hadoop.yarn.client.cli.LogsCLI.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-1458: Attachment: YARN-1458.006.patch

In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Assignee: zhihai xu Labels: patch Fix For: 2.2.1 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch Original Estimate: 408h Remaining Estimate: 408h

The ResourceManager$SchedulerEventDispatcher$EventProcessor is blocked when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid:
{code}
ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
        - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
        at java.lang.Thread.run(Thread.java:744)
……
FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
        - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
        - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
        at java.lang.Thread.run(Thread.java:744)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127329#comment-14127329 ] Craig Welch commented on YARN-796: -- This is a bit of a detail, but the current version of the code lowercases the node labels rather than respecting the given name. I don't believe this is what we want. The requirements do request case-insensitive comparison, but that is not the same as changing the case. There are a few options which come to mind:
1. Switch to case-insensitive Sets and Maps for managing the labels - TreeSet and TreeMap can be configured to operate in a case-insensitive fashion; I expect they would be OK to use for node labels.
2. Gate label names on the way in to force consistent case while maintaining case - a Map with a lowercase key and an original-case value could be used to keep all labels for a given set of letters in a consistent case (the original).
3. Drop the requirement for case insensitivity - I'm not sure of the reasoning; I assume it is to prevent mis-types, but I'm not sure it's really so important, and there are still many opportunities for mistyping labels. I'm not sure if protecting against this one case is worth the implementation cost/complexity or the loss of the original case as specified by the user.
I suggest 3, FWIW.

Allow for (admin) labels on nodes and resource-requests
Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.1 Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, YARN-796.node-label.consolidate.1.patch, YARN-796.node-label.demo.patch.1, YARN-796.patch, YARN-796.patch4

It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
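For option 1 above, the kind of structure being suggested is roughly the following sketch: a case-insensitive set that still preserves the casing the admin originally supplied.

{code}
// Illustrative only: case-insensitive label lookup that keeps the original casing.
Set<String> nodeLabels = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
nodeLabels.add("GPU");
boolean found = nodeLabels.contains("gpu");     // true: comparison ignores case
String stored = nodeLabels.iterator().next();   // "GPU": the given name is preserved
{code}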
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127332#comment-14127332 ] zhihai xu commented on YARN-1458: - I uploaded a patch, YARN-1458.006.patch, for the first approach: this patch compares with the previous result in the loop, to fix the zero-weight-with-non-zero-minShare issue, and calculates the starting point for rMax using the minimum ratio of minShare/weight, to fix the issue where all queues have a non-zero minShare. Either approach is OK for me, but the second approach is a little simpler and faster than the first approach.

In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Assignee: zhihai xu Labels: patch Fix For: 2.2.1 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch Original Estimate: 408h Remaining Estimate: 408h

The ResourceManager$SchedulerEventDispatcher$EventProcessor is blocked when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid:
{code}
ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
        - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
        at java.lang.Thread.run(Thread.java:744)
……
FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
        - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
        - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
        at java.lang.Thread.run(Thread.java:744)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
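For context, the core of the idea described in the comment is to make the ratio search in ComputeFairShares terminate even when some schedulables have a zero weight but a non-zero minShare. A minimal sketch of that idea is below; it uses the method names visible in the stack trace, but the variable names and the exact placement are illustrative rather than the code in YARN-1458.006.patch.

{code}
// Illustrative only: grow rMax until the consumed resource stops increasing, so a
// zero weight combined with a non-zero minShare cannot make this loop spin forever
// while the scheduler lock is held.
double rMax = 1.0;
int previousUsed = -1;
while (true) {
  // assumed to return the total resource consumed at this weight-to-resource ratio
  int used = resourceUsedWithWeightToResourceRatio(rMax, schedulables, type);
  if (used >= totalResource || used == previousUsed) {
    break;  // either an upper bound was found or no further progress is possible
  }
  previousUsed = used;
  rMax *= 2.0;
}
{code}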
[jira] [Commented] (YARN-2456) Possible deadlock in CapacityScheduler when RM is recovering apps
[ https://issues.apache.org/jira/browse/YARN-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127353#comment-14127353 ] Xuan Gong commented on YARN-2456: - I think both ways (sorting the ApplicationStates based on ApplicationId or on ApplicationState's submitTime) are fine. Since all processes are asynchronous, the corner case still exists. [~jianhe] What do you think?

Possible deadlock in CapacityScheduler when RM is recovering apps
Key: YARN-2456 URL: https://issues.apache.org/jira/browse/YARN-2456 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2456.1.patch

Consider this scenario:
1. RM is configured with a single queue and only one application can be active at a time.
2. Submit App1, which uses up the queue's whole capacity.
3. Submit App2, which remains pending.
4. Restart RM.
5. App2 is recovered before App1, so App2 is added to the activeApplications list. Now App1 remains pending (because of the max-active-app limit).
6. All containers of App1 are now recovered when the NM registers, and use up the whole queue capacity again.
7. Since the queue is full, App2 cannot proceed to allocate its AM container.
8. In the meanwhile, App1 cannot proceed to become active because of the max-active-app limit.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127374#comment-14127374 ] Hadoop QA commented on YARN-1458: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667451/YARN-1458.006.patch against trunk revision 90c8ece. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4859//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4859//console This message is automatically generated.

In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Assignee: zhihai xu Labels: patch Fix For: 2.2.1 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch Original Estimate: 408h Remaining Estimate: 408h

The ResourceManager$SchedulerEventDispatcher$EventProcessor is blocked when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid:
{code}
ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
        - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
        at java.lang.Thread.run(Thread.java:744)
……
FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
        - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
        - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
        at java.lang.Thread.run(Thread.java:744)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127384#comment-14127384 ] Hadoop QA commented on YARN-2033: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667445/YARN-2033.10.patch against trunk revision 90c8ece. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 17 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4858//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4858//console This message is automatically generated. Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.10.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, YARN-2033.8.patch, YARN-2033.9.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch Having two different stores isn't amicable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try and retain most of the client side interfaces as close to what we have today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data
[ https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127403#comment-14127403 ] bc Wong commented on YARN-1530: ---

bq. The current writing channel allows the data to be available on the timeline server immediately

Let's have reliability before speed. I think one of the requirements of ATS is: *the channel for writing events should be reliable.* I'm using *reliable* here in a strong sense, not the TCP-best-effort style of reliability. HDFS is reliable. Kafka is reliable. (They are also scalable and robust.) A normal RPC connection is not. I don't want the ATS to be able to slow down my writes, and therefore my applications, at all. For example, an ATS failover shouldn't pause all my applications for N seconds.

A direct RPC to the ATS for writing seems a poor choice in general. Yes, you could make a distributed, reliable, scalable ATS service to accept write events. But that seems like a lot of work, while we can leverage existing technologies. If the channel itself is pluggable, then we have lots of options. Kafka is a very good choice for sites that already deploy Kafka and know how to operate it. Using HDFS as a channel is also a good default implementation, for people who already know how to scale and manage HDFS. Embedding a Kafka broker with each ATS daemon is also an option, if we're OK with that dependency.

[Umbrella] Store, manage and serve per-framework application-timeline data
Key: YARN-1530 URL: https://issues.apache.org/jira/browse/YARN-1530 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, ATS-meet-up-8-28-2014-notes.pdf, application timeline design-20140108.pdf, application timeline design-20140116.pdf, application timeline design-20140130.pdf, application timeline design-20140210.pdf

This is a sibling JIRA for YARN-321. Today, each application/framework has to store and serve per-framework data all by itself as YARN doesn't have a common solution. This JIRA attempts to solve the storage, management and serving of per-framework data from various applications, both running and finished. The aim is to change YARN to collect and store data in a generic manner with plugin points for frameworks to do their own thing w.r.t interpretation and serving.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
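To make the "pluggable channel" argument above concrete, the shape being asked for is roughly an interface like the sketch below, with HDFS- or Kafka-backed implementations selected by configuration. The interface name and methods are hypothetical, not an existing YARN API; only TimelineEntities is an existing record class.

{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntities;

// Illustrative only: a pluggable sink for timeline events, so the write path
// does not have to be a direct RPC to the timeline server.
public interface TimelineEventSink {
  // Append a batch of entities; implementations decide durability semantics
  // (e.g. flush to an HDFS file, publish to a Kafka topic).
  void put(TimelineEntities entities) throws IOException;

  // Flush and release any underlying resources (HDFS stream, Kafka producer, ...).
  void close() throws IOException;
}
{code}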
[jira] [Updated] (YARN-2440) Cgroups should allow YARN containers to be limited to allocated cores
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2440: Attachment: apache-yarn-2440.5.patch

Uploaded new patch to address Vinod's concerns.

bq. containers-limit-cpu-percentage - yarn.nodemanager.resource.percentage-cpu-limit to be consistent? Similarly NM_CONTAINERS_CPU_PERC? I don't like the tag 'resource', it should have been 'resources' but it is what it is.
I'm worried that calling it that will lead users to think it's a percentage of the vcores that they specify. In the patch I've changed it to yarn.nodemanager.resource.percentage-physical-cpu-limit but if you or Jason feel strongly about it, I can change it to yarn.nodemanager.resource.percentage-cpu-limit.

bq. You still have refs to YarnConfiguration.NM_CONTAINERS_CPU_ABSOLUTE in the patch. Similarly the javadoc in NodeManagerHardwareUtils needs to be updated if we are not adding the absolute cpu config. It should no longer refer to number of cores that should be used for YARN containers
Fixed.

bq. TestCgroupsLCEResourcesHandler: You can use mockito if you only want to override num-processors in TestResourceCalculatorPlugin. Similarly in TestNodeManagerHardwareUtils.
Switched to mockito.

bq. The tests may fail on a machine with 4 cores?
Don't think so. The tests mock the getNumProcessors() function so we should be fine.

Cgroups should allow YARN containers to be limited to allocated cores
Key: YARN-2440 URL: https://issues.apache.org/jira/browse/YARN-2440 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, apache-yarn-2440.2.patch, apache-yarn-2440.3.patch, apache-yarn-2440.4.patch, apache-yarn-2440.5.patch, screenshot-current-implementation.jpg

The current cgroups implementation does not limit YARN containers to the cores allocated in yarn-site.xml.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
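As a rough illustration of what a node-level percentage limit means for container CPU, the sketch below assumes the limit is applied against the number of physical cores reported by the resource calculator (for example, 8 cores at a 75% limit leaves 6 cores for containers). The formula and the way the value is read are assumptions for illustration, not the patch's exact code; the property name is the one quoted in the comment above.

{code}
// Illustrative only: derive the CPU available to YARN containers from a
// configured percentage of the node's physical cores.
int numProcessors = plugin.getNumProcessors();   // org.apache.hadoop.yarn.util.ResourceCalculatorPlugin
int percentage = conf.getInt(
    "yarn.nodemanager.resource.percentage-physical-cpu-limit", 100);
float coresForContainers = (numProcessors * percentage) / 100.0f;
{code}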
[jira] [Commented] (YARN-2440) Cgroups should allow YARN containers to be limited to allocated cores
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127467#comment-14127467 ] Hadoop QA commented on YARN-2440: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667461/apache-yarn-2440.5.patch against trunk revision 2749fc6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4860//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4860//console This message is automatically generated. Cgroups should allow YARN containers to be limited to allocated cores - Key: YARN-2440 URL: https://issues.apache.org/jira/browse/YARN-2440 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, apache-yarn-2440.2.patch, apache-yarn-2440.3.patch, apache-yarn-2440.4.patch, apache-yarn-2440.5.patch, screenshot-current-implementation.jpg The current cgroups implementation does not limit YARN containers to the cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2527) NPE in ApplicationACLsManager
Benoy Antony created YARN-2527: -- Summary: NPE in ApplicationACLsManager Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony

NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below:
{code}
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
        at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
        at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
        at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
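One plausible shape for the fix, sketched below, is to treat an application with no registered ACL entry as "owner/admin access only" instead of dereferencing a null map entry. This is only a guess at the cause of the NPE at ApplicationACLsManager.java:104; the field and variable names here are placeholders, not the real ApplicationACLsManager internals.

{code}
// Illustrative only: guard against an application whose ACLs were never
// registered (or were already removed) rather than throwing an NPE.
Map<ApplicationAccessType, AccessControlList> appAcls = aclsMap.get(applicationId);
if (appAcls == null || appAcls.get(applicationAccessType) == null) {
  // No ACL recorded for this application/access type: only the owner is allowed.
  return callerUGI.getShortUserName().equals(applicationOwner);
}
return appAcls.get(applicationAccessType).isUserAllowed(callerUGI);
{code}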
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127470#comment-14127470 ] Benoy Antony commented on YARN-2527: working on a patch for this issue.

NPE in ApplicationACLsManager
Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony

NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below:
{code}
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
        at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
        at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
        at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-154) Create Yarn trunk and commit jobs
[ https://issues.apache.org/jira/browse/YARN-154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-154: -- Fix Version/s: (was: 3.0.0) 2.0.2-alpha 0.23.5 Create Yarn trunk and commit jobs - Key: YARN-154 URL: https://issues.apache.org/jira/browse/YARN-154 Project: Hadoop YARN Issue Type: Task Reporter: Eli Collins Assignee: Robert Joseph Evans Fix For: 2.0.2-alpha, 0.23.5 Yarn should have Hadoop-Yarn-trunk and Hadoop-Yarn-trunk-Commit jenkins jobs that correspond to the Common, HDFS, and MR ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-2080: --- Attachment: YARN-2080.patch

Thanks [~vinodkv] for reviewing the patch. I am uploading a new patch that includes your feedback:
* Renamed all Yarn config variables as you suggested. I prefer using the standalone configs as it gives us more flexibility.
* Removed duplicate logging in _ClientRMService_ and _ReservationInputValidator_. Consistently uses RMAuditLogger throughout.
* Fixes in AbstractReservationSystem as you suggested.
* Updated stale references to queues in Javadocs of _YarnClient.submitReservation()_.
* _TestYarnClient_ and _TestClientRMService_ use newInstance instead of PBImpls.
* _ReservationRequest.setLeaseDuration()_ was renamed to be simply _setDuration()_.
* Moved _CapacitySchedulerConfiguration_ to YARN-1711.

bq. ReservationInputValidator: Deleting a request shouldn't need validateReservationUpdateRequest-validateReservationDefinition. We only need the ID validation
That's exactly what's being done. ReservationDefinitions are validated only for submission/update.

bq. checkReservationACLs: Today anyone who can submit applications can also submit reservations. We may want to separate them, if you agree, I'll file a ticket for future separation of these ACLs.
I agree. I have a set of follow-up enhancement JIRAs to YARN-1051 in mind, one of which was exactly to consider separation of ACLs as you pointed out.

Admission Control: Integrate Reservation subsystem with ResourceManager
Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subramaniam Krishnan Assignee: Subramaniam Krishnan Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch

This JIRA tracks the integration of the Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1215) Yarn URL should include userinfo
[ https://issues.apache.org/jira/browse/YARN-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-1215: --- Fix Version/s: (was: 3.0.0) Yarn URL should include userinfo Key: YARN-1215 URL: https://issues.apache.org/jira/browse/YARN-1215 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 3.0.0 Reporter: Chuan Liu Assignee: Chuan Liu Fix For: 2.2.0 Attachments: YARN-1215-trunk.2.patch, YARN-1215-trunk.patch In the {{org.apache.hadoop.yarn.api.records.URL}} class, we don't have an userinfo as part of the URL. When converting a {{java.net.URI}} object into the YARN URL object in {{ConverterUtils.getYarnUrlFromURI()}} method, we will set uri host as the url host. If the uri has a userinfo part, the userinfo is discarded. This will lead to information loss if the original uri has the userinfo, e.g. foo://username:passw...@example.com will be converted to foo://example.com and username/password information is lost during the conversion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
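To make the information loss above concrete, here is a small self-contained demonstration: rebuilding a URI from only scheme/host/port/path (as a conversion that ignores userinfo effectively does) drops the credentials, while copying userinfo explicitly preserves them. This is a plain java.net.URI illustration, not the ConverterUtils or YARN URL code; the username:password value is a placeholder.
{code}
import java.net.URI;
import java.net.URISyntaxException;

public class UserInfoLossDemo {
  public static void main(String[] args) throws URISyntaxException {
    URI original = new URI("foo://username:password@example.com:8020/data");

    // Lossy rebuild: the userinfo component is not carried over.
    URI lossy = new URI(original.getScheme(), null /* userinfo dropped */,
        original.getHost(), original.getPort(), original.getPath(), null, null);

    // Preserving rebuild: the userinfo component is copied explicitly.
    URI preserved = new URI(original.getScheme(), original.getUserInfo(),
        original.getHost(), original.getPort(), original.getPath(), null, null);

    System.out.println(lossy);      // foo://example.com:8020/data
    System.out.println(preserved);  // foo://username:password@example.com:8020/data
  }
}
{code}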
[jira] [Updated] (YARN-794) YarnClientImpl.submitApplication() to add a timeout
[ https://issues.apache.org/jira/browse/YARN-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-794: -- Fix Version/s: (was: 2.1.0-beta) (was: 3.0.0) YarnClientImpl.submitApplication() to add a timeout --- Key: YARN-794 URL: https://issues.apache.org/jira/browse/YARN-794 Project: Hadoop YARN Issue Type: Improvement Components: client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Steve Loughran Priority: Minor {{YarnClientImpl.submitApplication()}} can spin forever waiting for the RM to accept the submission, ignoring interrupts on the sleep. # A timeout allows client applications to recognise and react to a failure of the RM to accept work in a timely manner. # The interrupt exception could be converted to an {{InterruptedIOException}} and raised within the current method signature -- This message was sent by Atlassian JIRA (v6.3.4#6332)
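A minimal sketch of the behaviour the issue asks for: a bounded polling loop that gives up after a configurable timeout and converts interrupts into an InterruptedIOException. The AcceptedCheck callback stands in for the real "has the RM accepted the application yet?" probe; none of these names are YarnClient API.
{code}
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.TimeUnit;

public final class BoundedSubmitPoll {

  /** Hypothetical stand-in for the RM acceptance check; not a real YARN API. */
  interface AcceptedCheck {
    boolean isAccepted() throws IOException;
  }

  static void awaitAccepted(AcceptedCheck check, long timeoutMs, long pollMs)
      throws IOException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!check.isAccepted()) {
      if (System.nanoTime() >= deadline) {
        // Timeout lets the client recognise and react to an unresponsive RM.
        throw new IOException("RM did not accept the submission within " + timeoutMs + " ms");
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException ie) {
        // Surface the interrupt instead of swallowing it inside the sleep.
        Thread.currentThread().interrupt();
        InterruptedIOException iioe =
            new InterruptedIOException("Interrupted while waiting for the RM to accept the submission");
        iioe.initCause(ie);
        throw iioe;
      }
    }
  }
}
{code}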
[jira] [Commented] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127493#comment-14127493 ] Hadoop QA commented on YARN-2080: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667468/YARN-2080.patch against trunk revision 2749fc6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4861//console This message is automatically generated. Admission Control: Integrate Reservation subsystem with ResourceManager --- Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subramaniam Krishnan Assignee: Subramaniam Krishnan Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2528) Cross Origin Filter Http response split vulnerability protection rejects valid origins
Jonathan Eagles created YARN-2528: - Summary: Cross Origin Filter Http response split vulnerability protection rejects valid origins Key: YARN-2528 URL: https://issues.apache.org/jira/browse/YARN-2528 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles URL encoding is too strong a protection against the HTTP response splitting vulnerability, and major browsers reject the encoded Origin. An adequate protection is simply to remove all CRs and LFs, as PHP's header function does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
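A minimal sketch of the protection described above: instead of URL-encoding the Origin value (which browsers then fail to match), strip CR and LF so a header value cannot be used to split the HTTP response. This mirrors the idea only; it is not the actual CrossOriginFilter patch.
{code}
public final class OriginSanitizer {

  /** Remove CR/LF so the value cannot inject additional response headers. */
  static String stripResponseSplitters(String headerValue) {
    if (headerValue == null) {
      return null;
    }
    return headerValue.replace("\r", "").replace("\n", "");
  }

  public static void main(String[] args) {
    String malicious = "http://example.com\r\nSet-Cookie: injected=1";
    // Prints "http://example.comSet-Cookie: injected=1" - no longer splits the response,
    // and a legitimate origin without CR/LF passes through unchanged.
    System.out.println(stripResponseSplitters(malicious));
  }
}
{code}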
[jira] [Updated] (YARN-2528) Cross Origin Filter Http response split vulnerability protection rejects valid origins
[ https://issues.apache.org/jira/browse/YARN-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-2528: -- Attachment: YARN-2528-v1.patch Cross Origin Filter Http response split vulnerability protection rejects valid origins -- Key: YARN-2528 URL: https://issues.apache.org/jira/browse/YARN-2528 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2528-v1.patch URL encoding is too strong a protection against the HTTP response splitting vulnerability, and major browsers reject the encoded Origin. An adequate protection is simply to remove all CRs and LFs, as PHP's header function does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated YARN-2527: --- Description: NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. was: NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} NPE in ApplicationACLsManager - Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2528) Cross Origin Filter Http response split vulnerability protection rejects valid origins
[ https://issues.apache.org/jira/browse/YARN-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127575#comment-14127575 ] Hadoop QA commented on YARN-2528: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667480/YARN-2528-v1.patch against trunk revision 2749fc6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4862//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4862//console This message is automatically generated. Cross Origin Filter Http response split vulnerability protection rejects valid origins -- Key: YARN-2528 URL: https://issues.apache.org/jira/browse/YARN-2528 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2528-v1.patch URLEncoding is too strong of a protection for HTTP Response Split Vulnerability protection and major browser reject the encoded Origin. An adequate protection is simply to remove all CRs LFs as in the case of PHP's header function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2528) Cross Origin Filter Http response split vulnerability protection rejects valid origins
[ https://issues.apache.org/jira/browse/YARN-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127592#comment-14127592 ] Jonathan Eagles commented on YARN-2528: --- [~zjshen], sorry to bother you again. Found another bug while working on getting the Tez UI running in a hosted environment. Can you give a review? Cross Origin Filter Http response split vulnerability protection rejects valid origins -- Key: YARN-2528 URL: https://issues.apache.org/jira/browse/YARN-2528 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2528-v1.patch URLEncoding is too strong of a protection for HTTP Response Split Vulnerability protection and major browser reject the encoded Origin. An adequate protection is simply to remove all CRs LFs as in the case of PHP's header function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2526) Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time
[ https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127647#comment-14127647 ] Karthik Kambatla commented on YARN-2526: Thanks for reporting and fixing this, Wei. +1. Committing this. Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time Key: YARN-2526 URL: https://issues.apache.org/jira/browse/YARN-2526 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2526-1.patch The simulation may enter deadlock if all application simulators hold all threads provided by the thread pool, and all wait for AM container allocation. In that case, all AM simulators wait for NM simulators to do heartbeat to allocate resource, and all NM simulators wait for AM simulators to release some threads. The simulator is deadlocked. To solve this deadlock, we need to remove the while() loop in the MRAMSimulator. {code} // waiting until the AM container is allocated while (true) { if (response != null && !response.getAllocatedContainers().isEmpty()) { // get AM container . break; } // this sleep time is different from HeartBeat Thread.sleep(1000); // send out empty request sendContainerRequest(); response = responseQueue.take(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2526) SLS can deadlock when all the threads are taken by AMSimulators
[ https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2526: --- Component/s: scheduler-load-simulator Priority: Critical (was: Minor) Target Version/s: 2.6.0 Affects Version/s: 2.5.1 SLS can deadlock when all the threads are taken by AMSimulators --- Key: YARN-2526 URL: https://issues.apache.org/jira/browse/YARN-2526 Project: Hadoop YARN Issue Type: Bug Components: scheduler-load-simulator Affects Versions: 2.5.1 Reporter: Wei Yan Assignee: Wei Yan Priority: Critical Attachments: YARN-2526-1.patch The simulation may enter deadlock if all application simulators hold all threads provided by the thread pool, and all wait for AM container allocation. In that case, all AM simulators wait for NM simulators to do heartbeat to allocate resource, and all NM simulators wait for AM simulators to release some threads. The simulator is deadlocked. To solve this deadlock, we need to remove the while() loop in the MRAMSimulator. {code} // waiting until the AM container is allocated while (true) { if (response != null && !response.getAllocatedContainers().isEmpty()) { // get AM container . break; } // this sleep time is different from HeartBeat Thread.sleep(1000); // send out empty request sendContainerRequest(); response = responseQueue.take(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2526) SLS can deadlock when all the threads are taken by AMSimulators
[ https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2526: --- Summary: SLS can deadlock when all the threads are taken by AMSimulators (was: Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time) SLS can deadlock when all the threads are taken by AMSimulators --- Key: YARN-2526 URL: https://issues.apache.org/jira/browse/YARN-2526 Project: Hadoop YARN Issue Type: Bug Components: scheduler-load-simulator Affects Versions: 2.5.1 Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2526-1.patch The simulation may enter deadlock if all application simulators hold all threads provided by the thread pool, and all wait for AM container allocation. In that case, all AM simulators wait for NM simulators to do heartbeat to allocate resource, and all NM simulators wait for AM simulators to release some threads. The simulator is deadlocked. To solve this deadlock, we need to remove the while() loop in the MRAMSimulator. {code} // waiting until the AM container is allocated while (true) { if (response != null && !response.getAllocatedContainers().isEmpty()) { // get AM container . break; } // this sleep time is different from HeartBeat Thread.sleep(1000); // send out empty request sendContainerRequest(); response = responseQueue.take(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
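As an illustration only, here is a sketch of the general shape of a fix for the deadlock described in the issue: rather than holding a pool thread in a while(true) wait until the AM container arrives, the simulator checks once per scheduled heartbeat and returns, freeing the thread for the NM simulators. The names (AllocateResponseLike, onHeartbeat) are hypothetical and this is not the actual MRAMSimulator change.
{code}
import java.util.List;

public class NonBlockingAmStartSketch {

  /** Hypothetical stand-in for the allocate response; not YARN API. */
  interface AllocateResponseLike {
    List<String> getAllocatedContainers();
  }

  private boolean amContainerAllocated = false;

  /** Called by the simulator's scheduled executor on every heartbeat tick. */
  void onHeartbeat(AllocateResponseLike response) {
    if (!amContainerAllocated) {
      if (response != null && !response.getAllocatedContainers().isEmpty()) {
        amContainerAllocated = true; // AM container obtained; start real work next tick
      }
      return; // no busy-wait: the thread goes back to the pool either way
    }
    // ... normal AM simulation work once the AM container is running ...
  }
}
{code}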
[jira] [Updated] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-415: Attachment: YARN-415.201409092204.txt [~jianhe], thank you very much for your time in reviewing this patch and your helpful suggestions. {quote} seems we don't need this check, because the returned ApplicationResourceUsageReport for non-active attempt is anyways null. {code} // Only add in the running containers if this is the active attempt. RMAppAttempt currentAttempt = rmContext.getRMApps() .get(attemptId.getApplicationId()).getCurrentAppAttempt(); if (currentAttempt != null currentAttempt.getAppAttemptId().equals(attemptId)) { {code} {quote} You are correct. The above check for {{currentAttempt != null}} is not necessary. With this new patch, I have upmerged again (since it wasn't applying cleanly) and removed this check. [~kkambatl], I would also like to thank you for your help on this patch. Were you okay with the changes I made in response to your suggestions? It would be great if we could move this patch over the goal line soon. Capture aggregate memory allocation at the app-level for chargeback --- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.5.0 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.201408150030.txt, YARN-415.201408181938.txt, YARN-415.201408181938.txt, YARN-415.201408212033.txt, YARN-415.201409040036.txt, YARN-415.201409092204.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
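A small worked example of the chargeback formula in the description above: the aggregate is the sum over containers of (reserved memory in MB) times (container lifetime in seconds). The container numbers here are made up for illustration; this is not ResourceManager code.
{code}
public class MemorySecondsExample {

  /** containers[i] = {reservedMb, lifetimeSeconds}. */
  static long memorySeconds(long[][] containers) {
    long total = 0;
    for (long[] c : containers) {
      long reservedMb = c[0];
      long lifetimeSeconds = c[1];
      total += reservedMb * lifetimeSeconds;
    }
    return total;
  }

  public static void main(String[] args) {
    long[][] containers = { {2048, 600}, {1024, 300}, {1024, 300} };
    // 2048*600 + 1024*300 + 1024*300 = 1,843,200 MB-seconds
    System.out.println(memorySeconds(containers) + " MB-seconds");
  }
}
{code}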
[jira] [Comment Edited] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127666#comment-14127666 ] Karthik Kambatla edited comment on YARN-415 at 9/9/14 10:20 PM: Eric - I haven't had a chance to take a look at the latest patch. I trust Jian and you to make sure the concerns are addressed, the suggestions themselves were straight-forward. Thanks for staying patient through this long-drawn JIRA. was (Author: kkambatl): Eric - I haven't had a chance to take a look at the latest patch. I trust Jian and you to make sure the concerns are addressed, the suggestions themselves were straight-forward. Capture aggregate memory allocation at the app-level for chargeback --- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.5.0 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.201408150030.txt, YARN-415.201408181938.txt, YARN-415.201408181938.txt, YARN-415.201408212033.txt, YARN-415.201409040036.txt, YARN-415.201409092204.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127666#comment-14127666 ] Karthik Kambatla commented on YARN-415: --- Eric - I haven't had a chance to take a look at the latest patch. I trust Jian and you to make sure the concerns are addressed, the suggestions themselves were straight-forward. Capture aggregate memory allocation at the app-level for chargeback --- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.5.0 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.201408150030.txt, YARN-415.201408181938.txt, YARN-415.201408181938.txt, YARN-415.201408212033.txt, YARN-415.201409040036.txt, YARN-415.201409092204.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated YARN-2527: --- Attachment: YARN-2527.patch Attaching a patch which checks if the map of ACLs for an application is null. If null, it uses the default ACL. A new test case is added which checks the normal case as well as the case when the ACL is not set for an application. NPE in ApplicationACLsManager - Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony Attachments: YARN-2527.patch NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
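For illustration, a minimal sketch of the guard described in the comment above: if no ACLs were registered for the application, fall back to a default ACL instead of dereferencing a null map entry. The types, the "VIEW_APP" key, and the DEFAULT_YARN_APP_ACL value are placeholders, not the actual ApplicationACLsManager code or patch.
{code}
import java.util.Map;

public class AclLookupSketch {

  // Assumed default for illustration; the real default ACL value may differ.
  private static final String DEFAULT_YARN_APP_ACL = "*";

  static String viewAclFor(Map<String, Map<String, String>> aclsByApp, String appId) {
    Map<String, String> appAcls = aclsByApp.get(appId);
    if (appAcls == null || !appAcls.containsKey("VIEW_APP")) {
      // Previously this path could NPE; fall back to the default ACL instead.
      return DEFAULT_YARN_APP_ACL;
    }
    return appAcls.get("VIEW_APP");
  }
}
{code}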
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127755#comment-14127755 ] Benoy Antony commented on YARN-2527: [~vinodkv], could you please review this jira ? Could you also make me a Yarn contributor so that I can assign the jira to me? NPE in ApplicationACLsManager - Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony Attachments: YARN-2527.patch NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1458: --- Attachment: yarn-1458-7.patch Thanks Zhihai. I see the advantage of the second approach. My main concern is readability of the approach. I have taken a stab at making it more readable/maintainable through only cosmetic changes. Can you please take a look and see if these cosmetic changes make sense to you. In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely -- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Assignee: zhihai xu Labels: patch Fix For: 2.2.1 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch, yarn-1458-7.patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots jobs, it is not easy to reapear. We run the test cluster for days to reapear it. The output of jstack command on resourcemanager pid: {code} ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) at java.lang.Thread.run(Thread.java:744) …… FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) - locked 0x00070026b6e0 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1412#comment-1412 ] Hadoop QA commented on YARN-415: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667510/YARN-415.201409092204.txt against trunk revision 28d99db. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4863//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4863//console This message is automatically generated. Capture aggregate memory allocation at the app-level for chargeback --- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.5.0 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.201408150030.txt, YARN-415.201408181938.txt, YARN-415.201408181938.txt, YARN-415.201408212033.txt, YARN-415.201409040036.txt, YARN-415.201409092204.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. 
We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127795#comment-14127795 ] Hadoop QA commented on YARN-2527: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667532/YARN-2527.patch against trunk revision 28d99db. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4864//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4864//console This message is automatically generated. NPE in ApplicationACLsManager - Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony Attachments: YARN-2527.patch NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127854#comment-14127854 ] Hadoop QA commented on YARN-1458: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667535/yarn-1458-7.patch against trunk revision 28d99db. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4865//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4865//console This message is automatically generated. In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely -- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Assignee: zhihai xu Labels: patch Fix For: 2.2.1 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch, yarn-1458-7.patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots jobs, it is not easy to reapear. We run the test cluster for days to reapear it. 
The output of jstack command on resourcemanager pid: {code} ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) at java.lang.Thread.run(Thread.java:744) …… FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) at
[jira] [Updated] (YARN-1712) Admission Control: plan follower
[ https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1712: --- Attachment: YARN-1712.3.patch Thanks [~jianhe] for your detailed feedback. I am attaching a patch with the following updates: * Made the move-apps logic synchronous; the move is to defReservationQueue (renamed) * Removed the synchronized on scheduler as individual calls are already synchronized * Fixed comment formatting and variable names * Created a common method to calculate lhsRes and rhsRes * Optimized the loop as suggested Some clarifications: * Exceptions are suppressed deliberately as PlanFollower is a background timer thread and we don't want it to exit * _plan.getReservationsAtTime(now)_ is used by others like Replanners. We need the reservations and not just the names even in PlanFollower, so leaving it as is * Tried moving the default queue creation to when PlanQueue is initialized in CapacityScheduler, but it was getting overly complex, mainly due to the relaxed constraint of child capacities =100% for PlanQueues. This is just an additional hashmap lookup with the code being much cleaner, so not moving it for now. If it is still a concern, I can add a flag to Plan and check that instead of CapacityScheduler#getQueue Admission Control: plan follower Key: YARN-1712 URL: https://issues.apache.org/jira/browse/YARN-1712 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations, scheduler Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.3.patch, YARN-1712.patch This JIRA tracks a thread that continuously propagates the current state of an inventory subsystem to the scheduler. As the inventory subsystem stores the plan of how the resources should be subdivided, the work we propose in this JIRA realizes such a plan by dynamically instructing the CapacityScheduler to add/remove/resize queues to follow the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1711: --- Attachment: YARN-1712.3.patch Updated patch to include *CapacitySchedulerConfiguration* based on [~vinodkv]'s [suggestion | https://issues.apache.org/jira/browse/YARN-2080?focusedCommentId=14125994], as the _majority_ of the configurations are for enforcement policies. CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 -- Key: YARN-1711 URL: https://issues.apache.org/jira/browse/YARN-1711 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations Attachments: YARN-1711.1.patch, YARN-1711.patch, YARN-1712.3.patch This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1711: --- Attachment: (was: YARN-1712.3.patch) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 -- Key: YARN-1711 URL: https://issues.apache.org/jira/browse/YARN-1711 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.patch This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1711: --- Attachment: YARN-1711.2.patch CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 -- Key: YARN-1711 URL: https://issues.apache.org/jira/browse/YARN-1711 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.patch This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1712) Admission Control: plan follower
[ https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127913#comment-14127913 ] Hadoop QA commented on YARN-1712: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667564/YARN-1712.3.patch against trunk revision 0de563a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4866//console This message is automatically generated. Admission Control: plan follower Key: YARN-1712 URL: https://issues.apache.org/jira/browse/YARN-1712 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations, scheduler Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.3.patch, YARN-1712.patch This JIRA tracks a thread that continuously propagates the current state of an inventory subsystem to the scheduler. As the inventory subsystem store the plan of how the resources should be subdivided, the work we propose in this JIRA realizes such plan by dynamically instructing the CapacityScheduler to add/remove/resize queues to follow the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-1458: Attachment: yarn-1458-8.patch In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely -- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Assignee: zhihai xu Labels: patch Fix For: 2.2.1 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch, yarn-1458-7.patch, yarn-1458-8.patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots jobs, it is not easy to reapear. We run the test cluster for days to reapear it. The output of jstack command on resourcemanager pid: {code} ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) at java.lang.Thread.run(Thread.java:744) …… FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by 
Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127929#comment-14127929 ] Hadoop QA commented on YARN-1711: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667567/YARN-1712.3.patch against trunk revision 0de563a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4867//console This message is automatically generated. CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 -- Key: YARN-1711 URL: https://issues.apache.org/jira/browse/YARN-1711 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.patch This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127939#comment-14127939 ] zhihai xu commented on YARN-1458: - Hi [~kasha], Your change makes the code much easier to read and maintain. I uploaded a new patch yarn-1458-8.patch with two minor changes based on your patch: use Math.max instead of Math.abs and check schedulables.isEmpty() after handleFixedFairShares. Please review it. Thanks. In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely -- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Assignee: zhihai xu Labels: patch Fix For: 2.2.1 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch, yarn-1458-7.patch, yarn-1458-8.patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it. The output of the jstack command on the resourcemanager pid:
{code}
ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
    - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
    at java.lang.Thread.run(Thread.java:744)
……
FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
    - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
    - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
    at java.lang.Thread.run(Thread.java:744)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
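The two changes described in zhihai xu's comment above can be illustrated with a small, self-contained sketch. All class and method names below (FairShareSketch, removeFixedShares, getFixedShare) are illustrative stand-ins, not the code in yarn-1458-8.patch.
{code}
import java.util.Iterator;
import java.util.List;

class FairShareSketch {
  interface Schedulable {
    double getWeight();
    int getFixedShare();   // < 0 means the share is not fixed
  }

  static void computeShares(List<Schedulable> schedulables, int totalResource) {
    // Take fixed-share schedulables out of the computation first
    // (a stand-in for the handleFixedFairShares step mentioned above).
    int remaining = removeFixedShares(schedulables, totalResource);

    // The isEmpty() check mentioned in the comment: without it, the
    // iterative share computation below would run on an empty list.
    if (schedulables.isEmpty()) {
      return;
    }

    // Math.max rather than Math.abs, so a negative or zero weight can never
    // inflate the weight-to-resource ratio.
    double totalWeight = 0.0;
    for (Schedulable s : schedulables) {
      totalWeight += Math.max(s.getWeight(), 0.0);
    }
    double resourcePerWeight = remaining / Math.max(totalWeight, 1.0);
    System.out.println("resource per unit of weight: " + resourcePerWeight);
  }

  private static int removeFixedShares(List<Schedulable> schedulables, int total) {
    int remaining = total;
    Iterator<Schedulable> it = schedulables.iterator();
    while (it.hasNext()) {
      Schedulable s = it.next();
      if (s.getFixedShare() >= 0) {
        remaining -= s.getFixedShare();
        it.remove();
      }
    }
    return Math.max(remaining, 0);
  }
}
{code}
The early return matters for this JIRA: the update thread holds the scheduler lock while computing shares, so any unbounded work in that path keeps other handlers (such as removeApplication in the stack trace above) blocked.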
[jira] [Commented] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127941#comment-14127941 ] Hadoop QA commented on YARN-1711: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667570/YARN-1711.2.patch against trunk revision 0de563a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4868//console This message is automatically generated. CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 -- Key: YARN-1711 URL: https://issues.apache.org/jira/browse/YARN-1711 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.patch This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127967#comment-14127967 ] Hadoop QA commented on YARN-1458: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667572/yarn-1458-8.patch against trunk revision 0de563a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4869//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4869//console This message is automatically generated. In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely -- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Assignee: zhihai xu Labels: patch Fix For: 2.2.1 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch, yarn-1458-7.patch, yarn-1458-8.patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster for days to reproduce it.
The output of the jstack command on the resourcemanager pid:
{code}
ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
    - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
    at java.lang.Thread.run(Thread.java:744)
……
FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
    - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
    - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
    at java.lang.Thread.run(Thread.java:744)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2456) Possible deadlock in CapacityScheduler when RM is recovering apps
[ https://issues.apache.org/jira/browse/YARN-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127987#comment-14127987 ] Wangda Tan commented on YARN-2456: -- I think there are too many factors that affect the active applications list after application submission. IMHO, recovering applications by time of creation or submission is not a big deal; keeping it simple and straightforward is more important, so I prefer Jian's method. +1 for the patch. Thanks, Possible deadlock in CapacityScheduler when RM is recovering apps - Key: YARN-2456 URL: https://issues.apache.org/jira/browse/YARN-2456 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2456.1.patch Consider this scenario: 1. RM is configured with a single queue and only one application can be active at a time. 2. Submit App1 which uses up the queue's whole capacity 3. Submit App2 which remains pending. 4. Restart RM. 5. App2 is recovered before App1, so App2 is added to the activeApplications list. Now App1 remains pending (because of max-active-app limit) 6. All containers of App1 are now recovered when NM registers, and use up the whole queue capacity again. 7. Since the queue is full, App2 cannot proceed to allocate AM container. 8. In the meanwhile, App1 cannot proceed to become active because of the max-active-app limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
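For illustration, the ordering-based alternative mentioned in the comment above (recovering applications by their submission time) could look roughly like the sketch below; the RecoveredApp class and its fields are placeholders, not the actual RM recovery code.
{code}
import java.util.Comparator;
import java.util.List;

class RecoveryOrderSketch {
  static class RecoveredApp {            // stand-in for the persisted application state
    final String appId;
    final long submitTime;
    RecoveredApp(String appId, long submitTime) {
      this.appId = appId;
      this.submitTime = submitTime;
    }
  }

  static void recoverInSubmissionOrder(List<RecoveredApp> apps) {
    // Oldest submission first, so the application that was active before the
    // restart is the one considered for activation first after the restart.
    apps.sort(Comparator.comparingLong(a -> a.submitTime));
    for (RecoveredApp app : apps) {
      System.out.println("recovering " + app.appId);
    }
  }
}
{code}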
[jira] [Commented] (YARN-2456) Possible deadlock in CapacityScheduler when RM is recovering apps
[ https://issues.apache.org/jira/browse/YARN-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127989#comment-14127989 ] Wangda Tan commented on YARN-2456: -- In addition, I suggest changing the {{deadlock}} in the title to {{livelock}}, because no thread is blocked on a lock here; it is just a state that prevents the applications from making progress. See: http://en.wikipedia.org/wiki/Deadlock#Livelock Wangda Possible deadlock in CapacityScheduler when RM is recovering apps - Key: YARN-2456 URL: https://issues.apache.org/jira/browse/YARN-2456 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2456.1.patch Consider this scenario: 1. RM is configured with a single queue and only one application can be active at a time. 2. Submit App1 which uses up the queue's whole capacity 3. Submit App2 which remains pending. 4. Restart RM. 5. App2 is recovered before App1, so App2 is added to the activeApplications list. Now App1 remains pending (because of max-active-app limit) 6. All containers of App1 are now recovered when NM registers, and use up the whole queue capacity again. 7. Since the queue is full, App2 cannot proceed to allocate AM container. 8. In the meanwhile, App1 cannot proceed to become active because of the max-active-app limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128005#comment-14128005 ] Benoy Antony commented on YARN-2527: [~zjshen], could you please review this jira ? Could you also make me a Yarn contributor so that I can assign the jira to me? NPE in ApplicationACLsManager - Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony Attachments: YARN-2527.patch NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
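A minimal sketch of how the reported NPE can arise and be guarded against, assuming a simplified per-application ACL map; the types and method bodies below are illustrative, not the actual ApplicationACLsManager implementation.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class AclsCheckSketch {
  enum AccessType { VIEW_APP, MODIFY_APP }

  private final Map<String, Map<AccessType, String>> appAcls = new ConcurrentHashMap<>();

  boolean checkAccess(String user, AccessType type, String owner, String appId) {
    if (user.equals(owner)) {
      return true;                         // the owner can always view its own app
    }
    Map<AccessType, String> acls = appAcls.get(appId);
    if (acls == null) {
      // Without this guard, acls.get(type) below dereferences null, which is
      // the failure mode shown in the stack trace above.
      return false;
    }
    String allowedUsers = acls.get(type);
    return allowedUsers != null && allowedUsers.contains(user);
  }
}
{code}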
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128011#comment-14128011 ] Wangda Tan commented on YARN-796: - Hi [~cwelch] and [~aw], I agree with #3 as well, since the original motivation is to avoid case typos from users. But looking at other existing YARN configs, like CS queue names, a different case of a queue name means a different queue. I prefer to drop the requirement unless there is a strong opinion to keep it. Thanks, Wangda Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.1 Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, YARN-796.node-label.consolidate.1.patch, YARN-796.node-label.demo.patch.1, YARN-796.patch, YARN-796.patch4 It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
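For illustration, the two options being weighed above (case-insensitive matching to forgive typos versus case-sensitive matching, as with CS queue names) can be sketched as follows; the class and method names are hypothetical and not part of the YARN-796 patches.
{code}
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class NodeLabelCaseSketch {
  private final Set<String> labels = new HashSet<>();
  private final boolean caseInsensitive;

  NodeLabelCaseSketch(boolean caseInsensitive) {
    this.caseInsensitive = caseInsensitive;
  }

  private String normalize(String label) {
    return caseInsensitive ? label.toLowerCase(Locale.ROOT) : label;
  }

  void addLabel(String label) {
    labels.add(normalize(label));
  }

  boolean matches(String requested) {
    // With caseInsensitive=true, "GPU" and "gpu" are the same label; with
    // caseInsensitive=false they are distinct, mirroring CS queue names.
    return labels.contains(normalize(requested));
  }
}
{code}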
[jira] [Commented] (YARN-1712) Admission Control: plan follower
[ https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128023#comment-14128023 ] Wangda Tan commented on YARN-1712: -- Hi [~subru], I've just taken a look at your latest patch; the code is much cleaner than before, thanks! I don't quite understand what you said: bq. Tried moving the default queue creating to when PlanQueue is initialized in CapacityScheduler but it was getting overly complex mainly due to the relaxed constraint of child capacities =100% for PlanQueues. This is just an additional hashmap lookup with the code being much cleaner so not moving it for now. If it is still a concern, I can add a flag to Plan and check that instead of CapacityScheduler#getQueue Could you please elaborate? In addition, a very minor comment: could you wrap LOG.debug calls within an {{if (LOG.isDebugEnabled())}} block, as is done in other modules? {code} if (LOG.isDebugEnabled()) { // ... } {code} Thanks, Wangda Admission Control: plan follower Key: YARN-1712 URL: https://issues.apache.org/jira/browse/YARN-1712 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations, scheduler Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.3.patch, YARN-1712.patch This JIRA tracks a thread that continuously propagates the current state of an inventory subsystem to the scheduler. As the inventory subsystem stores the plan of how the resources should be subdivided, the work we propose in this JIRA realizes such a plan by dynamically instructing the CapacityScheduler to add/remove/resize queues to follow the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
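A slightly fuller version of the guard requested in the comment above, using the commons-logging API that Hadoop modules use; the method name, parameters, and message are illustrative only.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class DebugGuardSketch {
  private static final Log LOG = LogFactory.getLog(DebugGuardSketch.class);

  void syncPlanToQueue(String planQueueName, float targetCapacity) {
    if (LOG.isDebugEnabled()) {
      // The guard avoids building the message string when debug logging is off.
      LOG.debug("Updating plan queue " + planQueueName
          + " to capacity " + targetCapacity);
    }
  }
}
{code}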
[jira] [Commented] (YARN-2528) Cross Origin Filter Http response split vulnerability protection rejects valid origins
[ https://issues.apache.org/jira/browse/YARN-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128026#comment-14128026 ] Zhijie Shen commented on YARN-2528: --- [~jeagles], no problem. I compared our CrossOriginFilter with the one in Jetty. That one does not seem to do any post-processing of the string obtained from the ORIGIN header. What's the reason we need it for our CrossOriginFilter? According to the test case, you want to avoid the case where the string contains another header, don't you? Doesn't HttpServletResponse.getHeader handle header splitting properly? BTW, it seems that ours only allows one origin in the request header, but Jetty's allows multiple ones. And I found a specification, http://tools.ietf.org/html/draft-abarth-origin-09, which says that ORIGIN can be a list. Any thoughts? Cross Origin Filter Http response split vulnerability protection rejects valid origins -- Key: YARN-2528 URL: https://issues.apache.org/jira/browse/YARN-2528 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2528-v1.patch URLEncoding is too strong a protection against the HTTP Response Split vulnerability, and major browsers reject the encoded Origin. An adequate protection is simply to remove all CRs and LFs, as in the case of PHP's header function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
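A hedged sketch of the behavior discussed above: strip CR and LF from the Origin value instead of URL-encoding it, and tolerate a space-separated list of origins as the cited draft allows. The helper below is illustrative, not the actual CrossOriginFilter code.
{code}
import java.util.ArrayList;
import java.util.List;

class OriginSketch {
  static List<String> parseOrigins(String originHeader) {
    List<String> origins = new ArrayList<>();
    if (originHeader == null) {
      return origins;
    }
    // Removing CR and LF prevents header splitting without mangling
    // legitimate origins the way URL-encoding does.
    String sanitized = originHeader.replaceAll("[\r\n]", "");
    for (String origin : sanitized.trim().split("\\s+")) {
      if (!origin.isEmpty()) {
        origins.add(origin);
      }
    }
    return origins;
  }

  public static void main(String[] args) {
    // Prints [http://foo.example, http://bar.example]
    System.out.println(parseOrigins("http://foo.example http://bar.example"));
  }
}
{code}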
[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2527: -- Assignee: Benoy Antony NPE in ApplicationACLsManager - Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: YARN-2527.patch NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128038#comment-14128038 ] Zhijie Shen commented on YARN-2527: --- [~benoyantony], I've added you as a YARN contributor and assigned this Jira to you. W.r.t. the NPE, did you have a chance to see why the NPE happens? For each submitted app, its acls seem to be added into ApplicationACLsManager. ContainerLaunchContext#getApplicationACLs should return an empty acls map if the user doesn't specify anything, right? NPE in ApplicationACLsManager - Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: YARN-2527.patch NPE in _ApplicationACLsManager_ can result in a 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
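For illustration, the registration path being asked about could default to an empty map at submission time so that later lookups never see a missing entry; the sketch below uses placeholder types, not the real ApplicationACLsManager API.
{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class AclRegistrationSketch {
  enum AccessType { VIEW_APP, MODIFY_APP }

  private final Map<String, Map<AccessType, String>> appAcls = new HashMap<>();

  void addApplication(String appId, Map<AccessType, String> acls) {
    // Defaulting to an empty map mirrors the expectation in the comment above:
    // a submission without explicit ACLs still leaves a non-null entry behind.
    appAcls.put(appId, acls == null ? Collections.<AccessType, String>emptyMap() : acls);
  }
}
{code}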
[jira] [Commented] (YARN-2158) TestRMWebServicesAppsModification sometimes fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128049#comment-14128049 ] Wangda Tan commented on YARN-2158: -- Thanks [~vvasudev] for the fix; it looks good to me. Also thanks for the improvements in the patch. +1, Wangda TestRMWebServicesAppsModification sometimes fails in trunk -- Key: YARN-2158 URL: https://issues.apache.org/jira/browse/YARN-2158 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Varun Vasudev Priority: Minor Attachments: apache-yarn-2158.0.patch, apache-yarn-2158.1.patch From https://builds.apache.org/job/Hadoop-Yarn-trunk/582/console : {code} Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 66.144 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification testSingleAppKill[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) Time elapsed: 2.297 sec FAILURE! java.lang.AssertionError: app state incorrect at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.verifyAppStateJson(TestRMWebServicesAppsModification.java:398) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:289) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
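For context, flakiness of this kind (an "app state incorrect" assertion made right after an action) is often addressed by polling for the expected state with a timeout instead of asserting once; the sketch below is a generic illustration under that assumption, not the apache-yarn-2158 patches.
{code}
class AwaitStateSketch {
  interface StateSupplier {
    String currentState();
  }

  static boolean waitForState(StateSupplier app, String expected, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (expected.equals(app.currentState())) {
        return true;
      }
      Thread.sleep(100);   // give the RM time to process the state transition
    }
    return false;
  }
}
{code}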
[jira] [Commented] (YARN-2517) Implement TimelineClientAsync
[ https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128097#comment-14128097 ] Tsuyoshi OZAWA commented on YARN-2517: -- [~zjshen], [~vinodkv], can we go with the current design (v1 patch)? Implement TimelineClientAsync - Key: YARN-2517 URL: https://issues.apache.org/jira/browse/YARN-2517 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Attachments: YARN-2517.1.patch In some scenarios, we'd like to put timeline entities in another thread so as not to block the current one. It's good to have a TimelineClientAsync like AMRMClientAsync and NMClientAsync. It can buffer entities, put them from a separate thread, and have callbacks to handle the responses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
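A minimal sketch of the idea in the description above, assuming placeholder types and method names (not the v1 patch): buffer entities, send them from a background dispatcher thread, and report outcomes through a callback.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class TimelineClientAsyncSketch {
  interface Callback {
    void onSuccess(String entityId);
    void onError(String entityId, Throwable error);
  }

  private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();
  private final Callback callback;
  private final Thread dispatcher;
  private volatile boolean running = true;

  TimelineClientAsyncSketch(Callback callback) {
    this.callback = callback;
    this.dispatcher = new Thread(this::dispatchLoop, "timeline-dispatcher");
    this.dispatcher.setDaemon(true);
    this.dispatcher.start();
  }

  /** Callers enqueue entities without waiting for the HTTP round trip. */
  void putEntityAsync(String entityId) {
    buffer.offer(entityId);
  }

  private void dispatchLoop() {
    while (running) {
      String entityId;
      try {
        entityId = buffer.take();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      try {
        // A real client would perform the HTTP PUT of the entity here.
        callback.onSuccess(entityId);
      } catch (RuntimeException e) {
        callback.onError(entityId, e);
      }
    }
  }

  void stop() {
    running = false;
    dispatcher.interrupt();
  }
}
{code}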