[jira] [Updated] (YARN-2458) Add file handling features to the Windows Secure Container Executor LRPC service

2014-09-09 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2458:
---
Attachment: YARN-2458.2.patch

A complete implementation that delegates critical file handling (mkdirs) to the 
privileged service. 

 Add file handling features to the Windows Secure Container Executor LRPC 
 service
 

 Key: YARN-2458
 URL: https://issues.apache.org/jira/browse/YARN-2458
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2458.1.patch, YARN-2458.2.patch


 In the WSCE design the nodemanager needs to do certain privileged operations 
 like change file ownership to arbitrary users or delete files owned by the 
 task container user after completion of the task. As we want to remove the 
 Administrator privilege  requirement from the nodemanager service, we have to 
 move these operations into the privileged LRPC helper service. 
 Extend the RPC interface to contain methods for change file ownership and 
 manipulate files, add JNI client side and implement the server side. This 
 will piggyback on the existing LRPC service, so there is not much infrastructure 
 to add (run as service, RPC init, authentication and authorization are already 
 solved). It just needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-09-09 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127018#comment-14127018
 ] 

Eric Payne commented on YARN-2056:
--

[~leftnoteasy], thank you very much for your review comments. I appreciate it.

Regarding the test for appA running on queueB:
The following assertion tests that appA is preempted, but the preemption 
calculations are done per queue.
{code}
+verify(mDisp, times(10)).handle(argThat(new IsPreemptionRequestFor(appA)));
{code}
So, I added the following assertion to make the visual connection between appA 
and queueB:
{code}
+assertTrue("appA should be running on queueB",
+mCS.getAppsInQueue(queueB).contains(expectedAttemptOnQueueB));
{code}
So, the purpose is not really to test that the mockQueue/mockApp worked 
correctly, but to make it obvious that the preemption policy for queueB is 
being exercised. If you think it's not necessary, I will remove it, but I do 
like the link.

 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne
 Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
 YARN-2056.201408310117.txt, YARN-2056.201409022208.txt


 We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2456) Possible deadlock in CapacityScheduler when RM is recovering apps

2014-09-09 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127034#comment-14127034
 ] 

Anubhav Dhoot commented on YARN-2456:
-

Can we sort the ApplicationStates based on ApplicationState's submitTime or 
startTime fields when we recover?
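
A minimal sketch of what such an ordering could look like, assuming the recovered 
per-app state object exposes a submit-time accessor (class and method names here 
are illustrative, not necessarily the actual RMStateStore API):
{code}
// Hypothetical: sort recovered application state by submission time before
// re-submitting apps to the scheduler, so earlier submissions activate first.
// (uses java.util.ArrayList, Collections, Comparator, List)
List<ApplicationState> apps =
    new ArrayList<ApplicationState>(recoveredAppStates.values());
Collections.sort(apps, new Comparator<ApplicationState>() {
  @Override
  public int compare(ApplicationState a, ApplicationState b) {
    return Long.compare(a.getSubmitTime(), b.getSubmitTime());
  }
});
{code}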

 Possible deadlock in CapacityScheduler when RM is recovering apps
 -

 Key: YARN-2456
 URL: https://issues.apache.org/jira/browse/YARN-2456
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2456.1.patch


 Consider this scenario:
 1. RM is configured with a single queue and only one application can be 
 active at a time.
 2. Submit App1 which uses up the queue's whole capacity
 3. Submit App2 which remains pending.
 4. Restart RM.
 5. App2 is recovered before App1, so App2 is added to the activeApplications 
 list. Now App1 remains pending (because of max-active-app limit)
 6. All containers of App1 are now recovered when NM registers, and use up the 
 whole queue capacity again.
 7. Since the queue is full, App2 cannot proceed to allocate AM container.
 8. In the meanwhile, App1 cannot proceed to become active because of the 
 max-active-app limit 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2451) Delete .orig files

2014-09-09 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved YARN-2451.

Resolution: Invalid

This isn't the case anymore - it was either only on my local machine or got 
fixed in the move to git. 

HADOOP-10609 added .orig and .rej files to .gitignore. 
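
For reference, the ignore patterns that cover such files are typically just:
{noformat}
*.orig
*.rej
{noformat}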

 Delete .orig files
 --

 Key: YARN-2451
 URL: https://issues.apache.org/jira/browse/YARN-2451
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

 Looks like we checked in a few .orig files. We should delete them.
 {noformat}
 ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java.orig
 ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java.orig
 ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java.orig
 ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java.orig
 {noformat} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2440) Cgroups should allow YARN containers to be limited to allocated cores

2014-09-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127086#comment-14127086
 ] 

Jason Lowe commented on YARN-2440:
--

bq. Specifically in the context of heterogeneous clusters where uniform % 
configurations can go really bad where the only resort will then be to do 
per-node configuration - not ideal.

Yes, I could see the heterogeneous cluster being a case where specifying 
absolute instead of relative may be desirable.  My biggest concern is that it's 
confusing when trying to combine the absolute and relative concepts -- it's not 
obvious if one overrides the other or if one is relative to the other.

Part of my motivation for keeping this as simple as possible, and the 
configuration burden to an absolute minimum, is that I'm missing the real-world 
use case.  As 
I mentioned before, I think most users would rather not use the functionality 
proposed by this JIRA but instead setup peer cgroups for other systems and set 
their relative cgroup shares appropriately.  With this JIRA the CPUs could sit 
idle despite demand from YARN containers, while a peer cgroup setup allows CPU 
guarantees without idle CPUs if the demand is there.
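
For context, the peer-cgroup alternative described above relies on the standard 
cgroup (v1) CPU controller knobs rather than anything YARN-specific; a rough 
illustration with made-up paths and values:
{noformat}
# Relative shares: under contention YARN gets ~4x the CPU of the peer group,
# but it can still soak up idle CPU when the peer group is quiet.
/sys/fs/cgroup/cpu/hadoop-yarn/cpu.shares        = 8192
/sys/fs/cgroup/cpu/other-workload/cpu.shares     = 2048

# Hard cap via CFS quota (what an absolute limit implies): YARN is capped at
# ~6 cores even when the rest of the machine is idle.
/sys/fs/cgroup/cpu/hadoop-yarn/cpu.cfs_period_us = 100000
/sys/fs/cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us  = 600000
{noformat}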

 Cgroups should allow YARN containers to be limited to allocated cores
 -

 Key: YARN-2440
 URL: https://issues.apache.org/jira/browse/YARN-2440
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, 
 apache-yarn-2440.2.patch, apache-yarn-2440.3.patch, apache-yarn-2440.4.patch, 
 screenshot-current-implementation.jpg


 The current cgroups implementation does not limit YARN containers to the 
 cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1477) Improve AM web UI to avoid confusion about AM restart

2014-09-09 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-1477:
--
Description: Improve AM web UI,  Add submitTime field to the AM's web 
services REST API, improve Elapsed:  row time computation, etc.  (was: 
Similar to MAPREDUCE-5052, This is a fix on AM side. Add submitTime field to 
the AM's web services REST API)

 Improve AM web UI to avoid confusion about AM restart
 -

 Key: YARN-1477
 URL: https://issues.apache.org/jira/browse/YARN-1477
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Chen He
Assignee: Chen He
  Labels: features
 Fix For: 2.6.0


 Improve AM web UI,  Add submitTime field to the AM's web services REST API, 
 improve Elapsed:  row time computation, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-09-09 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2496:
-
Attachment: YARN-2496.patch

Attached a patch. It is based on trunk but does not compile on its own: it is 
hard to separate YARN-2500 from YARN-2496 and keep each of them compilable, so 
I split them just to make reviewing easier.

This patch is based on YARN-2493, YARN-2494 and YARN-2500; you need to apply 
those patches first.
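
For reviewers, one way to prepare a working tree (the apply order below is only 
illustrative; use whatever order the patches actually require):
{noformat}
git apply YARN-2493.patch YARN-2494.patch YARN-2500.patch YARN-2496.patch
{noformat}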

 [YARN-796] Changes for capacity scheduler to support allocate resource 
 respect labels
 -

 Key: YARN-2496
 URL: https://issues.apache.org/jira/browse/YARN-2496
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2496.patch


 This JIRA includes:
 - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
 queue options like capacity/maximum-capacity, etc. (see the illustrative 
 config sketch below)
 - Include a default-label-expression option in the queue config; if an app 
 doesn't specify a label-expression, the queue's default-label-expression will 
 be used.
 - Check whether labels can be accessed by the queue when an app is submitted 
 to the queue with a label-expression or a ResourceRequest is updated with a 
 label-expression
 - Check labels on the NM when trying to allocate a ResourceRequest with a 
 label-expression on that NM
 - Respect labels when calculating headroom/user-limit
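
An illustrative shape of the queue-level label configuration described above; 
the property names below are hypothetical placeholders, not the ones defined by 
the patch:
{code}
<!-- Hypothetical property names, for illustration only -->
<property>
  <name>yarn.scheduler.capacity.root.queueA.labels</name>
  <value>GPU,LARGE_MEM</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.queueA.default-label-expression</name>
  <value>GPU</value>
</property>
{code}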



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2500) [YARN-796] Miscellaneous changes in ResourceManager to support labels

2014-09-09 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2500:
-
Attachment: YARN-2500.patch

Attached a patch. It is based on trunk but does not compile on its own: it is 
hard to separate YARN-2500 from YARN-2496 and keep each of them compilable, so 
I split them just to make reviewing easier.

This patch is based on YARN-2493, YARN-2494 and YARN-2496; you need to apply 
those patches first.

 [YARN-796] Miscellaneous changes in ResourceManager to support labels
 -

 Key: YARN-2500
 URL: https://issues.apache.org/jira/browse/YARN-2500
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2500.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2492) (Clone of YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-09-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127141#comment-14127141
 ] 

Wangda Tan commented on YARN-2492:
--

Uploaded patches for YARN-2496 (changes to support node label in 
CapacityScheduler) and YARN-2500 (misc changes to make RM support labels)

 (Clone of YARN-796) Allow for (admin) labels on nodes and resource-requests 
 

 Key: YARN-2492
 URL: https://issues.apache.org/jira/browse/YARN-2492
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, client, resourcemanager
Reporter: Wangda Tan

 Since YARN-796 is a sub JIRA of YARN-397, this JIRA is used to create and 
 track sub tasks and attach split patches for YARN-796.
 *Let's still keep over-all discussions on YARN-796.*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2500) [YARN-796] Miscellaneous changes in ResourceManager to support labels

2014-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127167#comment-14127167
 ] 

Hadoop QA commented on YARN-2500:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667432/YARN-2500.patch
  against trunk revision 90c8ece.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4856//console

This message is automatically generated.

 [YARN-796] Miscellaneous changes in ResourceManager to support labels
 -

 Key: YARN-2500
 URL: https://issues.apache.org/jira/browse/YARN-2500
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2500.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127168#comment-14127168
 ] 

Hadoop QA commented on YARN-2496:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667431/YARN-2496.patch
  against trunk revision 90c8ece.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4855//console

This message is automatically generated.

 [YARN-796] Changes for capacity scheduler to support allocate resource 
 respect labels
 -

 Key: YARN-2496
 URL: https://issues.apache.org/jira/browse/YARN-2496
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2496.patch


 This JIRA includes:
 - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
 queue options like capacity/maximum-capacity, etc.
 - Include a default-label-expression option in the queue config; if an app 
 doesn't specify a label-expression, the queue's default-label-expression will 
 be used.
 - Check whether labels can be accessed by the queue when an app is submitted 
 to the queue with a label-expression or a ResourceRequest is updated with a 
 label-expression
 - Check labels on the NM when trying to allocate a ResourceRequest with a 
 label-expression on that NM
 - Respect labels when calculating headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2526) Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time

2014-09-09 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2526:
--
Priority: Minor  (was: Major)

 Scheduler Load Simulator may enter deadlock if lots of applications submitted 
 to the RM at the same time
 

 Key: YARN-2526
 URL: https://issues.apache.org/jira/browse/YARN-2526
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2526-1.patch


 The simulation may enter deadlock if all application simulators hold all 
 threads provided by the thread pool, and all wait for AM container 
 allocation. In that case, all AM simulators wait for NM simulators to do 
 heartbeat to allocate resource, and all NM simulators wait for AM simulators 
 to release some threads. The simulator is deadlocked.
 To solve this deadlock, we need to remove the while() loop in the MRAMSimulator.
 {code}
 // waiting until the AM container is allocated
 while (true) {
   if (response != null && !response.getAllocatedContainers().isEmpty()) {
 // get AM container
 .
 break;
   }
   // this sleep time is different from HeartBeat
   Thread.sleep(1000);
   // send out empty request
   sendContainerRequest();
   response = responseQueue.take();
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2526) Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time

2014-09-09 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2526:
--
Attachment: YARN-2526-1.patch

 Scheduler Load Simulator may enter deadlock if lots of applications submitted 
 to the RM at the same time
 

 Key: YARN-2526
 URL: https://issues.apache.org/jira/browse/YARN-2526
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2526-1.patch


 The simulation may enter deadlock if all application simulators hold all 
 threads provided by the thread pool, and all wait for AM container 
 allocation. In that case, all AM simulators wait for NM simulators to do 
 heartbeat to allocate resource, and all NM simulators wait for AM simulators 
 to release some threads. The simulator is deadlocked.
 To solve this deadlock, we need to remove the while() loop in the MRAMSimulator.
 {code}
 // waiting until the AM container is allocated
 while (true) {
   if (response != null && !response.getAllocatedContainers().isEmpty()) {
 // get AM container
 .
 break;
   }
   // this sleep time is different from HeartBeat
   Thread.sleep(1000);
   // send out empty request
   sendContainerRequest();
   response = responseQueue.take();
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-09 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127201#comment-14127201
 ] 

Xuan Gong commented on YARN-2459:
-

+1 LGTM

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
 YARN-2459.4.patch, YARN-2459.5.patch, YARN-2459.6.patch


 If RM HA is enabled and used Zookeeper store for RM State Store.
 If for any reason Any app gets rejected and directly goes to NEW to FAILED
 then final transition makes that to RMApps and Completed Apps memory 
 structure but that doesn't make it to State store.
 Now when RMApps default limit reaches it starts deleting apps from memory and 
 store. In that case it try to delete this app from store and fails which 
 causes RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2526) Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time

2014-09-09 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2526:
--
Attachment: YARN-2526-1.patch

 Scheduler Load Simulator may enter deadlock if lots of applications submitted 
 to the RM at the same time
 

 Key: YARN-2526
 URL: https://issues.apache.org/jira/browse/YARN-2526
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2526-1.patch, YARN-2526-1.patch


 The simulation may enter deadlock if all application simulators hold all 
 threads provided by the thread pool, and all wait for AM container 
 allocation. In that case, all AM simulators wait for NM simulators to do 
 heartbeat to allocate resource, and all NM simulators wait for AM simulators 
 to release some threads. The simulator is deadlocked.
 To solve this deadlock, we need to remove the while() loop in the MRAMSimulator.
 {code}
 // waiting until the AM container is allocated
 while (true) {
   if (response != null && !response.getAllocatedContainers().isEmpty()) {
 // get AM container
 .
 break;
   }
   // this sleep time is different from HeartBeat
   Thread.sleep(1000);
   // send out empty request
   sendContainerRequest();
   response = responseQueue.take();
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2526) Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time

2014-09-09 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2526:
--
Attachment: (was: YARN-2526-1.patch)

 Scheduler Load Simulator may enter deadlock if lots of applications submitted 
 to the RM at the same time
 

 Key: YARN-2526
 URL: https://issues.apache.org/jira/browse/YARN-2526
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2526-1.patch


 The simulation may enter deadlock if all application simulators hold all 
 threads provided by the thread pool, and all wait for AM container 
 allocation. In that case, all AM simulators wait for NM simulators to do 
 heartbeat to allocate resource, and all NM simulators wait for AM simulators 
 to release some threads. The simulator is deadlocked.
 To solve this deadlock, we need to remove the while() loop in the MRAMSimulator.
 {code}
 // waiting until the AM container is allocated
 while (true) {
   if (response != null && !response.getAllocatedContainers().isEmpty()) {
 // get AM container
 .
 break;
   }
   // this sleep time is different from HeartBeat
   Thread.sleep(1000);
   // send out empty request
   sendContainerRequest();
   response = responseQueue.take();
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store

2014-09-09 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2033:
--
Attachment: YARN-2033.10.patch

Created a new patch, which improves the backward-compatibility check and fixes 
a missing break in the switch block.

 Investigate merging generic-history into the Timeline Store
 ---

 Key: YARN-2033
 URL: https://issues.apache.org/jira/browse/YARN-2033
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, 
 YARN-2033.1.patch, YARN-2033.10.patch, YARN-2033.2.patch, YARN-2033.3.patch, 
 YARN-2033.4.patch, YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, 
 YARN-2033.8.patch, YARN-2033.9.patch, YARN-2033.Prototype.patch, 
 YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, 
 YARN-2033_ALL.4.patch


 Having two different stores isn't amenable to generic insights on what's 
 happening with applications. This is to investigate porting generic-history 
 into the Timeline Store.
 One goal is to keep most of the client-side interfaces as close as possible to 
 what we have today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2526) Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time

2014-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127280#comment-14127280
 ] 

Hadoop QA commented on YARN-2526:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667440/YARN-2526-1.patch
  against trunk revision 90c8ece.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4857//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4857//console

This message is automatically generated.

 Scheduler Load Simulator may enter deadlock if lots of applications submitted 
 to the RM at the same time
 

 Key: YARN-2526
 URL: https://issues.apache.org/jira/browse/YARN-2526
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2526-1.patch


 The simulation may enter deadlock if all application simulators hold all 
 threads provided by the thread pool, and all wait for AM container 
 allocation. In that case, all AM simulators wait for NM simulators to do 
 heartbeat to allocate resource, and all NM simulators wait for AM simulators 
 to release some threads. The simulator is deadlocked.
 To solve this deadlock, we need to remove the while() loop in the MRAMSimulator.
 {code}
 // waiting until the AM container is allocated
 while (true) {
   if (response != null && !response.getAllocatedContainers().isEmpty()) {
 // get AM container
 .
 break;
   }
   // this sleep time is different from HeartBeat
   Thread.sleep(1000);
   // send out empty request
   sendContainerRequest();
   response = responseQueue.take();
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2525) yarn logs command gives error on trunk

2014-09-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2525:
---
Component/s: scripts

 yarn logs command gives error on trunk
 --

 Key: YARN-2525
 URL: https://issues.apache.org/jira/browse/YARN-2525
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Reporter: Prakash Ramachandran
Priority: Minor
  Labels: newbie

 yarn logs command (trunk branch) gives an error:
 Error: Could not find or load main class 
 org.apache.hadoop.yarn.logaggregation.LogDumper
 Instead, the class should be org.apache.hadoop.yarn.client.cli.LogsCLI
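
For reference, the command in question is typically invoked as follows 
(placeholder application ID):
{noformat}
yarn logs -applicationId <application ID>
{noformat}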



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2525) yarn logs command gives error on trunk

2014-09-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2525:
---
Labels: newbie  (was: )

 yarn logs command gives error on trunk
 --

 Key: YARN-2525
 URL: https://issues.apache.org/jira/browse/YARN-2525
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Reporter: Prakash Ramachandran
Priority: Minor
  Labels: newbie

 yarn logs command (trunk branch) gives an error:
 Error: Could not find or load main class 
 org.apache.hadoop.yarn.logaggregation.LogDumper
 Instead, the class should be org.apache.hadoop.yarn.client.cli.LogsCLI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-09 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-1458:

Attachment: YARN-1458.006.patch

 In Fair Scheduler, size based weight can cause update thread to hold lock 
 indefinitely
 --

 Key: YARN-1458
 URL: https://issues.apache.org/jira/browse/YARN-1458
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
 Environment: Centos 2.6.18-238.19.1.el5 X86_64
 hadoop2.2.0
Reporter: qingwu.fu
Assignee: zhihai xu
  Labels: patch
 Fix For: 2.2.1

 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
 YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, 
 YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, 
 YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch

   Original Estimate: 408h
  Remaining Estimate: 408h

 The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
 clients submit lots of jobs; it is not easy to reproduce. We ran the test 
 cluster for days to reproduce it. The output of the jstack command on the 
 resourcemanager pid:
 {code}
  ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
 waiting for monitor entry [0x43aa9000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
 - waiting to lock 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
 at java.lang.Thread.run(Thread.java:744)
 ……
 FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
 runnable [0x433a2000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
 at java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-09-09 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127329#comment-14127329
 ] 

Craig Welch commented on YARN-796:
--

This is a bit of a detail, but the current version of the code lowercases the 
node labels rather than respecting the given name.  I don't believe this is what 
we want.  The requirements do request case-insensitive comparison, but that is 
not the same as changing the case.  A few options come to mind:

1. Switch to case-insensitive Sets and Maps for managing the labels - TreeSet 
and TreeMap can be configured to operate in a case-insensitive fashion, and I 
expect they would be fine for node labels (see the sketch below).
2. Gate label names on the way in to force a consistent case while maintaining 
the original case - a Map keyed by the lowercased label with the original-case 
value could be used to keep all labels for a given set of letters in one 
consistent case (the original).
3. Drop the requirement for case insensitivity - I'm not sure of the reasoning; 
I assume it is to prevent mis-types, but I'm not sure it's really so important. 
There are still many opportunities for mistyping labels, and I'm not sure 
protecting against this one case is worth the implementation cost/complexity or 
the loss of the original case as specified by the user. 

I suggest 3, FWIW
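
A minimal sketch of option 1, assuming the labels are kept as plain Strings 
(illustrative only, not the patch's actual data structures):
{code}
// Case-insensitive membership while preserving the original spelling.
// (uses java.util.TreeSet / TreeMap with String.CASE_INSENSITIVE_ORDER)
Set<String> labels = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
labels.add("GPU");
boolean found = labels.contains("gpu");   // true: lookups ignore case
Map<String, String> labelAttributes =
    new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);
{code}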

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-09 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127332#comment-14127332
 ] 

zhihai xu commented on YARN-1458:
-

I uploaded a patch, YARN-1458.006.patch, for the first approach:
This patch compares with the previous result in the loop to fix the zero-weight 
with non-zero-minShare issue, and it calculates the starting point for rMax 
using the minimum minShare/weight ratio to fix the case where all queues have a 
non-zero minShare.
Either approach is OK for me, but the second approach is a little simpler and 
faster than the first.


 In Fair Scheduler, size based weight can cause update thread to hold lock 
 indefinitely
 --

 Key: YARN-1458
 URL: https://issues.apache.org/jira/browse/YARN-1458
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
 Environment: Centos 2.6.18-238.19.1.el5 X86_64
 hadoop2.2.0
Reporter: qingwu.fu
Assignee: zhihai xu
  Labels: patch
 Fix For: 2.2.1

 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
 YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, 
 YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, 
 YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch

   Original Estimate: 408h
  Remaining Estimate: 408h

 The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
 clients submit lots of jobs; it is not easy to reproduce. We ran the test 
 cluster for days to reproduce it. The output of the jstack command on the 
 resourcemanager pid:
 {code}
  ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
 waiting for monitor entry [0x43aa9000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
 - waiting to lock 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
 at java.lang.Thread.run(Thread.java:744)
 ……
 FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
 runnable [0x433a2000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
 at java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2456) Possible deadlock in CapacityScheduler when RM is recovering apps

2014-09-09 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127353#comment-14127353
 ] 

Xuan Gong commented on YARN-2456:
-

I think both ways (sorting the ApplicationStates by ApplicationId or by 
ApplicationState's submitTime) are fine. Since all processes are asynchronous, 
the corner case still exists.
[~jianhe] What do you think?

 Possible deadlock in CapacityScheduler when RM is recovering apps
 -

 Key: YARN-2456
 URL: https://issues.apache.org/jira/browse/YARN-2456
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2456.1.patch


 Consider this scenario:
 1. RM is configured with a single queue and only one application can be 
 active at a time.
 2. Submit App1 which uses up the queue's whole capacity
 3. Submit App2 which remains pending.
 4. Restart RM.
 5. App2 is recovered before App1, so App2 is added to the activeApplications 
 list. Now App1 remains pending (because of max-active-app limit)
 6. All containers of App1 are now recovered when NM registers, and use up the 
 whole queue capacity again.
 7. Since the queue is full, App2 cannot proceed to allocate AM container.
 8. In the meanwhile, App1 cannot proceed to become active because of the 
 max-active-app limit 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127374#comment-14127374
 ] 

Hadoop QA commented on YARN-1458:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667451/YARN-1458.006.patch
  against trunk revision 90c8ece.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4859//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4859//console

This message is automatically generated.

 In Fair Scheduler, size based weight can cause update thread to hold lock 
 indefinitely
 --

 Key: YARN-1458
 URL: https://issues.apache.org/jira/browse/YARN-1458
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
 Environment: Centos 2.6.18-238.19.1.el5 X86_64
 hadoop2.2.0
Reporter: qingwu.fu
Assignee: zhihai xu
  Labels: patch
 Fix For: 2.2.1

 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
 YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, 
 YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, 
 YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch

   Original Estimate: 408h
  Remaining Estimate: 408h

 The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
 clients submit lots of jobs; it is not easy to reproduce. We ran the test 
 cluster for days to reproduce it. The output of the jstack command on the 
 resourcemanager pid:
 {code}
  ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
 waiting for monitor entry [0x43aa9000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
 - waiting to lock 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
 at java.lang.Thread.run(Thread.java:744)
 ……
 FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
 runnable [0x433a2000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
 at 
 

[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store

2014-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127384#comment-14127384
 ] 

Hadoop QA commented on YARN-2033:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667445/YARN-2033.10.patch
  against trunk revision 90c8ece.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 17 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4858//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4858//console

This message is automatically generated.

 Investigate merging generic-history into the Timeline Store
 ---

 Key: YARN-2033
 URL: https://issues.apache.org/jira/browse/YARN-2033
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, 
 YARN-2033.1.patch, YARN-2033.10.patch, YARN-2033.2.patch, YARN-2033.3.patch, 
 YARN-2033.4.patch, YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, 
 YARN-2033.8.patch, YARN-2033.9.patch, YARN-2033.Prototype.patch, 
 YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, 
 YARN-2033_ALL.4.patch


 Having two different stores isn't amenable to generic insights on what's 
 happening with applications. This is to investigate porting generic-history 
 into the Timeline Store.
 One goal is to keep most of the client-side interfaces as close as possible to 
 what we have today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data

2014-09-09 Thread bc Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127403#comment-14127403
 ] 

bc Wong commented on YARN-1530:
---

bq. The current writing channel allows the data to be available on the timeline 
server immediately

Let's have reliability before speed. I think one of the requirements of ATS is: 
*The channel for writing events should be reliable.*

I'm using *reliable* here in a strong sense, not the TCP-best-effort style 
reliability. HDFS is reliable. Kafka is reliable. (They are also scalable and 
robust.) A normal RPC connection is not. I don't want the ATS to be able to 
slow down my writes, and therefore, my applications, at all. For example, an 
ATS failover shouldn't pause all my applications for N seconds. A direct RPC to 
the ATS for writing seems a poor choice in general.

Yes, you could build a distributed, reliable, scalable ATS service to accept 
write events. But that seems like a lot of work when we can leverage existing 
technologies.

If the channel itself is pluggable, then we have lots of options. Kafka is a 
very good choice for sites that already deploy Kafka and know how to operate 
it. Using HDFS as a channel is also a good default implementation, since people 
already know how to scale and manage HDFS. Embedding a Kafka broker with each 
ATS daemon is also an option, if we're ok with that dependency.
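
To make the pluggable-channel idea concrete, a purely illustrative sketch (this 
is not an actual YARN interface; the names are assumptions):
{code}
// Hypothetical plug-in point: implementations could write to HDFS, to Kafka,
// or over a direct RPC client, selected by configuration.
// (TimelineEntity is the existing timeline record class; imports omitted)
public interface TimelineWriteChannel extends java.io.Closeable {
  // Append entities durably; implementations may buffer, hence the explicit flush.
  void putEntities(TimelineEntity... entities) throws java.io.IOException;
  void flush() throws java.io.IOException;
}
{code}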

 [Umbrella] Store, manage and serve per-framework application-timeline data
 --

 Key: YARN-1530
 URL: https://issues.apache.org/jira/browse/YARN-1530
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
 Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, 
 ATS-meet-up-8-28-2014-notes.pdf, application timeline design-20140108.pdf, 
 application timeline design-20140116.pdf, application timeline 
 design-20140130.pdf, application timeline design-20140210.pdf


 This is a sibling JIRA for YARN-321.
 Today, each application/framework has to store and serve per-framework 
 data all by itself, as YARN doesn't have a common solution. This JIRA attempts 
 to solve the storage, management and serving of per-framework data from 
 various applications, both running and finished. The aim is to change YARN to 
 collect and store data in a generic manner with plugin points for frameworks 
 to do their own thing w.r.t interpretation and serving.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2440) Cgroups should allow YARN containers to be limited to allocated cores

2014-09-09 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2440:

Attachment: apache-yarn-2440.5.patch

Uploaded new patch to address Vinod's concerns.

bq. containers-limit-cpu-percentage -> 
yarn.nodemanager.resource.percentage-cpu-limit to be consistent? Similarly 
NM_CONTAINERS_CPU_PERC? I don't like the tag 'resource', it should have been 
'resources' but it is what it is.

I'm worried that calling it that will lead users to think it's a percentage of 
the vcores that they specify. In the patch I've changed it to 
yarn.nodemanager.resource.percentage-physical-cpu-limit but if you or Jason 
feel strongly about it, I can change it to 
yarn.nodemanager.resource.percentage-cpu-limit.

bq.You still have refs to YarnConfiguration.NM_CONTAINERS_CPU_ABSOLUTE in 
the patch. Similarly the javadoc in NodeManagerHardwareUtils needs to be 
updated if we are not adding the absolute cpu config. It should no longer refer 
to number of cores that should be used for YARN containers

Fixed.

bq.TestCgroupsLCEResourcesHandler: You can use mockito if you only want to 
override num-processors in TestResourceCalculatorPlugin. Similarly in 
TestNodeManagerHardwareUtils.

Switched to mockito.

bq. The tests may fail on a machine with < 4 cores?
Don't think so. The tests mock the getNumProcessors() function, so we should be 
fine.
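
A minimal sketch of that mocking approach (illustrative only):
{code}
// Mock the hardware plugin so the test sees a fixed processor count,
// independent of the machine the test runs on.
// (requires: import static org.mockito.Mockito.mock;
//            import static org.mockito.Mockito.when;)
ResourceCalculatorPlugin plugin = mock(ResourceCalculatorPlugin.class);
when(plugin.getNumProcessors()).thenReturn(8);
{code}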


 Cgroups should allow YARN containers to be limited to allocated cores
 -

 Key: YARN-2440
 URL: https://issues.apache.org/jira/browse/YARN-2440
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, 
 apache-yarn-2440.2.patch, apache-yarn-2440.3.patch, apache-yarn-2440.4.patch, 
 apache-yarn-2440.5.patch, screenshot-current-implementation.jpg


 The current cgroups implementation does not limit YARN containers to the 
 cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2440) Cgroups should allow YARN containers to be limited to allocated cores

2014-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127467#comment-14127467
 ] 

Hadoop QA commented on YARN-2440:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12667461/apache-yarn-2440.5.patch
  against trunk revision 2749fc6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4860//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4860//console

This message is automatically generated.

 Cgroups should allow YARN containers to be limited to allocated cores
 -

 Key: YARN-2440
 URL: https://issues.apache.org/jira/browse/YARN-2440
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, 
 apache-yarn-2440.2.patch, apache-yarn-2440.3.patch, apache-yarn-2440.4.patch, 
 apache-yarn-2440.5.patch, screenshot-current-implementation.jpg


 The current cgroups implementation does not limit YARN containers to the 
 cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2527) NPE in ApplicationACLsManager

2014-09-09 Thread Benoy Antony (JIRA)
Benoy Antony created YARN-2527:
--

 Summary: NPE in ApplicationACLsManager
 Key: YARN-2527
 URL: https://issues.apache.org/jira/browse/YARN-2527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony


NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
The relevant stacktrace snippet from the ResourceManager logs is as below
{code}
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager

2014-09-09 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127470#comment-14127470
 ] 

Benoy Antony commented on YARN-2527:


working on a patch for this issue.

 NPE in ApplicationACLsManager
 -

 Key: YARN-2527
 URL: https://issues.apache.org/jira/browse/YARN-2527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony

 NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
 The relevant stacktrace snippet from the ResourceManager logs is as below
 {code}
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-154) Create Yarn trunk and commit jobs

2014-09-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-154:
--
Fix Version/s: (was: 3.0.0)
   2.0.2-alpha
   0.23.5

 Create Yarn trunk and commit jobs
 -

 Key: YARN-154
 URL: https://issues.apache.org/jira/browse/YARN-154
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Eli Collins
Assignee: Robert Joseph Evans
 Fix For: 2.0.2-alpha, 0.23.5


 Yarn should have Hadoop-Yarn-trunk and Hadoop-Yarn-trunk-Commit jenkins jobs 
 that correspond to the Common, HDFS, and MR ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager

2014-09-09 Thread Subramaniam Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramaniam Krishnan updated YARN-2080:
---
Attachment: YARN-2080.patch

Thanks [~vinodkv] for reviewing the patch. I am uploading a new patch that 
includes your feedback:
  * Renamed all Yarn config variables as you suggested. I prefer using the 
standalone configs as it gives us more flexibility.
  * Removed duplicate logging in _ClientRMService_ & 
_ReservationInputValidator_. Consistently uses RMAuditLogger throughout.
  * Fixes in AbstractReservationSystem as you suggested.
  * Updated stale references to queues in Javadocs of 
_YarnClient.submitReservation()_
  * _TestYarnClient_ & _TestClientRMService_ use newInstance instead of PBImpls
  * Renamed _ReservationRequest.setLeaseDuration()_ to simply _setDuration()_
  * Moved _CapacitySchedulerConfiguration_ to YARN-1711

bq. ReservationInputValidator: Deleting a request shouldn't need 
validateReservationUpdateRequest-validateReservationDefinition. We only need 
the ID validation

That's exactly what's being done. ReservationDefinitions are validated only for 
submission/update.

bq. checkReservationACLs: Today anyone who can submit applications can also 
submit reservations. We may want to separate them, if you agree, I'll file a 
ticket for future separation of these ACLs.

I agree. I have a set of follow-up enhancement JIRAs to YARN-1051 in mind, one 
of which is exactly to consider separation of ACLs, as you pointed out.

 Admission Control: Integrate Reservation subsystem with ResourceManager
 ---

 Key: YARN-2080
 URL: https://issues.apache.org/jira/browse/YARN-2080
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subramaniam Krishnan
Assignee: Subramaniam Krishnan
 Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, 
 YARN-2080.patch


 This JIRA tracks the integration of Reservation subsystem data structures 
 introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring 
 of YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1215) Yarn URL should include userinfo

2014-09-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-1215:
---
Fix Version/s: (was: 3.0.0)

 Yarn URL should include userinfo
 

 Key: YARN-1215
 URL: https://issues.apache.org/jira/browse/YARN-1215
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 3.0.0
Reporter: Chuan Liu
Assignee: Chuan Liu
 Fix For: 2.2.0

 Attachments: YARN-1215-trunk.2.patch, YARN-1215-trunk.patch


 In the {{org.apache.hadoop.yarn.api.records.URL}} class, we don't have a 
 userinfo as part of the URL. When converting a {{java.net.URI}} object into 
 the YARN URL object in {{ConverterUtils.getYarnUrlFromURI()}} method, we will 
 set uri host as the url host. If the uri has a userinfo part, the userinfo is 
 discarded. This will lead to information loss if the original uri has the 
 userinfo, e.g. foo://username:passw...@example.com will be converted to 
 foo://example.com and username/password information is lost during the 
 conversion.
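 For illustration, here is a minimal sketch using plain {{java.net.URI}} (not the YARN converter itself) that shows where the userinfo component lives and why copying only the host drops it; the URI value is made up:
 {code}
 import java.net.URI;

 public class UserInfoSketch {
   public static void main(String[] args) {
     // Hypothetical URI of the shape discussed above.
     URI uri = URI.create("foo://username:secret@example.com:8020/path");

     // A conversion that copies only scheme/host/port/path loses getUserInfo().
     System.out.println("host      = " + uri.getHost());      // example.com
     System.out.println("userinfo  = " + uri.getUserInfo());  // username:secret
     System.out.println("authority = " + uri.getAuthority()); // username:secret@example.com:8020
   }
 }
 {code}
 Preserving the original URL therefore requires carrying the userinfo (or the whole authority) through the conversion, not just the host.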



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-794) YarnClientImpl.submitApplication() to add a timeout

2014-09-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-794:
--
Fix Version/s: (was: 2.1.0-beta)
   (was: 3.0.0)

 YarnClientImpl.submitApplication() to add a timeout
 ---

 Key: YARN-794
 URL: https://issues.apache.org/jira/browse/YARN-794
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Steve Loughran
Priority: Minor

 {{YarnClientImpl.submitApplication()}} can spin forever waiting for the RM to 
 accept the submission, ignoring interrupts on the sleep.
 # A timeout allows client applications to recognise and react to a failure of 
 the RM to accept work in a timely manner.
 # The interrupt exception could be converted to an {{InterruptedIOException}} 
 and raised within the current method signature
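 A minimal sketch of the kind of bounded wait described above; the method name, poll interval, and timeout handling are assumptions for illustration, not the actual {{YarnClientImpl}} code:
 {code}
 import java.io.InterruptedIOException;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.TimeoutException;
 import java.util.function.BooleanSupplier;

 public class SubmitPollSketch {
   /** Polls an acceptance check until it succeeds, the deadline passes, or the thread is interrupted. */
   static void waitForAccept(long timeoutMillis, BooleanSupplier isAccepted)
       throws InterruptedIOException, TimeoutException {
     long deadline = System.currentTimeMillis() + timeoutMillis;
     while (!isAccepted.getAsBoolean()) {
       if (System.currentTimeMillis() > deadline) {
         throw new TimeoutException("RM did not accept the submission in time");
       }
       try {
         TimeUnit.MILLISECONDS.sleep(200);        // poll interval
       } catch (InterruptedException e) {
         Thread.currentThread().interrupt();      // preserve the interrupt status
         InterruptedIOException ioe =
             new InterruptedIOException("interrupted while waiting for the RM");
         ioe.initCause(e);
         throw ioe;                               // fits an IOException-throwing signature
       }
     }
   }
 }
 {code}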



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager

2014-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127493#comment-14127493
 ] 

Hadoop QA commented on YARN-2080:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667468/YARN-2080.patch
  against trunk revision 2749fc6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4861//console

This message is automatically generated.

 Admission Control: Integrate Reservation subsystem with ResourceManager
 ---

 Key: YARN-2080
 URL: https://issues.apache.org/jira/browse/YARN-2080
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subramaniam Krishnan
Assignee: Subramaniam Krishnan
 Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, 
 YARN-2080.patch


 This JIRA tracks the integration of Reservation subsystem data structures 
 introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring 
 of YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2528) Cross Origin Filter Http response split vulnerability protection rejects valid origins

2014-09-09 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-2528:
-

 Summary: Cross Origin Filter Http response split vulnerability 
protection rejects valid origins
 Key: YARN-2528
 URL: https://issues.apache.org/jira/browse/YARN-2528
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles


URLEncoding is too strong a protection against the HTTP Response Split 
vulnerability, and major browsers reject the encoded Origin. An adequate protection 
is simply to remove all CRs and LFs, as in the case of PHP's header function.
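A minimal sketch of that lighter sanitization as a standalone method (not the actual filter code):
{code}
/** Sketch: strip CR and LF from a header value so it cannot split the HTTP response. */
public final class OriginSanitizerSketch {
  static String sanitize(String origin) {
    return origin == null ? null : origin.replaceAll("[\r\n]", "");
  }

  public static void main(String[] args) {
    System.out.println(sanitize("http://example.com"));                  // left untouched
    System.out.println(sanitize("http://evil.com\r\nSet-Cookie: x=1"));  // CR/LF removed
  }
}
{code}
Unlike URL-encoding the whole value, this leaves ordinary origins byte-for-byte intact, so browsers can still match them.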



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2528) Cross Origin Filter Http response split vulnerability protection rejects valid origins

2014-09-09 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2528:
--
Attachment: YARN-2528-v1.patch

 Cross Origin Filter Http response split vulnerability protection rejects 
 valid origins
 --

 Key: YARN-2528
 URL: https://issues.apache.org/jira/browse/YARN-2528
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2528-v1.patch


 URLEncoding is too strong a protection against the HTTP Response Split 
 vulnerability, and major browsers reject the encoded Origin. An 
 adequate protection is simply to remove all CRs and LFs, as in the case of PHP's 
 header function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager

2014-09-09 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated YARN-2527:
---
Description: 
NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
The relevant stacktrace snippet from the ResourceManager logs is as below
{code}
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
{code}

This issue was reported by [~miguenther].

  was:
NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
The relevant stacktrace snippet from the ResourceManager logs is as below
{code}
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
{code}


 NPE in ApplicationACLsManager
 -

 Key: YARN-2527
 URL: https://issues.apache.org/jira/browse/YARN-2527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony

 NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
 The relevant stacktrace snippet from the ResourceManager logs is as below
 {code}
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
 {code}
 This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2528) Cross Origin Filter Http response split vulnerability protection rejects valid origins

2014-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127575#comment-14127575
 ] 

Hadoop QA commented on YARN-2528:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667480/YARN-2528-v1.patch
  against trunk revision 2749fc6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4862//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4862//console

This message is automatically generated.

 Cross Origin Filter Http response split vulnerability protection rejects 
 valid origins
 --

 Key: YARN-2528
 URL: https://issues.apache.org/jira/browse/YARN-2528
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2528-v1.patch


 URLEncoding is too strong a protection against the HTTP Response Split 
 vulnerability, and major browsers reject the encoded Origin. An 
 adequate protection is simply to remove all CRs and LFs, as in the case of PHP's 
 header function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2528) Cross Origin Filter Http response split vulnerability protection rejects valid origins

2014-09-09 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127592#comment-14127592
 ] 

Jonathan Eagles commented on YARN-2528:
---

[~zjshen], sorry to bother you again. Found another bug while working on 
getting the Tez UI running in a hosted environment. Can you give a review?

 Cross Origin Filter Http response split vulnerability protection rejects 
 valid origins
 --

 Key: YARN-2528
 URL: https://issues.apache.org/jira/browse/YARN-2528
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2528-v1.patch


 URLEncoding is too strong a protection against the HTTP Response Split 
 vulnerability, and major browsers reject the encoded Origin. An 
 adequate protection is simply to remove all CRs and LFs, as in the case of PHP's 
 header function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2526) Scheduler Load Simulator may enter deadlock if lots of applications submitted to the RM at the same time

2014-09-09 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127647#comment-14127647
 ] 

Karthik Kambatla commented on YARN-2526:


Thanks for reporting and fixing this, Wei.

+1. Committing this. 

 Scheduler Load Simulator may enter deadlock if lots of applications submitted 
 to the RM at the same time
 

 Key: YARN-2526
 URL: https://issues.apache.org/jira/browse/YARN-2526
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2526-1.patch


 The simulation may enter deadlock if all application simulators hold all 
 threads provided by the thread pool, and all wait for AM container 
 allocation. In that case, all AM simulators wait for NM simulators to do 
 heartbeat to allocate resource, and all NM simulators wait for AM simulators 
 to release some threads. The simulator is deadlocked.
 To solve this deadlock, we need to remove the while() loop in the MRAMSimulator.
 {code}
 // waiting until the AM container is allocated
 while (true) {
   if (response != null && ! response.getAllocatedContainers().isEmpty()) {
 // get AM container
 .
 break;
   }
   // this sleep time is different from HeartBeat
   Thread.sleep(1000);
   // send out empty request
   sendContainerRequest();
   response = responseQueue.take();
 }
 {code}
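 A minimal sketch of the non-blocking alternative implied above: evaluate the same condition once per heartbeat and return, instead of sleeping inside the pooled thread (the class and method names here are made up for illustration, not the MRAMSimulator code):
 {code}
 import java.util.Collections;
 import java.util.List;

 /** Sketch: one poll per heartbeat, no inner while(true) holding a pooled thread. */
 public class HeartbeatPollSketch {
   /** Stand-in for the allocate response; only what the sketch needs. */
   interface Response { List<String> getAllocatedContainers(); }

   private String amContainer;   // null until the AM container is allocated

   void onHeartbeat(Response response) {
     if (amContainer == null) {
       // Same check as in the loop above, but run once and then return,
       // so the thread goes back to the pool until the next heartbeat.
       if (response != null && !response.getAllocatedContainers().isEmpty()) {
         amContainer = response.getAllocatedContainers().get(0);
       }
       return;
     }
     // ... regular AM simulation once the container is available ...
   }

   public static void main(String[] args) {
     HeartbeatPollSketch s = new HeartbeatPollSketch();
     s.onHeartbeat(Collections::emptyList);                          // not allocated yet
     s.onHeartbeat(() -> Collections.singletonList("am-container")); // allocated now
   }
 }
 {code}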



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2526) SLS can deadlock when all the threads are taken by AMSimulators

2014-09-09 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2526:
---
  Component/s: scheduler-load-simulator
 Priority: Critical  (was: Minor)
 Target Version/s: 2.6.0
Affects Version/s: 2.5.1

 SLS can deadlock when all the threads are taken by AMSimulators
 ---

 Key: YARN-2526
 URL: https://issues.apache.org/jira/browse/YARN-2526
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler-load-simulator
Affects Versions: 2.5.1
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Critical
 Attachments: YARN-2526-1.patch


 The simulation may enter deadlock if all application simulators hold all 
 threads provided by the thread pool, and all wait for AM container 
 allocation. In that case, all AM simulators wait for NM simulators to do 
 heartbeat to allocate resource, and all NM simulators wait for AM simulators 
 to release some threads. The simulator is deadlocked.
 To solve this deadlock, we need to remove the while() loop in the MRAMSimulator.
 {code}
 // waiting until the AM container is allocated
 while (true) {
   if (response != null && ! response.getAllocatedContainers().isEmpty()) {
 // get AM container
 .
 break;
   }
   // this sleep time is different from HeartBeat
   Thread.sleep(1000);
   // send out empty request
   sendContainerRequest();
   response = responseQueue.take();
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2526) SLS can deadlock when all the threads are taken by AMSimulators

2014-09-09 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2526:
---
Summary: SLS can deadlock when all the threads are taken by AMSimulators  
(was: Scheduler Load Simulator may enter deadlock if lots of applications 
submitted to the RM at the same time)

 SLS can deadlock when all the threads are taken by AMSimulators
 ---

 Key: YARN-2526
 URL: https://issues.apache.org/jira/browse/YARN-2526
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler-load-simulator
Affects Versions: 2.5.1
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2526-1.patch


 The simulation may enter deadlock if all application simulators hold all 
 threads provided by the thread pool, and all wait for AM container 
 allocation. In that case, all AM simulators wait for NM simulators to do 
 heartbeat to allocate resource, and all NM simulators wait for AM simulators 
 to release some threads. The simulator is deadlocked.
 To solve this deadlock, we need to remove the while() loop in the MRAMSimulator.
 {code}
 // waiting until the AM container is allocated
 while (true) {
   if (response != null && ! response.getAllocatedContainers().isEmpty()) {
 // get AM container
 .
 break;
   }
   // this sleep time is different from HeartBeat
   Thread.sleep(1000);
   // send out empty request
   sendContainerRequest();
   response = responseQueue.take();
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback

2014-09-09 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-415:

Attachment: YARN-415.201409092204.txt

[~jianhe], thank you very much for your time in reviewing this patch and your 
helpful suggestions.

{quote}
seems we don't need this check, because the returned 
ApplicationResourceUsageReport for non-active attempt is anyways null.
{code}
// Only add in the running containers if this is the active attempt.
RMAppAttempt currentAttempt = rmContext.getRMApps()
   .get(attemptId.getApplicationId()).getCurrentAppAttempt();
if (currentAttempt != null
    && currentAttempt.getAppAttemptId().equals(attemptId)) {
{code}
{quote}
You are correct. The above check for {{currentAttempt != null}} is not 
necessary.
With this new patch, I have upmerged again (since it wasn't applying cleanly) 
and removed this check.

[~kkambatl], I would also like to thank you for your help on this patch. Were 
you okay with the changes I made in response to your suggestions? It would be 
great if we could move this patch over the goal line soon.

 Capture aggregate memory allocation at the app-level for chargeback
 ---

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
 YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
 YARN-415.201408092006.txt, YARN-415.201408132109.txt, 
 YARN-415.201408150030.txt, YARN-415.201408181938.txt, 
 YARN-415.201408181938.txt, YARN-415.201408212033.txt, 
 YARN-415.201409040036.txt, YARN-415.201409092204.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.
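 As a worked example of the formula above, a small sketch that aggregates MB-seconds over containers (the container numbers are made up):
 {code}
 public class MemorySecondsSketch {
   /** Sum of reservedMB * lifetimeSeconds over all containers, per the formula above. */
   static long memorySeconds(long[][] containers) {   // each row: {reservedMB, lifetimeSeconds}
     long total = 0;
     for (long[] c : containers) {
       total += c[0] * c[1];
     }
     return total;
   }

   public static void main(String[] args) {
     // Two containers of 2048 MB running 600 s each, plus one of 4096 MB running 300 s.
     long[][] containers = { {2048, 600}, {2048, 600}, {4096, 300} };
     System.out.println(memorySeconds(containers) + " MB-seconds");   // 3686400 MB-seconds
   }
 }
 {code}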



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback

2014-09-09 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127666#comment-14127666
 ] 

Karthik Kambatla edited comment on YARN-415 at 9/9/14 10:20 PM:


Eric - I haven't had a chance to take a look at the latest patch. I trust Jian 
and you to make sure the concerns are addressed; the suggestions themselves 
were straightforward. Thanks for staying patient through this long-drawn JIRA. 


was (Author: kkambatl):
Eric - I haven't had a chance to take a look at the latest patch. I trust Jian 
and you to make sure the concerns are addressed; the suggestions themselves 
were straightforward. 

 Capture aggregate memory allocation at the app-level for chargeback
 ---

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
 YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
 YARN-415.201408092006.txt, YARN-415.201408132109.txt, 
 YARN-415.201408150030.txt, YARN-415.201408181938.txt, 
 YARN-415.201408181938.txt, YARN-415.201408212033.txt, 
 YARN-415.201409040036.txt, YARN-415.201409092204.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback

2014-09-09 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127666#comment-14127666
 ] 

Karthik Kambatla commented on YARN-415:
---

Eric - I haven't had a chance to take a look at the latest patch. I trust Jian 
and you to make sure the concerns are addressed; the suggestions themselves 
were straightforward. 

 Capture aggregate memory allocation at the app-level for chargeback
 ---

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
 YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
 YARN-415.201408092006.txt, YARN-415.201408132109.txt, 
 YARN-415.201408150030.txt, YARN-415.201408181938.txt, 
 YARN-415.201408181938.txt, YARN-415.201408212033.txt, 
 YARN-415.201409040036.txt, YARN-415.201409092204.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager

2014-09-09 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated YARN-2527:
---
Attachment: YARN-2527.patch

Attaching a patch which checks if the map of ACLs for an application is null. 
If null, it uses the default ACL.

A new test case is added which checks the normal case as well as the case when 
the ACL is not set for an application.
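For illustration, a minimal sketch of that null-guard pattern; the class, map type, and default value are assumptions, not the attached patch:
{code}
import java.util.Map;

/** Sketch: fall back to a default ACL when an application registered none. */
public class AclFallbackSketch {
  private static final String DEFAULT_ACL = "*";   // hypothetical "everyone" default

  static String aclFor(Map<String, String> appAcls, String accessType) {
    if (appAcls == null) {                // the case that used to end in an NPE
      return DEFAULT_ACL;
    }
    String acl = appAcls.get(accessType);
    return acl != null ? acl : DEFAULT_ACL;
  }

  public static void main(String[] args) {
    System.out.println(aclFor(null, "VIEW_APP"));   // "*" instead of an NPE
  }
}
{code}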

 NPE in ApplicationACLsManager
 -

 Key: YARN-2527
 URL: https://issues.apache.org/jira/browse/YARN-2527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony
 Attachments: YARN-2527.patch


 NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
 The relevant stacktrace snippet from the ResourceManager logs is as below
 {code}
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
 {code}
 This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager

2014-09-09 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127755#comment-14127755
 ] 

Benoy Antony commented on YARN-2527:


[~vinodkv], could you please review this jira?
Could you also make me a Yarn contributor so that I can assign the jira to myself?


 NPE in ApplicationACLsManager
 -

 Key: YARN-2527
 URL: https://issues.apache.org/jira/browse/YARN-2527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony
 Attachments: YARN-2527.patch


 NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
 The relevant stacktrace snippet from the ResourceManager logs is as below
 {code}
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
 {code}
 This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-09 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1458:
---
Attachment: yarn-1458-7.patch

Thanks Zhihai. 

I see the advantage of the second approach. My main concern is the readability 
of that approach. I have taken a stab at making it more readable/maintainable 
through only cosmetic changes. Can you please take a look and see if these 
cosmetic changes make sense to you? 

 In Fair Scheduler, size based weight can cause update thread to hold lock 
 indefinitely
 --

 Key: YARN-1458
 URL: https://issues.apache.org/jira/browse/YARN-1458
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
 Environment: Centos 2.6.18-238.19.1.el5 X86_64
 hadoop2.2.0
Reporter: qingwu.fu
Assignee: zhihai xu
  Labels: patch
 Fix For: 2.2.1

 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
 YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, 
 YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, 
 YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch, 
 yarn-1458-7.patch

   Original Estimate: 408h
  Remaining Estimate: 408h

 The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
 clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster 
 for days to reproduce it. The output of the jstack command on the resourcemanager pid:
 {code}
  ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
 waiting for monitor entry [0x43aa9000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
 - waiting to lock 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
 at java.lang.Thread.run(Thread.java:744)
 ……
 FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
 runnable [0x433a2000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
 at java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback

2014-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1412#comment-1412
 ] 

Hadoop QA commented on YARN-415:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12667510/YARN-415.201409092204.txt
  against trunk revision 28d99db.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4863//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4863//console

This message is automatically generated.

 Capture aggregate memory allocation at the app-level for chargeback
 ---

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
 YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
 YARN-415.201408092006.txt, YARN-415.201408132109.txt, 
 YARN-415.201408150030.txt, YARN-415.201408181938.txt, 
 YARN-415.201408181938.txt, YARN-415.201408212033.txt, 
 YARN-415.201409040036.txt, YARN-415.201409092204.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager

2014-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127795#comment-14127795
 ] 

Hadoop QA commented on YARN-2527:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667532/YARN-2527.patch
  against trunk revision 28d99db.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4864//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4864//console

This message is automatically generated.

 NPE in ApplicationACLsManager
 -

 Key: YARN-2527
 URL: https://issues.apache.org/jira/browse/YARN-2527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony
 Attachments: YARN-2527.patch


 NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
 The relevant stacktrace snippet from the ResourceManager logs is as below
 {code}
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
 {code}
 This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127854#comment-14127854
 ] 

Hadoop QA commented on YARN-1458:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667535/yarn-1458-7.patch
  against trunk revision 28d99db.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4865//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4865//console

This message is automatically generated.

 In Fair Scheduler, size based weight can cause update thread to hold lock 
 indefinitely
 --

 Key: YARN-1458
 URL: https://issues.apache.org/jira/browse/YARN-1458
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
 Environment: Centos 2.6.18-238.19.1.el5 X86_64
 hadoop2.2.0
Reporter: qingwu.fu
Assignee: zhihai xu
  Labels: patch
 Fix For: 2.2.1

 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
 YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, 
 YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, 
 YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch, 
 yarn-1458-7.patch

   Original Estimate: 408h
  Remaining Estimate: 408h

 The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
 clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster 
 for days to reproduce it. The output of the jstack command on the resourcemanager pid:
 {code}
  ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
 waiting for monitor entry [0x43aa9000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
 - waiting to lock 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
 at java.lang.Thread.run(Thread.java:744)
 ……
 FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
 runnable [0x433a2000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
 at 
 

[jira] [Updated] (YARN-1712) Admission Control: plan follower

2014-09-09 Thread Subramaniam Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramaniam Krishnan updated YARN-1712:
---
Attachment: YARN-1712.3.patch

Thanks [~jianhe] for your detailed feedback. I am attaching a patch with the 
following updates:
  * Made the move-apps logic synchronous; the move is to defReservationQueue 
(renamed)
  * Removed the synchronized on scheduler as individual calls are already 
synchronized
  * Fixed comment formatting and variable names
  * Created a common method to calculate lhsRes and rhsRes
  * Optimized the loop as suggested

Some clarifications:
  * Exceptions are suppressed deliberately as PlanFollower is a background 
timer thread and we don't want it to exit
  * _plan.getReservationsAtTime(now)_ is used by others like Replanners. We 
need the reservations and not just the names even in PlanFollower, so we are leaving it 
as is
  * Tried moving the default queue creation to when PlanQueue is initialized in 
CapacityScheduler, but it was getting overly complex, mainly due to the relaxed 
constraint of child capacities <= 100% for PlanQueues. This is just an 
additional hashmap lookup with the code being much cleaner, so not moving it for 
now. If it is still a concern, I can add a flag to Plan and check that instead 
of CapacityScheduler#getQueue

 Admission Control: plan follower
 

 Key: YARN-1712
 URL: https://issues.apache.org/jira/browse/YARN-1712
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations, scheduler
 Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.3.patch, 
 YARN-1712.patch


 This JIRA tracks a thread that continuously propagates the current state of 
 an inventory subsystem to the scheduler. As the inventory subsystem stores the 
 plan of how the resources should be subdivided, the work we propose in this 
 JIRA realizes such a plan by dynamically instructing the CapacityScheduler to 
 add/remove/resize queues to follow the plan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709

2014-09-09 Thread Subramaniam Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramaniam Krishnan updated YARN-1711:
---
Attachment: YARN-1712.3.patch

Updated patch to include *CapacitySchedulerConfiguration* based on 
[~vinodkv]'s [suggestion | 
https://issues.apache.org/jira/browse/YARN-2080?focusedCommentId=14125994], as 
the _majority_ of the configurations are for enforcement policies

 CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
 --

 Key: YARN-1711
 URL: https://issues.apache.org/jira/browse/YARN-1711
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations
 Attachments: YARN-1711.1.patch, YARN-1711.patch, YARN-1712.3.patch


 This JIRA tracks the development of a policy that enforces user quotas (a 
 time-extension of the notion of capacity) in the inventory subsystem 
 discussed in YARN-1709.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709

2014-09-09 Thread Subramaniam Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramaniam Krishnan updated YARN-1711:
---
Attachment: (was: YARN-1712.3.patch)

 CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
 --

 Key: YARN-1711
 URL: https://issues.apache.org/jira/browse/YARN-1711
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations
 Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.patch


 This JIRA tracks the development of a policy that enforces user quotas (a 
 time-extension of the notion of capacity) in the inventory subsystem 
 discussed in YARN-1709.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709

2014-09-09 Thread Subramaniam Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramaniam Krishnan updated YARN-1711:
---
Attachment: YARN-1711.2.patch

 CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
 --

 Key: YARN-1711
 URL: https://issues.apache.org/jira/browse/YARN-1711
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations
 Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.patch


 This JIRA tracks the development of a policy that enforces user quotas (a 
 time-extension of the notion of capacity) in the inventory subsystem 
 discussed in YARN-1709.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1712) Admission Control: plan follower

2014-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127913#comment-14127913
 ] 

Hadoop QA commented on YARN-1712:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667564/YARN-1712.3.patch
  against trunk revision 0de563a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4866//console

This message is automatically generated.

 Admission Control: plan follower
 

 Key: YARN-1712
 URL: https://issues.apache.org/jira/browse/YARN-1712
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations, scheduler
 Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.3.patch, 
 YARN-1712.patch


 This JIRA tracks a thread that continuously propagates the current state of 
 an inventory subsystem to the scheduler. As the inventory subsystem stores the 
 plan of how the resources should be subdivided, the work we propose in this 
 JIRA realizes such a plan by dynamically instructing the CapacityScheduler to 
 add/remove/resize queues to follow the plan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-09 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-1458:

Attachment: yarn-1458-8.patch

 In Fair Scheduler, size based weight can cause update thread to hold lock 
 indefinitely
 --

 Key: YARN-1458
 URL: https://issues.apache.org/jira/browse/YARN-1458
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
 Environment: Centos 2.6.18-238.19.1.el5 X86_64
 hadoop2.2.0
Reporter: qingwu.fu
Assignee: zhihai xu
  Labels: patch
 Fix For: 2.2.1

 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
 YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, 
 YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, 
 YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch, 
 yarn-1458-7.patch, yarn-1458-8.patch

   Original Estimate: 408h
  Remaining Estimate: 408h

 The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
 clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster 
 for days to reproduce it. The output of the jstack command on the resourcemanager pid:
 {code}
  ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
 waiting for monitor entry [0x43aa9000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
 - waiting to lock 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
 at java.lang.Thread.run(Thread.java:744)
 ……
 FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
 runnable [0x433a2000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
 at java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709

2014-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127929#comment-14127929
 ] 

Hadoop QA commented on YARN-1711:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667567/YARN-1712.3.patch
  against trunk revision 0de563a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4867//console

This message is automatically generated.

 CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
 --

 Key: YARN-1711
 URL: https://issues.apache.org/jira/browse/YARN-1711
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations
 Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.patch


 This JIRA tracks the development of a policy that enforces user quotas (a 
 time-extension of the notion of capacity) in the inventory subsystem 
 discussed in YARN-1709.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-09 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127939#comment-14127939
 ] 

zhihai xu commented on YARN-1458:
-

Hi [~kasha], your change makes the code much easier to read and maintain.
I uploaded a new patch, yarn-1458-8.patch, with two minor changes on top of your 
patch: use Math.max instead of Math.abs, and check schedulables.isEmpty() after 
handleFixedFairShares.
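For illustration, a self-contained sketch of those two changes; the names below 
are illustrative only, not the actual ComputeFairShares code:
{code}
import java.util.List;

final class FairShareSketch {
  // Change 1: clamp with Math.max instead of Math.abs, so an over-allocated
  // total can never be mistaken for positive remaining capacity.
  static int remainingResource(int totalResource, int takenResource) {
    return Math.max(totalResource - takenResource, 0);
  }

  // Change 2: once fixed fair shares have been handled, bail out early if no
  // schedulables are left to share the remainder.
  static int evenShare(List<Integer> remainingSchedulables,
      int totalResource, int takenResource) {
    if (remainingSchedulables.isEmpty()) {
      return 0;
    }
    return remainingResource(totalResource, takenResource)
        / remainingSchedulables.size();
  }
}
{code}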
Please review it.
thanks


 In Fair Scheduler, size based weight can cause update thread to hold lock 
 indefinitely
 --

 Key: YARN-1458
 URL: https://issues.apache.org/jira/browse/YARN-1458
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
 Environment: Centos 2.6.18-238.19.1.el5 X86_64
 hadoop2.2.0
Reporter: qingwu.fu
Assignee: zhihai xu
  Labels: patch
 Fix For: 2.2.1

 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
 YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, 
 YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, 
 YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch, 
 yarn-1458-7.patch, yarn-1458-8.patch

   Original Estimate: 408h
  Remaining Estimate: 408h

 The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
 clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster 
 for days to reproduce it. The output of the jstack command on the resourcemanager pid:
 {code}
  ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
 waiting for monitor entry [0x43aa9000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
 - waiting to lock 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
 at java.lang.Thread.run(Thread.java:744)
 ……
 FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
 runnable [0x433a2000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
 at java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709

2014-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127941#comment-14127941
 ] 

Hadoop QA commented on YARN-1711:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667570/YARN-1711.2.patch
  against trunk revision 0de563a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4868//console

This message is automatically generated.

 CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
 --

 Key: YARN-1711
 URL: https://issues.apache.org/jira/browse/YARN-1711
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations
 Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.patch


 This JIRA tracks the development of a policy that enforces user quotas (a 
 time-extension of the notion of capacity) in the inventory subsystem 
 discussed in YARN-1709.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127967#comment-14127967
 ] 

Hadoop QA commented on YARN-1458:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667572/yarn-1458-8.patch
  against trunk revision 0de563a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4869//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4869//console

This message is automatically generated.

 In Fair Scheduler, size based weight can cause update thread to hold lock 
 indefinitely
 --

 Key: YARN-1458
 URL: https://issues.apache.org/jira/browse/YARN-1458
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
 Environment: Centos 2.6.18-238.19.1.el5 X86_64
 hadoop2.2.0
Reporter: qingwu.fu
Assignee: zhihai xu
  Labels: patch
 Fix For: 2.2.1

 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
 YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, 
 YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, 
 YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch, 
 yarn-1458-7.patch, yarn-1458-8.patch

   Original Estimate: 408h
  Remaining Estimate: 408h

 The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
 clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster 
 for days to reproduce it. The output of the jstack command on the resourcemanager pid:
 {code}
  ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
 waiting for monitor entry [0x43aa9000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
 - waiting to lock 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
 at java.lang.Thread.run(Thread.java:744)
 ……
 FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
 runnable [0x433a2000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
 at java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2456) Possible deadlock in CapacityScheduler when RM is recovering apps

2014-09-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127987#comment-14127987
 ] 

Wangda Tan commented on YARN-2456:
--

I think there are too many factors that affect the active applications list after 
application submission. IMHO, recovering applications by time of creation or 
submission is not a big deal; keeping it simple and straightforward is more 
important. I prefer Jian's method. +1 for the patch
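For context, the submission-order alternative would amount to something like the 
following; RecoveredApp and its fields are illustrative stand-ins, not the actual 
state-store classes:
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class RecoveryOrderSketch {
  // Hypothetical stand-in for a recovered application's stored state.
  static class RecoveredApp {
    final String appId;
    final long submitTime;
    RecoveredApp(String appId, long submitTime) {
      this.appId = appId;
      this.submitTime = submitTime;
    }
  }

  // Recover applications in the order they were originally submitted.
  static List<RecoveredApp> inSubmissionOrder(List<RecoveredApp> recovered) {
    List<RecoveredApp> ordered = new ArrayList<RecoveredApp>(recovered);
    Collections.sort(ordered, new Comparator<RecoveredApp>() {
      @Override
      public int compare(RecoveredApp a, RecoveredApp b) {
        return Long.compare(a.submitTime, b.submitTime);   // earliest first
      }
    });
    return ordered;
  }
}
{code}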

Thanks,

 Possible deadlock in CapacityScheduler when RM is recovering apps
 -

 Key: YARN-2456
 URL: https://issues.apache.org/jira/browse/YARN-2456
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2456.1.patch


 Consider this scenario:
 1. RM is configured with a single queue and only one application can be 
 active at a time.
 2. Submit App1 which uses up the queue's whole capacity
 3. Submit App2 which remains pending.
 4. Restart RM.
 5. App2 is recovered before App1, so App2 is added to the activeApplications 
 list. Now App1 remains pending (because of max-active-app limit)
 6. All containers of App1 are now recovered when NM registers, and use up the 
 whole queue capacity again.
 7. Since the queue is full, App2 cannot proceed to allocate AM container.
 8. Meanwhile, App1 cannot proceed to become active because of the 
 max-active-app limit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2456) Possible deadlock in CapacityScheduler when RM is recovering apps

2014-09-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127989#comment-14127989
 ] 

Wangda Tan commented on YARN-2456:
--

In addition, I suggest changing {{deadlock}} in the title to {{livelock}}: no 
thread is blocked on a lock here, it is just an unfortunate state that prevents 
the applications from making progress. See: 
http://en.wikipedia.org/wiki/Deadlock#Livelock

Wangda

 Possible deadlock in CapacityScheduler when RM is recovering apps
 -

 Key: YARN-2456
 URL: https://issues.apache.org/jira/browse/YARN-2456
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2456.1.patch


 Consider this scenario:
 1. RM is configured with a single queue and only one application can be 
 active at a time.
 2. Submit App1 which uses up the queue's whole capacity
 3. Submit App2 which remains pending.
 4. Restart RM.
 5. App2 is recovered before App1, so App2 is added to the activeApplications 
 list. Now App1 remains pending (because of max-active-app limit)
 6. All containers of App1 are now recovered when NM registers, and use up the 
 whole queue capacity again.
 7. Since the queue is full, App2 cannot proceed to allocate AM container.
 8. Meanwhile, App1 cannot proceed to become active because of the 
 max-active-app limit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager

2014-09-09 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128005#comment-14128005
 ] 

Benoy Antony commented on YARN-2527:


[~zjshen], could you please review this JIRA?
Could you also make me a YARN contributor so that I can assign the JIRA to myself?

 NPE in ApplicationACLsManager
 -

 Key: YARN-2527
 URL: https://issues.apache.org/jira/browse/YARN-2527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony
 Attachments: YARN-2527.patch


 An NPE in _ApplicationACLsManager_ can result in a 500 Internal Server Error.
 The relevant stack trace snippet from the ResourceManager logs is below:
 {code}
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
 {code}
 This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-09-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128011#comment-14128011
 ] 

Wangda Tan commented on YARN-796:
-

Hi [~cwelch] and [~aw],
I agree with #3 as well, since the original starting point was to protect users 
from case typos. But looking at other existing YARN configs, such as 
CapacityScheduler queue names, a different case already means a different queue. 
I prefer to drop the requirement if there is no strong opinion otherwise.
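Just to make the trade-off concrete, the case-normalization requirement would 
amount to something like this (purely illustrative, not from any posted patch):
{code}
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Normalizing labels to one case on add/lookup makes "GPU" and "gpu" the same
// label, protecting users from case typos; dropping the requirement means the
// two strings remain distinct labels.
final class NodeLabelCaseSketch {
  private final Set<String> labels = new HashSet<String>();

  void addLabel(String label) {
    labels.add(label.toLowerCase(Locale.ROOT));
  }

  boolean hasLabel(String label) {
    return labels.contains(label.toLowerCase(Locale.ROOT));
  }
}
{code}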

Thanks,
Wangda 

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture, etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1712) Admission Control: plan follower

2014-09-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128023#comment-14128023
 ] 

Wangda Tan commented on YARN-1712:
--

Hi [~subru], 
I've just taken a look at your latest patch; the code is much cleaner than 
before, thanks!

I don't quite understand what you said:
bq. Tried moving the default queue creating to when PlanQueue is initialized in 
CapacityScheduler but it was getting overly complex mainly due to the relaxed 
constraint of child capacities =100% for PlanQueues. This is just an 
additional hashmap lookup with the code being much cleaner so not moving it for 
now. If it is still a concern, I can add a flag to Plan and check that instead 
of CapacityScheduler#getQueue
Could you please elaborate? 

In addition, a very minor comment: could you wrap LOG.debug calls in an 
isDebugEnabled() block, as is done in other modules?
{code}
if (LOG.isDebugEnabled()) {
 // ...
}
{code}
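
For illustration, a self-contained version of that guarded-logging idiom; the 
class and the message below are hypothetical, not from the patch:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class GuardedDebugExample {
  private static final Log LOG = LogFactory.getLog(GuardedDebugExample.class);

  void follow(String planName, int queueCount) {
    // The guard avoids building the message string when debug logging is off.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Synchronizing plan " + planName + " across "
          + queueCount + " queues");
    }
  }
}
{code}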

Thanks,
Wangda

 Admission Control: plan follower
 

 Key: YARN-1712
 URL: https://issues.apache.org/jira/browse/YARN-1712
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations, scheduler
 Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.3.patch, 
 YARN-1712.patch


 This JIRA tracks a thread that continuously propagates the current state of 
 an inventory subsystem to the scheduler. As the inventory subsystem store the 
 plan of how the resources should be subdivided, the work we propose in this 
 JIRA realizes such plan by dynamically instructing the CapacityScheduler to 
 add/remove/resize queues to follow the plan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2528) Cross Origin Filter Http response split vulnerability protection rejects valid origins

2014-09-09 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128026#comment-14128026
 ] 

Zhijie Shen commented on YARN-2528:
---

[~jeagles], no problem. I compared our CrossOriginFilter with the one in 
Jetty. That one does not seem to do any post-processing of the string obtained 
from the ORIGIN header. What is the reason we need it in our CrossOriginFilter? 
Judging from the test case, you want to avoid the case where the string contains 
another header, don't you? Doesn't HttpServletResponse.getHeader handle 
header splitting properly?

BTW, it seems that ours only allows one origin in the request header, but 
Jetty's allows multiple ones. And I found a specification, 
http://tools.ietf.org/html/draft-abarth-origin-09, which says that ORIGIN can 
be a list. Any thoughts?
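
For illustration, a minimal sketch of the two mechanisms under discussion, 
stripping CRs/LFs from the Origin value and treating the header as a 
space-separated list; these are hypothetical helpers, not the actual 
CrossOriginFilter code:
{code}
import java.util.Arrays;
import java.util.List;

final class OriginSketch {
  // Remove CR and LF characters so a crafted Origin value cannot inject
  // additional response headers (response-splitting protection).
  static String stripLineBreaks(String originHeader) {
    return originHeader.replaceAll("[\r\n]", "");
  }

  // The Origin header may carry a space-separated list of origins
  // (per draft-abarth-origin); split it so each entry can be validated.
  static List<String> splitOrigins(String originHeader) {
    return Arrays.asList(stripLineBreaks(originHeader).trim().split("\\s+"));
  }
}
{code}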

 Cross Origin Filter Http response split vulnerability protection rejects 
 valid origins
 --

 Key: YARN-2528
 URL: https://issues.apache.org/jira/browse/YARN-2528
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2528-v1.patch


 URL-encoding is too strong a protection against the HTTP response split 
 vulnerability, and major browsers reject the encoded Origin. An 
 adequate protection is simply to remove all CRs and LFs, as PHP's 
 header function does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager

2014-09-09 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2527:
--
Assignee: Benoy Antony

 NPE in ApplicationACLsManager
 -

 Key: YARN-2527
 URL: https://issues.apache.org/jira/browse/YARN-2527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: YARN-2527.patch


 An NPE in _ApplicationACLsManager_ can result in a 500 Internal Server Error.
 The relevant stack trace snippet from the ResourceManager logs is below:
 {code}
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
 {code}
 This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager

2014-09-09 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128038#comment-14128038
 ] 

Zhijie Shen commented on YARN-2527:
---

[~benoyantony], I've added you as a YARN contributor and assigned this JIRA to 
you.

W.r.t. the NPE, did you have a chance to see why it happens? For each 
submitted app, its ACLs seem to be added into ApplicationACLsManager, and 
ContainerLaunchContext#getApplicationACLs should return an empty ACLs map if the 
user doesn't specify anything, right?
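
For reference, a minimal sketch of the kind of null guard under discussion; the 
map and the fallback rule here are assumptions, not the actual 
ApplicationACLsManager code:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: fall back to an owner-only decision when no ACL entry
// was registered for the app, instead of dereferencing a null map entry.
class AclSketch {
  private final Map<String, String> aclPerApp = new ConcurrentHashMap<String, String>();

  boolean checkAccess(String user, String appId, String owner) {
    String acl = aclPerApp.get(appId);
    if (acl == null) {
      // No ACL registered (e.g. app submitted without ACLs): only the owner may view.
      return user.equals(owner);
    }
    return acl.equals("*") || acl.contains(user);
  }
}
{code}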

 NPE in ApplicationACLsManager
 -

 Key: YARN-2527
 URL: https://issues.apache.org/jira/browse/YARN-2527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: YARN-2527.patch


 An NPE in _ApplicationACLsManager_ can result in a 500 Internal Server Error.
 The relevant stack trace snippet from the ResourceManager logs is below:
 {code}
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
 at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
 {code}
 This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2158) TestRMWebServicesAppsModification sometimes fails in trunk

2014-09-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128049#comment-14128049
 ] 

Wangda Tan commented on YARN-2158:
--

Thanks [~vvasudev] for the fix; it looks good to me.
Also, thanks for the improvements in the patch; those look good too.
+1,

Wangda

 TestRMWebServicesAppsModification sometimes fails in trunk
 --

 Key: YARN-2158
 URL: https://issues.apache.org/jira/browse/YARN-2158
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Varun Vasudev
Priority: Minor
 Attachments: apache-yarn-2158.0.patch, apache-yarn-2158.1.patch


 From https://builds.apache.org/job/Hadoop-Yarn-trunk/582/console :
 {code}
 Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 66.144 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 testSingleAppKill[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 2.297 sec   FAILURE!
 java.lang.AssertionError: app state incorrect
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.verifyAppStateJson(TestRMWebServicesAppsModification.java:398)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:289)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2517) Implement TimelineClientAsync

2014-09-09 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128097#comment-14128097
 ] 

Tsuyoshi OZAWA commented on YARN-2517:
--

[~zjshen], [~vinodkv], can we go with the current design (v1 patch)?

 Implement TimelineClientAsync
 -

 Key: YARN-2517
 URL: https://issues.apache.org/jira/browse/YARN-2517
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2517.1.patch


 In some scenarios, we'd like to put timeline entities in another thread so as 
 not to block the current one.
 It would be good to have a TimelineClientAsync like AMRMClientAsync and 
 NMClientAsync. It can buffer entities, put them from a separate thread, and 
 have callbacks to handle the responses.
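 For illustration, a minimal sketch of that buffering/callback shape; the class, 
 interfaces, and method names here are hypothetical, not the v1 patch API:
 {code}
 import java.util.concurrent.BlockingQueue;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.concurrent.LinkedBlockingQueue;

 // Entities are buffered in a queue and put from a single background thread;
 // a callback receives the outcome of each put.
 class TimelineClientAsyncSketch<E, R> {
   interface Callback<E, R> {
     void onSuccess(E entity, R response);
     void onError(E entity, Throwable t);
   }

   interface Putter<E, R> {
     R put(E entity) throws Exception;   // the blocking put being wrapped
   }

   private final BlockingQueue<E> buffer = new LinkedBlockingQueue<E>();
   private final ExecutorService worker = Executors.newSingleThreadExecutor();

   TimelineClientAsyncSketch(final Putter<E, R> putter, final Callback<E, R> callback) {
     worker.submit(new Runnable() {
       @Override
       public void run() {
         while (!Thread.currentThread().isInterrupted()) {
           E entity;
           try {
             entity = buffer.take();
           } catch (InterruptedException ie) {
             Thread.currentThread().interrupt();
             return;
           }
           try {
             callback.onSuccess(entity, putter.put(entity));
           } catch (Exception e) {
             callback.onError(entity, e);   // report failures without blocking the submitter
           }
         }
       }
     });
   }

   void putAsync(E entity) {
     buffer.add(entity);   // returns immediately; the worker thread does the put
   }

   void stop() {
     worker.shutdownNow();
   }
 }
 {code}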



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)