[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-12-17 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534044#comment-13534044
 ] 

Tom White commented on YARN-230:


Arun, yes it looks good to me, +1. We can address any changes that come up in 
later JIRAs. 

 Make changes for RM restart phase 1
 ---

 Key: YARN-230
 URL: https://issues.apache.org/jira/browse/YARN-230
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: PB-impl.patch, Recovery.patch, Store.patch, Test.patch, 
 YARN-230.1.patch, YARN-230.4.patch, YARN-230.5.patch


 As described in YARN-128, phase 1 of RM restart puts in place mechanisms to 
 save application state and read it back after restart. Upon restart, the 
 NMs are asked to reboot and the previously running AMs are restarted.
 After this is done, RM HA and work-preserving restart can continue in 
 parallel. For more details, please refer to the design document in YARN-128.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-12-17 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534098#comment-13534098
 ] 

Bikas Saha commented on YARN-230:
-

Thanks guys!

 Make changes for RM restart phase 1
 ---

 Key: YARN-230
 URL: https://issues.apache.org/jira/browse/YARN-230
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: PB-impl.patch, Recovery.patch, Store.patch, Test.patch, 
 YARN-230.1.patch, YARN-230.4.patch, YARN-230.5.patch


 As described in YARN-128, phase 1 of RM restart puts in place mechanisms to 
 save application state and read it back after restart. Upon restart, the 
 NMs are asked to reboot and the previously running AMs are restarted.
 After this is done, RM HA and work-preserving restart can continue in 
 parallel. For more details, please refer to the design document in YARN-128.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-223) Change processTree interface to work better with native code

2012-12-17 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534156#comment-13534156
 ] 

Bikas Saha commented on YARN-223:
-

+1 for the code and approach. The patch changes some public members, though; 
I'm not sure about those changes, since they may not meet back-compat 
requirements.

 Change processTree interface to work better with native code
 

 Key: YARN-223
 URL: https://issues.apache.org/jira/browse/YARN-223
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Radim Kolar
Assignee: Radim Kolar
Priority: Critical
 Attachments: pstree-update4.txt, pstree-update6.txt, 
 pstree-update6.txt


 The problem is that on every update of the process tree a new object is 
 required. This is undesirable when working with a processTree implementation 
 in native code. Replace ProcessTree.getProcessTree() with updateProcessTree(); 
 no new object allocation is needed, and it simplifies application code a bit.
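
For illustration, a minimal sketch of the interface change the description is 
asking for; the class shapes below are assumed stand-ins, not the actual 
Hadoop source:

{code:java}
// Illustrative only: simplified stand-ins, not the real Hadoop classes.

// Before: every refresh allocates and returns a new tree object.
interface ProcessTreeBefore {
    ProcessTreeBefore getProcessTree(); // caller: tree = tree.getProcessTree();
}

// After: the same object refreshes its own state in place, so a native
// implementation can keep its internal handles alive across updates.
interface ProcessTreeAfter {
    void updateProcessTree();           // caller: tree.updateProcessTree();
}
{code}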

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers

2012-12-17 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534171#comment-13534171
 ] 

Vinod Kumar Vavilapalli commented on YARN-3:


Will review by EOD today. Thanks for the tip.

 Add support for CPU isolation/monitoring of containers
 --

 Key: YARN-3
 URL: https://issues.apache.org/jira/browse/YARN-3
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Andrew Ferguson
 Attachments: mapreduce-4334-design-doc.txt, 
 mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch, 
 MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, 
 MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, 
 MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch, 
 MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch, 
 MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN-3-lce_only-v1.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-270) RM scheduler event handler thread gets behind

2012-12-17 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534264#comment-13534264
 ] 

Vinod Kumar Vavilapalli commented on YARN-270:
--

Thanks for filing this, Thomas. IIRC, the event handler's upper limit is about 
0.6 million events; somehow we only focused on the number of nodes and never 
thought about the scaling issue with a large number of applications. There are 
multiple solutions for this, in order of importance:
 - Make NodeManagers *NOT* blindly heartbeat irrespective of whether the 
previous heartbeat has been processed.
 - Figure out any obvious bottlenecks in the scheduling code.
 - When all else fails, try to parallelize the scheduler dispatcher.

 RM scheduler event handler thread gets behind
 -

 Key: YARN-270
 URL: https://issues.apache.org/jira/browse/YARN-270
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.5
Reporter: Thomas Graves
Assignee: Thomas Graves

 We had a couple of incidents on a 2800-node cluster where the RM scheduler 
 event handler thread got behind processing events and basically became 
 unusable. It was still processing apps, but taking a long time (1 hr 45 
 minutes) to accept new apps. This actually happened twice within 5 days.
 We are using the capacity scheduler and at the time had between 400 and 500 
 applications running. There were another 250 apps in the SUBMITTED state in 
 the RM, but the scheduler hadn't yet processed them into the pending state. 
 We had about 15 queues, none of them hierarchical. We also had plenty of 
 space left on the cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-275) Make NodeManagers to NOT blindly heartbeat irrespective of whether previous heartbeat is processed or not.

2012-12-17 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-275:


 Summary: Make NodeManagers to NOT blindly heartbeat irrespective 
of whether previous heartbeat is processed or not.
 Key: YARN-275
 URL: https://issues.apache.org/jira/browse/YARN-275
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


We need NMs to back off. The event handler mechanism is very scalable but not 
infinitely so :)
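
To make the idea concrete, a hedged sketch of what an NM-side back-off loop 
could look like; every name and the pending-heartbeat signal are assumptions 
for illustration, not the actual NodeStatusUpdater code:

{code:java}
// Hypothetical sketch of a heartbeat loop that backs off instead of
// blindly heartbeating; names and structure are illustrative only.
public class HeartbeatLoopSketch {
    /** Stand-in for the real NM-to-RM heartbeat RPC. */
    interface ResourceTracker {
        /** Returns true if the RM is still processing the previous heartbeat. */
        boolean heartbeat();
    }

    private static final long BASE_INTERVAL_MS = 1_000;
    private static final long MAX_INTERVAL_MS = 60_000;

    public void run(ResourceTracker rm) throws InterruptedException {
        long interval = BASE_INTERVAL_MS;
        while (!Thread.currentThread().isInterrupted()) {
            if (rm.heartbeat()) {
                // RM hasn't digested the last heartbeat yet: back off.
                interval = Math.min(interval * 2, MAX_INTERVAL_MS);
            } else {
                interval = BASE_INTERVAL_MS; // RM caught up: resume normal rate
            }
            Thread.sleep(interval);
        }
    }
}
{code}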

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-275) Make NodeManagers to NOT blindly heartbeat irrespective of whether previous heartbeat is processed or not.

2012-12-17 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-275:
-

Component/s: nodemanager

 Make NodeManagers to NOT blindly heartbeat irrespective of whether previous 
 heartbeat is processed or not.
 --

 Key: YARN-275
 URL: https://issues.apache.org/jira/browse/YARN-275
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong

 We need NMs to back off. The event handler mechanism is very scalable but not 
 infinitely so :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-275) Make NodeManagers to NOT blindly heartbeat irrespective of whether previous heartbeat is processed or not.

2012-12-17 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned YARN-275:


Assignee: Xuan Gong  (was: Vinod Kumar Vavilapalli)

Xuan, can you please take this up? Thanks.

 Make NodeManagers to NOT blindly heartbeat irrespective of whether previous 
 heartbeat is processed or not.
 --

 Key: YARN-275
 URL: https://issues.apache.org/jira/browse/YARN-275
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong

 We need NMs to back off. The event handler mechanism is very scalable but not 
 infinitely so :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager

2012-12-17 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned YARN-198:


Assignee: Senthil V Kumar

 If we are navigating to Nodemanager UI from Resourcemanager,then there is not 
 link to navigate back to Resource manager
 ---

 Key: YARN-198
 URL: https://issues.apache.org/jira/browse/YARN-198
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ramgopal N
Assignee: Senthil V Kumar
Priority: Minor

 If we navigate to the NodeManager by clicking on the node link in the RM, 
 there is no link provided on the NM to navigate back to the RM.
 It would be good if there were a link to navigate back to the RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-270) RM scheduler event handler thread gets behind

2012-12-17 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534337#comment-13534337
 ] 

Nathan Roberts commented on YARN-270:
-

Could we also add some additional flow control within the RM to prevent this 
work from getting into the event queues in the first place? Having the clients 
throttle on their end is important in the short term but in the long run we 
need a flow control strategy that can exert back pressure at all stages of the 
pipeline.
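
For illustration, the standard way to get back pressure at a stage boundary is 
a bounded queue whose put() blocks producers; a generic sketch (not the YARN 
dispatcher, which, as a later comment notes, cannot do this in general):

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Generic flow-control sketch: a bounded queue makes producers block
// (back pressure) once the consuming stage falls behind, instead of
// letting the event backlog grow without limit.
public class BoundedStage<E> {
    private final BlockingQueue<E> queue = new ArrayBlockingQueue<>(10_000);

    /** Producer side: blocks when the stage is saturated. */
    public void submit(E event) throws InterruptedException {
        queue.put(event);
    }

    /** Consumer side: the stage's handler thread drains the queue. */
    public E next() throws InterruptedException {
        return queue.take();
    }
}
{code}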

 RM scheduler event handler thread gets behind
 -

 Key: YARN-270
 URL: https://issues.apache.org/jira/browse/YARN-270
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.5
Reporter: Thomas Graves
Assignee: Thomas Graves

 We had a couple of incidents on a 2800-node cluster where the RM scheduler 
 event handler thread got behind processing events and basically became 
 unusable. It was still processing apps, but taking a long time (1 hr 45 
 minutes) to accept new apps. This actually happened twice within 5 days.
 We are using the capacity scheduler and at the time had between 400 and 500 
 applications running. There were another 250 apps in the SUBMITTED state in 
 the RM, but the scheduler hadn't yet processed them into the pending state. 
 We had about 15 queues, none of them hierarchical. We also had plenty of 
 space left on the cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-223) Change processTree interface to work better with native code

2012-12-17 Thread Radim Kolar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534397#comment-13534397
 ] 

Radim Kolar commented on YARN-223:
--

It's a private API:

@InterfaceAudience.Private
@InterfaceStability.Unstable

 Change processTree interface to work better with native code
 

 Key: YARN-223
 URL: https://issues.apache.org/jira/browse/YARN-223
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Radim Kolar
Assignee: Radim Kolar
Priority: Critical
 Attachments: pstree-update4.txt, pstree-update6.txt, 
 pstree-update6.txt


 The problem is that on every update of the process tree a new object is 
 required. This is undesirable when working with a processTree implementation 
 in native code. Replace ProcessTree.getProcessTree() with updateProcessTree(); 
 no new object allocation is needed, and it simplifies application code a bit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-271) Fair scheduler hits IllegalStateException trying to reserve different apps on same node

2012-12-17 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-271:


Description: After the fair scheduler reserves a container on a node, it 
doesn't check for reservations it just made when trying to make more 
reservations during the same heartbeat.  (was: After the fair scheduler 
reserves a container on a node, it doesn't check for reservations it just made, 
when trying to make more reservations during the same heartbeat.)

 Fair scheduler hits IllegalStateException trying to reserve different apps on 
 same node
 ---

 Key: YARN-271
 URL: https://issues.apache.org/jira/browse/YARN-271
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-271.patch


 After the fair scheduler reserves a container on a node, it doesn't check for 
 reservations it just made when trying to make more reservations during the 
 same heartbeat.
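
 A sketch of the kind of guard the fix needs; the names are hypothetical 
 stand-ins, not the real FairScheduler code:

{code:java}
// Hypothetical sketch: before reserving on a node during a heartbeat,
// check for a reservation already held -- including one made earlier in
// the *same* heartbeat, which is exactly the case the bug misses.
class ReservationGuardSketch {
    interface SchedulerNode {
        Object getReservedApplication();
        void reserve(Object app);
    }

    static boolean tryReserve(SchedulerNode node, Object app) {
        Object holder = node.getReservedApplication();
        if (holder != null && holder != app) {
            // Reserving for a second app here would trip the
            // IllegalStateException described above.
            return false;
        }
        node.reserve(app);
        return true;
    }
}
{code}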

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2012-12-17 Thread nemon lou (JIRA)
nemon lou created YARN-276:
--

 Summary: Capacity Scheduler can hang when submit many jobs 
concurrently
 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.0.1-alpha, 3.0.0
Reporter: nemon lou


In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler 
can hang with most resources taken up by AMs, leaving not enough resources for 
tasks; all applications then hang.
The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not 
checked directly. Instead, the property is only used to compute 
maxActiveApplications, and maxActiveApplications is computed from the minimum 
allocation (not from what the AMs actually use).
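
As a worked sketch of the computation being described (the formula is 
paraphrased from this description, so treat the details as assumptions rather 
than the real CapacityScheduler code):

{code:java}
// Paraphrased sketch: the AM-resource-percent cap is applied only
// indirectly, via a count of "active applications" sized by the minimum
// allocation rather than by what the AMs actually use.
class AmLimitSketch {
    static int maxActiveApplications(long clusterMemoryMb,
                                     long minAllocationMb,
                                     float maxAmResourcePercent) {
        long minAllocSlots = clusterMemoryMb / minAllocationMb;
        return Math.max(1, (int) Math.ceil(minAllocSlots * maxAmResourcePercent));
    }

    public static void main(String[] args) {
        // 100 GB cluster, 1 GB minimum allocation, percent = 0.1:
        // up to 10 "active" apps -- but if each AM actually uses much more
        // than 1 GB, the AMs can consume the cluster and starve the tasks.
        System.out.println(maxActiveApplications(102_400, 1_024, 0.1f)); // 10
    }
}
{code}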

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2012-12-17 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-276:
---

Attachment: YARN-276.patch

 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
 Attachments: YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler 
 can hang with most resources taken up by AMs, leaving not enough resources 
 for tasks; all applications then hang.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not 
 checked directly. Instead, the property is only used to compute 
 maxActiveApplications, and maxActiveApplications is computed from the minimum 
 allocation (not from what the AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2012-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534585#comment-13534585
 ] 

Hadoop QA commented on YARN-276:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12561406/YARN-276.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates
  
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher
  org.apache.hadoop.yarn.server.resourcemanager.TestRM
  
org.apache.hadoop.yarn.server.resourcemanager.security.TestApplicationTokens
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId
  
org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
  
org.apache.hadoop.yarn.server.resourcemanager.security.TestClientTokens

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/227//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/227//console

This message is automatically generated.

 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
 Attachments: YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler 
 can hang with most resources taken up by AMs, leaving not enough resources 
 for tasks; all applications then hang.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not 
 checked directly. Instead, the property is only used to compute 
 maxActiveApplications, and maxActiveApplications is computed from the minimum 
 allocation (not from what the AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-270) RM scheduler event handler thread gets behind

2012-12-17 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534586#comment-13534586
 ] 

Vinod Kumar Vavilapalli commented on YARN-270:
--

Nathan, unfortunately, the dispatcher framework cannot exert back pressure in 
general; each event producer needs to control itself.

OTOH, YARN-275 is indeed a long-term fix: NMs back off just like the TTs do in 
1.*.

 RM scheduler event handler thread gets behind
 -

 Key: YARN-270
 URL: https://issues.apache.org/jira/browse/YARN-270
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.5
Reporter: Thomas Graves
Assignee: Thomas Graves

 We had a couple of incidents on a 2800-node cluster where the RM scheduler 
 event handler thread got behind processing events and basically became 
 unusable. It was still processing apps, but taking a long time (1 hr 45 
 minutes) to accept new apps. This actually happened twice within 5 days.
 We are using the capacity scheduler and at the time had between 400 and 500 
 applications running. There were another 250 apps in the SUBMITTED state in 
 the RM, but the scheduler hadn't yet processed them into the pending state. 
 We had about 15 queues, none of them hierarchical. We also had plenty of 
 space left on the cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-272) Fair scheduler log messages try to print objects without overridden toString methods

2012-12-17 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-272:


Attachment: YARN-272.patch

 Fair scheduler log messages try to print objects without overridden toString 
 methods
 

 Key: YARN-272
 URL: https://issues.apache.org/jira/browse/YARN-272
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-272.patch


 A lot of junk gets printed out like this:
 2012-12-11 17:31:52,998 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp: 
 Application application_1355270529654_0003 reserved container 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl@324f0f97
  on node host: c1416.hal.cloudera.com:46356 #containers=7 available=0 
 used=8192, currently has 4 at priority 
 org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl@33; 
 currentReservation 4096
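
 The generic fix is to override toString() on the offending classes so the log 
 lines become readable; a minimal sketch (the class and fields below are 
 invented for illustration, not the actual RMContainerImpl):

{code:java}
// Minimal sketch of the fix: give the objects that end up in log
// messages a toString() override. Fields are invented for illustration.
class RMContainerSketch {
    private final String containerId;
    private final String state;

    RMContainerSketch(String containerId, String state) {
        this.containerId = containerId;
        this.state = state;
    }

    @Override
    public String toString() {
        // Logs now show "container_..., state=RESERVED" instead of
        // "RMContainerImpl@324f0f97".
        return containerId + ", state=" + state;
    }
}
{code}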

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-271) Fair scheduler hits IllegalStateException trying to reserve different apps on same node

2012-12-17 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534624#comment-13534624
 ] 

Sandy Ryza commented on YARN-271:
-

The last patch doesn't fix the problem entirely.  Added one that does and 
includes a test.

 Fair scheduler hits IllegalStateException trying to reserve different apps on 
 same node
 ---

 Key: YARN-271
 URL: https://issues.apache.org/jira/browse/YARN-271
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-271-1.patch, YARN-271.patch


 After the fair scheduler reserves a container on a node, it doesn't check for 
 reservations it just made when trying to make more reservations during the 
 same heartbeat.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-271) Fair scheduler hits IllegalStateException trying to reserve different apps on same node

2012-12-17 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-271:


Attachment: YARN-271-1.patch

 Fair scheduler hits IllegalStateException trying to reserve different apps on 
 same node
 ---

 Key: YARN-271
 URL: https://issues.apache.org/jira/browse/YARN-271
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-271-1.patch, YARN-271.patch


 After the fair scheduler reserves a container on a node, it doesn't check for 
 reservations it just made when trying to make more reservations during the 
 same heartbeat.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-271) Fair scheduler hits IllegalStateException trying to reserve different apps on same node

2012-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534636#comment-13534636
 ] 

Hadoop QA commented on YARN-271:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12561420/YARN-271-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/229//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/229//console

This message is automatically generated.

 Fair scheduler hits IllegalStateException trying to reserve different apps on 
 same node
 ---

 Key: YARN-271
 URL: https://issues.apache.org/jira/browse/YARN-271
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-271-1.patch, YARN-271.patch


 After the fair scheduler reserves a container on a node, it doesn't check for 
 reservations it just made when trying to make more reservations during the 
 same heartbeat.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers

2012-12-17 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534640#comment-13534640
 ] 

Vinod Kumar Vavilapalli commented on YARN-3:


Did a quick review (incremental review, trusting my previous self :) ). Looks 
good, let's track the pending items separately. Triggering Jenkins on YARN-147 
and will close the tickets once blessed.

 Add support for CPU isolation/monitoring of containers
 --

 Key: YARN-3
 URL: https://issues.apache.org/jira/browse/YARN-3
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Andrew Ferguson
 Attachments: mapreduce-4334-design-doc.txt, 
 mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch, 
 MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, 
 MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, 
 MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch, 
 MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch, 
 MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN-3-lce_only-v1.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534645#comment-13534645
 ] 

Hadoop QA commented on YARN-147:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12555695/YARN-147-v8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestLinuxContainerExecutorWithMocks

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/230//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/230//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/230//console

This message is automatically generated.

 Add support for CPU isolation/monitoring of containers
 --

 Key: YARN-147
 URL: https://issues.apache.org/jira/browse/YARN-147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Andrew Ferguson
 Fix For: 2.0.3-alpha

 Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, 
 YARN-147-v4.patch, YARN-147-v5.patch, YARN-147-v6.patch, YARN-147-v8.patch, 
 YARN-3.patch


 This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not 
 show the SUBMIT PATCH button.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers

2012-12-17 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534648#comment-13534648
 ] 

Vinod Kumar Vavilapalli commented on YARN-3:


Andrew, can you please look at the FindBugs and test case issues at YARN-147? 
Let's try and get this in tomorrow.

Also, can you please find the pending issues? I can file any that I know of 
tomorrow. Tx.

 Add support for CPU isolation/monitoring of containers
 --

 Key: YARN-3
 URL: https://issues.apache.org/jira/browse/YARN-3
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Andrew Ferguson
 Attachments: mapreduce-4334-design-doc.txt, 
 mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch, 
 MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, 
 MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, 
 MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch, 
 MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch, 
 MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN-3-lce_only-v1.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-103) Add a yarn AM - RM client module

2012-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534703#comment-13534703
 ] 

Hadoop QA commented on YARN-103:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12559906/YARN-103.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/231//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/231//console

This message is automatically generated.

 Add a yarn AM - RM client module
 

 Key: YARN-103
 URL: https://issues.apache.org/jira/browse/YARN-103
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: YARN-103.1.patch, YARN-103.2.patch, YARN-103.3.patch, 
 YARN-103.4.patch, YARN-103.4.wrapper.patch, YARN-103.5.patch, 
 YARN-103.6.patch, YARN-103.7.patch


 Add a basic client wrapper library for the AM-RM protocol in order to prevent 
 proliferation of duplicated code everywhere. Provide helper functions to 
 perform the reverse mapping of container requests to the RM allocation 
 resource request table format.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-103) Add a yarn AM - RM client module

2012-12-17 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534733#comment-13534733
 ] 

Siddharth Seth commented on YARN-103:
-

Apologies for taking ages to look at this. Some minor stuff is still pending:
- AMRMClient JavaDoc
  1) The javadoc for the ContainerRequest class has some typos and needs to be 
punctuated (instead of using newlines).
  2) allocate javadoc - it references makeContainerRequest, which is now 
called addContainerRequest. It would also be useful to mention the reboot flag 
that may be sent as part of the response.
- AMRMClientImpl unregisterApplicationMaster - setAppAttemptId doesn't need to 
be in a synchronized block.
- AMRMClientImpl - the add/decContainerRequest rack null checks need fixing 
(host instead of rack).
- AMRMClientImpl.addResourceRequestToAsk - I'm not sure why this method is 
needed; a simple synchronized asks.add should be sufficient.

Also, I would prefer the DistributedShell changes in a separate jira, just to 
keep this patch clean. Breaking that out of the current patch should be simple 
enough.
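
For orientation, a rough sketch of how a caller might drive the wrapper under 
review, using the method names mentioned above (addContainerRequest, allocate, 
the reboot flag); the signatures are assumptions, since the API is still in 
flux in this patch series:

{code:java}
// Rough usage sketch; names follow the review comments above, but the
// exact signatures are assumptions -- not the final AMRMClient API.
interface AMRMClientSketch<T> {
    void registerApplicationMaster(String host, int port, String trackingUrl);
    void addContainerRequest(T request); // formerly makeContainerRequest
    AllocateResponseSketch allocate(float progress);
    void unregisterApplicationMaster();
}

interface AllocateResponseSketch {
    boolean getReboot(); // the reboot flag the review asks the javadoc to mention
}

class AmMainLoopSketch<T> {
    void run(AMRMClientSketch<T> client, T request) throws Exception {
        client.registerApplicationMaster("am-host", 0, "");
        client.addContainerRequest(request);
        AllocateResponseSketch resp = client.allocate(0.0f);
        if (resp.getReboot()) {
            return; // RM asked this AM to resync/restart; stop requesting
        }
        client.unregisterApplicationMaster();
    }
}
{code}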


 Add a yarn AM - RM client module
 

 Key: YARN-103
 URL: https://issues.apache.org/jira/browse/YARN-103
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: YARN-103.1.patch, YARN-103.2.patch, YARN-103.3.patch, 
 YARN-103.4.patch, YARN-103.4.wrapper.patch, YARN-103.5.patch, 
 YARN-103.6.patch, YARN-103.7.patch


 Add a basic client wrapper library for the AM-RM protocol in order to prevent 
 proliferation of duplicated code everywhere. Provide helper functions to 
 perform the reverse mapping of container requests to the RM allocation 
 resource request table format.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira