[jira] [Commented] (YARN-879) Fix NPE in test/o.a.h.y.server.resourcemanager.Application.getResources()

2013-07-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705501#comment-13705501
 ] 

Junping Du commented on YARN-879:
-

Sure, Vinod. I will try to make the tests work. Thanks for sharing the 
background here. :)

 Fix NPE in test/o.a.h.y.server.resourcemanager.Application.getResources()
 -

 Key: YARN-879
 URL: https://issues.apache.org/jira/browse/YARN-879
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-879.patch


 getResources() is supposed to return the list of containers allocated by the 
 RM. However, it currently returns null directly. Worse, if LOG.debug is 
 enabled, this will reliably cause an NPE.
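The fix described above can be sketched as follows. This is an illustrative sketch only, not the attached patch: the class and method names are simplified (the real test helper deals with Container objects rather than strings). The point is that getResources() hands back the containers allocated by the RM instead of null, so debug logging that formats the result cannot trigger an NPE.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Application {
    // Assumed internal book-keeping of containers allocated by the RM.
    private final List<String> allocated = new ArrayList<>();

    void containerAllocated(String containerId) {
        allocated.add(containerId);
    }

    // Before the fix this returned null; a caller doing
    //   LOG.debug("got " + app.getResources().size() + " containers")
    // would then throw a NullPointerException. Returning the (possibly
    // empty) list makes such logging safe.
    List<String> getResources() {
        return Collections.unmodifiableList(allocated);
    }
}
```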

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-347) YARN node CLI should also show CPU info as memory info in node status

2013-07-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705512#comment-13705512
 ] 

Junping Du commented on YARN-347:
-

Hi, [~acmurthy] The patch is rebased to the latest trunk. Would you help 
review it again? Thanks!

 YARN node CLI should also show CPU info as memory info in node status
 -

 Key: YARN-347
 URL: https://issues.apache.org/jira/browse/YARN-347
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-347.patch, YARN-347-v2.patch


 With YARN-2 checked in, CPU info is taken into consideration in resource 
 scheduling. "yarn node -status NodeID" should show CPU usage and capacity 
 info, as it does for memory.



[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM

2013-07-11 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705515#comment-13705515
 ] 

Xuan Gong commented on YARN-763:


Oh, you are right. If we want to let the CallBackThread call 
asyncClient.stop(), we might need to add that code inside 
CallBackThread.run(). In that case, we may need to create a new test class, 
such as a mock AMRMClientAsync, and override CallBackThread.run().

Any other ideas?
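The mock-subclass idea above can be sketched roughly as follows. All names here are hypothetical stand-ins, not the actual YARN classes: the point is only that the callback thread itself calls stop() on the client after delivering the shutdown callback.

```java
// Illustrative sketch of a mock async client whose callback thread
// stops the client itself after the shutdown callback is delivered.
class MockAsyncClient {
    private volatile boolean running = true;

    // Stand-in for CallBackThread.run(): after the shutdown callback is
    // delivered, the callback thread calls stop() on its own client.
    Thread callbackThread() {
        return new Thread(() -> {
            onShutdownRequest();
            stop();                 // stop from inside the callback thread
        });
    }

    void onShutdownRequest() { /* deliver the callback to the handler */ }

    void stop() { running = false; }

    boolean isRunning() { return running; }
}
```

A test can then start the callback thread, join it, and check that the client has stopped.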

 AMRMClientAsync should stop heartbeating after receiving shutdown from RM
 -

 Key: YARN-763
 URL: https://issues.apache.org/jira/browse/YARN-763
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, 
 YARN-763.4.patch, YARN-763.5.patch, YARN-763.6.patch, YARN-763.7.patch






[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations

2013-07-11 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705547#comment-13705547
 ] 

Bikas Saha commented on YARN-521:
-

Sandy, let me start by saying that I really appreciate the effort you are 
putting into this. However, I will be cautious about making changes to the 
API, because once we release it in the beta it will be hard to change. If 
needed, we can request the release manager to hold the release until we get 
this jira done. Let us ensure that the users who want this get a clean and 
good experience. To be clear, I am not suggesting that the patch as proposed 
is far from it. We are close.
e.g. When you stated above that a node and a rack specified together are 
legal and give a preference to the node, that is only subtly different from 
the default scheduler behavior in the absence of any constraints. Luckily 
this case is already covered by the existing code, even though we had not 
explicitly articulated it in our thought process. Also, two requests, one 
with a specific node and one with a specific rack, are different from a 
single request specifying both. This is reasonably hard to understand. How 
can we help the user here?

Let us explicitly enumerate, point by point, the cases that are legal and 
safe. I can help you with that. This will help me or someone else understand 
the code and review it better. If the javadoc is also written in the same 
manner, it will help users clearly understand how to use the API. It will 
also help ensure that we are testing for those cases.

bq. The way I see it, getMatchingRequests(priority1, * ) should give me all the 
requests at priority1, independent of locality relaxation
That's not quite correct per my intention when I wrote that method. The 
method is supposed to give me requests that I can assign based on the 
parameters of priority, capability and location. That's why it checks that 
resource requests can actually fit within the capability parameter (thus 
taking care of scheduler normalizations). The intention is to have the 
AMRMClient do the reverse translation of priority, location and capability, 
because it does the forward translation when the request is made and is 
therefore uniquely positioned to make that determination correctly. Without 
this, users might have to reverse engineer the mapping logic for themselves. 
Hence, it's sub-optimal to leave the users to fend for themselves wrt 
locality relaxation, especially when we can see that it can be tricky :) All 
this is only in the context of StoredContainerRequests, which constrain the 
request container count to be 1 (without which the book-keeping may not be 
possible in some cases). The solution is simply to store the request only 
for the locations on which we can assign it. Thus when a specific node 
request comes up, we will store that request in the ResourceRequestInfo of 
the node, but not in the ResourceRequestInfo for the inferred rack and ANY. 
Let me know if this doesn't make sense or will not be correct.
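The book-keeping idea above can be sketched with a toy request table. This is purely illustrative (the class, the String request ids, and the "*" wildcard are stand-ins, not the real AMRMClient code): a request with locality relaxation disabled is stored only under its specific node, not under the inferred rack or ANY, so a reverse lookup by location is unambiguous.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RequestTable {
    // Location key ("node", "/rack", or "*" for ANY) -> request ids.
    private final Map<String, List<String>> byLocation = new HashMap<>();

    void addRequest(String requestId, String node, String rack,
                    boolean relaxLocality) {
        byLocation.computeIfAbsent(node, k -> new ArrayList<>()).add(requestId);
        if (relaxLocality) {
            // Only a relaxable request may also be satisfied on the rack
            // or anywhere; a strict node request is stored on the node only.
            byLocation.computeIfAbsent(rack, k -> new ArrayList<>()).add(requestId);
            byLocation.computeIfAbsent("*", k -> new ArrayList<>()).add(requestId);
        }
    }

    // Reverse lookup: which stored requests can be assigned at this location?
    List<String> matching(String location) {
        return byLocation.getOrDefault(location, Collections.emptyList());
    }
}
```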

bq. My intended behavior is that to do this the user must remove the original 
container request before adding a new one with different locality relaxation. 
Am I misunderstanding how removeContainerRequest works again?
That makes sense. We need to document this. A test would be even better. The 
downside to this approach is that if the async heartbeat to the RM happens 
in between these 2 calls, then for some time the RM will think that the app 
doesn't need those resources, and the app can miss some matching scheduling 
opportunities. But I think we can live with that.

Looks like we have forgotten StoredContainerRequest in these changes. Its 
constructor also needs to account for the new flag.


 Augment AM - RM client module to be able to request containers only at 
 specific locations
 -

 Key: YARN-521
 URL: https://issues.apache.org/jira/browse/YARN-521
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, 
 YARN-521-3.patch, YARN-521.patch


 When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to 
 offer an easy way to access their functionality.



[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM

2013-07-11 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705556#comment-13705556
 ] 

Bikas Saha commented on YARN-763:
-

We can simply create another version of the TestCallbackHandler that calls 
asyncClient.stop() when asyncClient calls its getProgress() method. After 
stop() has completed, the method can set a flag and call notifyAll(). The 
main test thread can wait() on the handler object and check that the flag is 
set when it gets notified; otherwise it waits again. This way, if the 
callback thread is deadlocked, the test thread will not exit and the test 
will fail with a timeout. To verify, the test should fail with join() and 
pass without it. Similar logic is used in other tests.

Please let's not sleep(1000), as this just slows down the testing. Let's 
sleep(50) and set the heartbeat interval to 10. This allows for 5 heartbeats, 
which makes the verification that the actual heartbeat count == 1 meaningful.
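The flag-plus-notifyAll pattern described above can be sketched as follows. The class and method names are illustrative, not the actual YARN test code: the handler stops the client from inside a callback, records completion under the monitor, and notifies; the test thread waits with a deadline so a deadlocked stop() surfaces as a timeout.

```java
// Sketch of a callback handler that stops the async client from inside
// its own callback and lets the test thread wait for completion.
class StopOnCallbackHandler {
    private boolean stopped = false;
    private final Runnable stopAction; // e.g. () -> asyncClient.stop()

    StopOnCallbackHandler(Runnable stopAction) {
        this.stopAction = stopAction;
    }

    // Invoked by the heartbeat/callback thread, standing in for getProgress().
    float getProgress() {
        stopAction.run();          // stop the client from the callback thread
        synchronized (this) {
            stopped = true;        // record completion for the test thread
            notifyAll();
        }
        return 0.5f;
    }

    // The test thread blocks here; if the callback thread deadlocks inside
    // stop(), the flag is never set and this returns false at the deadline.
    synchronized boolean awaitStopped(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!stopped) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            wait(remaining);
        }
        return true;
    }
}
```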

 AMRMClientAsync should stop heartbeating after receiving shutdown from RM
 -

 Key: YARN-763
 URL: https://issues.apache.org/jira/browse/YARN-763
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, 
 YARN-763.4.patch, YARN-763.5.patch, YARN-763.6.patch, YARN-763.7.patch






[jira] [Commented] (YARN-736) Add a multi-resource fair sharing metric

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705694#comment-13705694
 ] 

Hudson commented on YARN-736:
-

Integrated in Hadoop-Yarn-trunk #267 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/267/])
updating CHANGES.txt after committing 
MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883
 to 2.1-beta branch (Revision 1502075)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Add a multi-resource fair sharing metric
 

 Key: YARN-736
 URL: https://issues.apache.org/jira/browse/YARN-736
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.1.0-beta

 Attachments: YARN-736-1.patch, YARN-736-2.patch, YARN-736-3.patch, 
 YARN-736-4.patch, YARN-736.patch


 Currently, at a regular interval, the fair scheduler computes a fair memory 
 share for each queue and application inside it.  This fair share is not used 
 for scheduling decisions, but is displayed in the web UI, exposed as a 
 metric, and used for preemption decisions.
 With DRF and multi-resource scheduling, assigning a memory share as the fair 
 share metric to every queue no longer makes sense.  It's not obvious what the 
 replacement should be, but probably something like fractional fairness within 
 a queue, or distance from an ideal cluster state.



[jira] [Commented] (YARN-368) Fix typo defiend should be defined in error output

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705702#comment-13705702
 ] 

Hudson commented on YARN-368:
-

Integrated in Hadoop-Yarn-trunk #267 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/267/])
YARN-368. Fixed a typo in error message in Auxiliary services. Contributed 
by Albert Chu. (Revision 1501852)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501852
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java


 Fix typo defiend should be defined in error output
 --

 Key: YARN-368
 URL: https://issues.apache.org/jira/browse/YARN-368
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Albert Chu
Assignee: Albert Chu
Priority: Trivial
 Fix For: 2.1.1-beta

 Attachments: YARN-368.patch


 Noticed the following in an error log output while doing some experiments
 ./1066018/nodes/hyperion987/log/yarn-achu-nodemanager-hyperion987.out:java.lang.RuntimeException:
  No class defiend for uda.shuffle
 defiend should be defined



[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705699#comment-13705699
 ] 

Hudson commented on YARN-569:
-

Integrated in Hadoop-Yarn-trunk #267 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/267/])
YARN-569. Add support for requesting and enforcing preemption requests via
a capacity monitor. Contributed by Carlo Curino, Chris Douglas (Revision 
1502083)

 Result = SUCCESS
cdouglas : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502083
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingEditPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingMonitor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/PreemptableResourceScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java


 CapacityScheduler: support for preemption (using a capacity monitor)
 

 Key: YARN-569
 URL: https://issues.apache.org/jira/browse/YARN-569
 Project: 

[jira] [Commented] (YARN-295) Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705703#comment-13705703
 ] 

Hudson commented on YARN-295:
-

Integrated in Hadoop-Yarn-trunk #267 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/267/])
YARN-295. Fixed a race condition in ResourceManager RMAppAttempt state 
machine. Contributed by Mayank Bansal. (Revision 1501856)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501856
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


 Resource Manager throws InvalidStateTransitonException: Invalid event: 
 CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
 ---

 Key: YARN-295
 URL: https://issues.apache.org/jira/browse/YARN-295
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K
Assignee: Mayank Bansal
 Fix For: 2.1.1-beta

 Attachments: YARN-295-trunk-1.patch, YARN-295-trunk-2.patch, 
 YARN-295-trunk-3.patch


 {code:xml}
 2012-12-28 14:03:56,956 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 CONTAINER_FINISHED at ALLOCATED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
 {code}



[jira] [Commented] (YARN-866) Add test for class ResourceWeights

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705705#comment-13705705
 ] 

Hudson commented on YARN-866:
-

Integrated in Hadoop-Yarn-trunk #267 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/267/])
updating CHANGES.txt after committing 
MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883
 to 2.1-beta branch (Revision 1502075)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Add test for class ResourceWeights
 --

 Key: YARN-866
 URL: https://issues.apache.org/jira/browse/YARN-866
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 2.1.0-beta
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.1.0-beta

 Attachments: Yarn-866.patch, Yarn-866.patch, YARN-866.patch


 Add test case for the class ResourceWeights



[jira] [Commented] (YARN-883) Expose Fair Scheduler-specific queue metrics

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705691#comment-13705691
 ] 

Hudson commented on YARN-883:
-

Integrated in Hadoop-Yarn-trunk #267 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/267/])
updating CHANGES.txt after committing 
MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883
 to 2.1-beta branch (Revision 1502075)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Expose Fair Scheduler-specific queue metrics
 

 Key: YARN-883
 URL: https://issues.apache.org/jira/browse/YARN-883
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.5-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.1.0-beta

 Attachments: YARN-883-1.patch, YARN-883-1.patch, YARN-883.patch


 When the Fair Scheduler is enabled, QueueMetrics should include fair share, 
 minimum share, and maximum share.



[jira] [Commented] (YARN-866) Add test for class ResourceWeights

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705807#comment-13705807
 ] 

Hudson commented on YARN-866:
-

Integrated in Hadoop-Hdfs-trunk #1457 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1457/])
updating CHANGES.txt after committing 
MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883
 to 2.1-beta branch (Revision 1502075)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Add test for class ResourceWeights
 --

 Key: YARN-866
 URL: https://issues.apache.org/jira/browse/YARN-866
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 2.1.0-beta
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.1.0-beta

 Attachments: Yarn-866.patch, Yarn-866.patch, YARN-866.patch


 Add test case for the class ResourceWeights



[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705801#comment-13705801
 ] 

Hudson commented on YARN-569:
-

Integrated in Hadoop-Hdfs-trunk #1457 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1457/])
YARN-569. Add support for requesting and enforcing preemption requests via
a capacity monitor. Contributed by Carlo Curino, Chris Douglas (Revision 
1502083)

 Result = FAILURE
cdouglas : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502083
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingEditPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingMonitor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/PreemptableResourceScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java


 CapacityScheduler: support for preemption (using a capacity monitor)
 

 Key: YARN-569
 URL: https://issues.apache.org/jira/browse/YARN-569
 Project: 

[jira] [Commented] (YARN-368) Fix typo defiend should be defined in error output

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705804#comment-13705804
 ] 

Hudson commented on YARN-368:
-

Integrated in Hadoop-Hdfs-trunk #1457 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1457/])
YARN-368. Fixed a typo in error message in Auxiliary services. Contributed 
by Albert Chu. (Revision 1501852)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501852
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java


 Fix typo defiend should be defined in error output
 --

 Key: YARN-368
 URL: https://issues.apache.org/jira/browse/YARN-368
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Albert Chu
Assignee: Albert Chu
Priority: Trivial
 Fix For: 2.1.1-beta

 Attachments: YARN-368.patch


 Noticed the following in an error log output while doing some experiments:
 ./1066018/nodes/hyperion987/log/yarn-achu-nodemanager-hyperion987.out:java.lang.RuntimeException:
  No class defiend for uda.shuffle
 "defiend" should be "defined"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-736) Add a multi-resource fair sharing metric

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705794#comment-13705794
 ] 

Hudson commented on YARN-736:
-

Integrated in Hadoop-Hdfs-trunk #1457 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1457/])
updating CHANGES.txt after committing 
MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883
 to 2.1-beta branch (Revision 1502075)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Add a multi-resource fair sharing metric
 

 Key: YARN-736
 URL: https://issues.apache.org/jira/browse/YARN-736
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.1.0-beta

 Attachments: YARN-736-1.patch, YARN-736-2.patch, YARN-736-3.patch, 
 YARN-736-4.patch, YARN-736.patch


 Currently, at a regular interval, the fair scheduler computes a fair memory 
 share for each queue and application inside it.  This fair share is not used 
 for scheduling decisions, but is displayed in the web UI, exposed as a 
 metric, and used for preemption decisions.
 With DRF and multi-resource scheduling, assigning a memory share as the fair 
 share metric to every queue no longer makes sense.  It's not obvious what the 
 replacement should be, but probably something like fractional fairness within 
 a queue, or distance from an ideal cluster state.
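One candidate the description alludes to can be sketched with a DRF-style "dominant share": the larger of a queue's memory and CPU fractions of the cluster. This is a hypothetical illustration only; the method name, class name, and numbers are invented and are not the fair scheduler's actual API.

```java
// Hypothetical sketch of a multi-resource fairness metric: a queue's DRF
// "dominant share" is the larger of its memory and CPU fractions of the
// cluster. Names and numbers are illustrative only, not YARN code.
public class DominantShareSketch {
    static double dominantShare(long usedMemMb, long usedVcores,
                                long clusterMemMb, long clusterVcores) {
        double memShare = (double) usedMemMb / clusterMemMb;
        double cpuShare = (double) usedVcores / clusterVcores;
        return Math.max(memShare, cpuShare);  // DRF: the binding resource
    }

    public static void main(String[] args) {
        // Cluster of 102400 MB and 50 vcores; a queue using 10240 MB and
        // 20 vcores is CPU-dominated: 20/50 = 0.4 > 10240/102400 = 0.1.
        System.out.println(dominantShare(10240, 20, 102400, 50));
    }
}
```

A per-queue value like this is a fraction in [0, 1], so it could serve the same display/metric role the memory fair share plays today, independent of which resource binds.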



[jira] [Commented] (YARN-368) Fix typo defiend should be defined in error output

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705863#comment-13705863
 ] 

Hudson commented on YARN-368:
-

Integrated in Hadoop-Mapreduce-trunk #1484 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1484/])
YARN-368. Fixed a typo in error message in Auxiliary services. Contributed 
by Albert Chu. (Revision 1501852)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501852
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java


 Fix typo defiend should be defined in error output
 --

 Key: YARN-368
 URL: https://issues.apache.org/jira/browse/YARN-368
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Albert Chu
Assignee: Albert Chu
Priority: Trivial
 Fix For: 2.1.1-beta

 Attachments: YARN-368.patch


 Noticed the following in an error log output while doing some experiments:
 ./1066018/nodes/hyperion987/log/yarn-achu-nodemanager-hyperion987.out:java.lang.RuntimeException:
  No class defiend for uda.shuffle
 "defiend" should be "defined"



[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705860#comment-13705860
 ] 

Hudson commented on YARN-569:
-

Integrated in Hadoop-Mapreduce-trunk #1484 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1484/])
YARN-569. Add support for requesting and enforcing preemption requests via
a capacity monitor. Contributed by Carlo Curino, Chris Douglas (Revision 
1502083)

 Result = SUCCESS
cdouglas : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502083
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingEditPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingMonitor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/PreemptableResourceScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java


 CapacityScheduler: support for preemption (using a capacity monitor)
 

 Key: YARN-569
 URL: https://issues.apache.org/jira/browse/YARN-569
 

[jira] [Commented] (YARN-866) Add test for class ResourceWeights

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705866#comment-13705866
 ] 

Hudson commented on YARN-866:
-

Integrated in Hadoop-Mapreduce-trunk #1484 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1484/])
updating CHANGES.txt after committing 
MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883
 to 2.1-beta branch (Revision 1502075)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Add test for class ResourceWeights
 --

 Key: YARN-866
 URL: https://issues.apache.org/jira/browse/YARN-866
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 2.1.0-beta
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.1.0-beta

 Attachments: Yarn-866.patch, Yarn-866.patch, YARN-866.patch


 Add test case for the class ResourceWeights



[jira] [Commented] (YARN-295) Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705864#comment-13705864
 ] 

Hudson commented on YARN-295:
-

Integrated in Hadoop-Mapreduce-trunk #1484 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1484/])
YARN-295. Fixed a race condition in ResourceManager RMAppAttempt state 
machine. Contributed by Mayank Bansal. (Revision 1501856)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501856
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


 Resource Manager throws InvalidStateTransitonException: Invalid event: 
 CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
 ---

 Key: YARN-295
 URL: https://issues.apache.org/jira/browse/YARN-295
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K
Assignee: Mayank Bansal
 Fix For: 2.1.1-beta

 Attachments: YARN-295-trunk-1.patch, YARN-295-trunk-2.patch, 
 YARN-295-trunk-3.patch


 {code:xml}
 2012-12-28 14:03:56,956 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 CONTAINER_FINISHED at ALLOCATED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
 {code}
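The failure mode in the stack trace above can be illustrated with a toy transition table. This is a hypothetical miniature, not YARN's StateMachineFactory: the class, enums, and exception type are invented for illustration, but the shape matches the race, where CONTAINER_FINISHED arrives while the attempt is still in ALLOCATED and no transition is registered for that (state, event) pair.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative toy, not YARN code: a transition table keyed by
// (state, event) throws when an event arrives in a state with no entry,
// which is the shape of the InvalidStateTransitonException above when
// CONTAINER_FINISHED races ahead of the ALLOCATED -> LAUNCHED transition.
public class ToyStateMachine {
    enum State { ALLOCATED, LAUNCHED, FINISHED }
    enum Event { LAUNCH, CONTAINER_FINISHED }

    private final Map<String, State> table = new HashMap<>();
    private State current = State.ALLOCATED;

    ToyStateMachine() {
        table.put(key(State.ALLOCATED, Event.LAUNCH), State.LAUNCHED);
        // The race: CONTAINER_FINISHED is only registered for LAUNCHED,
        // so receiving it in ALLOCATED finds no entry.
        table.put(key(State.LAUNCHED, Event.CONTAINER_FINISHED), State.FINISHED);
    }

    private static String key(State s, Event e) { return s + "/" + e; }

    State handle(Event e) {
        State next = table.get(key(current, e));
        if (next == null) {
            throw new IllegalStateException(
                "Invalid event: " + e + " at " + current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        ToyStateMachine sm = new ToyStateMachine();
        try {
            sm.handle(Event.CONTAINER_FINISHED); // arrives too early
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Fixes for races like this typically either register an explicit transition for the early event or buffer/ignore it until the expected state is reached.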



[jira] [Commented] (YARN-816) Implement AM recovery for distributed shell

2013-07-11 Thread Abhishek Kapoor (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705929#comment-13705929
 ] 

Abhishek Kapoor commented on YARN-816:
--

Please correct me if I am wrong.
Are you suggesting a use case where a job that fails will restart from where it died? If yes, then I think we need to maintain the state of the user application running on the allocated container. Isn't it the user application's responsibility to figure out whether this is a fresh start of the app or a recovery?

 Implement AM recovery for distributed shell
 ---

 Key: YARN-816
 URL: https://issues.apache.org/jira/browse/YARN-816
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Vinod Kumar Vavilapalli

 Simple recovery to just continue from where it left off is a good start.



[jira] [Assigned] (YARN-815) Add container failure handling to distributed-shell

2013-07-11 Thread Abhishek Kapoor (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kapoor reassigned YARN-815:


Assignee: Abhishek Kapoor

 Add container failure handling to distributed-shell
 ---

 Key: YARN-815
 URL: https://issues.apache.org/jira/browse/YARN-815
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Vinod Kumar Vavilapalli
Assignee: Abhishek Kapoor

 Today if any container fails because of whatever reason, the app simply 
 ignores them. We should handle retries, improve error reporting etc.



[jira] [Updated] (YARN-865) RM webservices can't query on application Types

2013-07-11 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-865:
---

Attachment: YARN-865.3.patch

 RM webservices can't query on application Types
 ---

 Key: YARN-865
 URL: https://issues.apache.org/jira/browse/YARN-865
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, 
 YARN-865.3.patch


 The resource manager web service api to get the list of apps doesn't have a 
 query parameter for appTypes.



[jira] [Commented] (YARN-865) RM webservices can't query on application Types

2013-07-11 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705956#comment-13705956
 ] 

Xuan Gong commented on YARN-865:


Yes, that logic should move out of the loop.

 RM webservices can't query on application Types
 ---

 Key: YARN-865
 URL: https://issues.apache.org/jira/browse/YARN-865
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, 
 YARN-865.3.patch


 The resource manager web service api to get the list of apps doesn't have a 
 query parameter for appTypes.



[jira] [Commented] (YARN-865) RM webservices can't query on application Types

2013-07-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705973#comment-13705973
 ] 

Hadoop QA commented on YARN-865:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12591874/YARN-865.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1458//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1458//console

This message is automatically generated.

 RM webservices can't query on application Types
 ---

 Key: YARN-865
 URL: https://issues.apache.org/jira/browse/YARN-865
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, 
 YARN-865.3.patch


 The resource manager web service api to get the list of apps doesn't have a 
 query parameter for appTypes.



[jira] [Updated] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM

2013-07-11 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-763:
---

Attachment: YARN-763.8.patch

 AMRMClientAsync should stop heartbeating after receiving shutdown from RM
 -

 Key: YARN-763
 URL: https://issues.apache.org/jira/browse/YARN-763
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, 
 YARN-763.4.patch, YARN-763.5.patch, YARN-763.6.patch, YARN-763.7.patch, 
 YARN-763.8.patch






[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM

2013-07-11 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706026#comment-13706026
 ] 

Xuan Gong commented on YARN-763:


Yes. I modified testCallAMRMClientAsyncStopFromCallbackHandler. This test will fail with join() and pass without it.

 AMRMClientAsync should stop heartbeating after receiving shutdown from RM
 -

 Key: YARN-763
 URL: https://issues.apache.org/jira/browse/YARN-763
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, 
 YARN-763.4.patch, YARN-763.5.patch, YARN-763.6.patch, YARN-763.7.patch, 
 YARN-763.8.patch






[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM

2013-07-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706035#comment-13706035
 ] 

Hadoop QA commented on YARN-763:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12591879/YARN-763.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1459//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1459//console

This message is automatically generated.

 AMRMClientAsync should stop heartbeating after receiving shutdown from RM
 -

 Key: YARN-763
 URL: https://issues.apache.org/jira/browse/YARN-763
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, 
 YARN-763.4.patch, YARN-763.5.patch, YARN-763.6.patch, YARN-763.7.patch, 
 YARN-763.8.patch






[jira] [Commented] (YARN-816) Implement AM recovery for distributed shell

2013-07-11 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706050#comment-13706050
 ] 

Omkar Vinit Joshi commented on YARN-816:


I think this is similar to the preemption case: if the application supports checkpointing, then we can start from where it left off; if not, then start from scratch.

 Implement AM recovery for distributed shell
 ---

 Key: YARN-816
 URL: https://issues.apache.org/jira/browse/YARN-816
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Vinod Kumar Vavilapalli

 Simple recovery to just continue from where it left off is a good start.



[jira] [Assigned] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

2013-07-11 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-292:


Assignee: Zhijie Shen

 ResourceManager throws ArrayIndexOutOfBoundsException while handling 
 CONTAINER_ALLOCATED for application attempt
 

 Key: YARN-292
 URL: https://issues.apache.org/jira/browse/YARN-292
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.1-alpha
Reporter: Devaraj K
Assignee: Zhijie Shen

 {code:xml}
 2012-12-26 08:41:15,030 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: 
 Calling allocate on removed or non existant application 
 appattempt_1356385141279_49525_01
 2012-12-26 08:41:15,031 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type CONTAINER_ALLOCATED for applicationAttempt 
 application_1356385141279_49525
 java.lang.ArrayIndexOutOfBoundsException: 0
   at java.util.Arrays$ArrayList.get(Arrays.java:3381)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
  {code}



[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

2013-07-11 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706108#comment-13706108
 ] 

Zhijie Shen commented on YARN-292:
--

Will look into this problem.

 ResourceManager throws ArrayIndexOutOfBoundsException while handling 
 CONTAINER_ALLOCATED for application attempt
 

 Key: YARN-292
 URL: https://issues.apache.org/jira/browse/YARN-292
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.1-alpha
Reporter: Devaraj K
Assignee: Zhijie Shen

 {code:xml}
 2012-12-26 08:41:15,030 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: 
 Calling allocate on removed or non existant application 
 appattempt_1356385141279_49525_01
 2012-12-26 08:41:15,031 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type CONTAINER_ALLOCATED for applicationAttempt 
 application_1356385141279_49525
 java.lang.ArrayIndexOutOfBoundsException: 0
   at java.util.Arrays$ArrayList.get(Arrays.java:3381)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
  {code}



[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues

2013-07-11 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706109#comment-13706109
 ] 

Omkar Vinit Joshi commented on YARN-897:


[~dedcode] / [~curino] do you want to work on the patch, or can I take over? This 
seems like an important bug that needs to be fixed. I looked at the code: on 
container completion it does not re-sort the TreeSet, which will result in 
unfairness.

 CapacityScheduler wrongly sorted queues
 ---

 Key: YARN-897
 URL: https://issues.apache.org/jira/browse/YARN-897
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Djellel Eddine Difallah
 Attachments: TestBugParentQueue.java


 The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
 defines the sort order. This ensures that the queue with the least UsedCapacity 
 receives resources next. On container assignment we correctly update the order, 
 but we fail to do so on container completion. This corrupts the TreeSet 
 structure, and under-capacity queues might starve for resources.



[jira] [Updated] (YARN-661) NM fails to cleanup local directories for users

2013-07-11 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-661:
---

Attachment: YARN-661-20130711.1.patch

 NM fails to cleanup local directories for users
 ---

 Key: YARN-661
 URL: https://issues.apache.org/jira/browse/YARN-661
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta, 0.23.8
Reporter: Jason Lowe
Assignee: Omkar Vinit Joshi
 Attachments: YARN-661-20130701.patch, YARN-661-20130708.patch, 
 YARN-661-20130710.1.patch, YARN-661-20130711.1.patch


 YARN-71 added deletion of local directories on startup, but in practice it 
 fails to delete the directories because of permission problems.  The 
 top-level usercache directory is owned by the user but is in a directory that 
 is not writable by the user.  Therefore the deletion of the user's usercache 
 directory, as the user, fails due to lack of permissions.



[jira] [Commented] (YARN-865) RM webservices can't query on application Types

2013-07-11 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706126#comment-13706126
 ] 

Zhijie Shen commented on YARN-865:
--

+1 for the latest patch

 RM webservices can't query on application Types
 ---

 Key: YARN-865
 URL: https://issues.apache.org/jira/browse/YARN-865
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, 
 YARN-865.3.patch


 The resource manager web service api to get the list of apps doesn't have a 
 query parameter for appTypes.



[jira] [Commented] (YARN-661) NM fails to cleanup local directories for users

2013-07-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706137#comment-13706137
 ] 

Hadoop QA commented on YARN-661:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12591891/YARN-661-20130711.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1460//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1460//console

This message is automatically generated.

 NM fails to cleanup local directories for users
 ---

 Key: YARN-661
 URL: https://issues.apache.org/jira/browse/YARN-661
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta, 0.23.8
Reporter: Jason Lowe
Assignee: Omkar Vinit Joshi
 Attachments: YARN-661-20130701.patch, YARN-661-20130708.patch, 
 YARN-661-20130710.1.patch, YARN-661-20130711.1.patch


 YARN-71 added deletion of local directories on startup, but in practice it 
 fails to delete the directories because of permission problems.  The 
 top-level usercache directory is owned by the user but is in a directory that 
 is not writable by the user.  Therefore the deletion of the user's usercache 
 directory, as the user, fails due to lack of permissions.



[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues

2013-07-11 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706152#comment-13706152
 ] 

Carlo Curino commented on YARN-897:
---

I agree this needs fixing soon. We have a first draft of the patch; we were 
planning to test it carefully before posting it, but if you have cycles we can 
socialize it right away and work on it together.  
[~dedcode], please post the patch in its current state. [~ojoshi], you can check 
it out and we can test/verify in the meantime. 

 CapacityScheduler wrongly sorted queues
 ---

 Key: YARN-897
 URL: https://issues.apache.org/jira/browse/YARN-897
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Djellel Eddine Difallah
 Attachments: TestBugParentQueue.java


 The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
 defines the sort order. This ensures that the queue with the least UsedCapacity 
 receives resources next. On container assignment we correctly update the order, 
 but we fail to do so on container completion. This corrupts the TreeSet 
 structure, and under-capacity queues might starve for resources.



[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues

2013-07-11 Thread Djellel Eddine Difallah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Djellel Eddine Difallah updated YARN-897:
-

Attachment: YARN-897-1.patch

Attached is a first patch attempt to address the bug:
Upon container completion, which triggers completedContainer(), we remove and 
reinsert the queue into its parent's childQueues. This operation is done 
recursively, starting from the leafQueue where the container was released. By 
handling both cases where usedCapacity changes (assignment and completion), the 
TreeSet remains properly sorted.
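The remove/re-insert pattern described above can be sketched with a plain TreeSet. This is a minimal illustration, assuming a hypothetical Queue class as a stand-in for the actual CSQueue hierarchy:

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-in for a queue sorted by its used capacity.
class Queue {
    final String name;
    float usedCapacity;
    Queue(String name, float usedCapacity) {
        this.name = name;
        this.usedCapacity = usedCapacity;
    }
}

public class TreeSetResort {
    public static void main(String[] args) {
        Comparator<Queue> byCapacity = Comparator
                .comparingDouble((Queue q) -> q.usedCapacity)
                .thenComparing(q -> q.name);
        TreeSet<Queue> childQueues = new TreeSet<>(byCapacity);
        Queue a = new Queue("a", 0.7f);
        Queue b = new Queue("b", 0.3f);
        childQueues.add(a);
        childQueues.add(b);

        // Mutating usedCapacity in place would corrupt the TreeSet's order,
        // since TreeSet never re-sorts existing elements. The patch's fix:
        // remove the queue, update the sort key, then re-insert it.
        childQueues.remove(a);
        a.usedCapacity = 0.1f;   // a container completed; capacity dropped
        childQueues.add(a);

        // The least-used queue is now correctly first in iteration order.
        System.out.println(childQueues.first().name);
    }
}
```

A TreeSet compares elements only at insertion and removal time, which is why both places where usedCapacity changes must perform the remove/re-insert dance.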

 CapacityScheduler wrongly sorted queues
 ---

 Key: YARN-897
 URL: https://issues.apache.org/jira/browse/YARN-897
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Djellel Eddine Difallah
 Attachments: TestBugParentQueue.java, YARN-897-1.patch


 The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
 defines the sort order. This ensures the queue with least UsedCapacity to 
 receive resources next. On containerAssignment we correctly update the order, 
 but we miss to do so on container completions. This corrupts the TreeSet 
 structure, and under-capacity queues might starve for resources.



[jira] [Updated] (YARN-245) Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED

2013-07-11 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-245:
---

Attachment: YARN-245-trunk-2.patch

Thanks [~ojoshi] and [~vinodkv] for the review.

Updated the patch.

Thanks,
Mayank

 Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at 
 FINISHED
 

 Key: YARN-245
 URL: https://issues.apache.org/jira/browse/YARN-245
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K
Assignee: Mayank Bansal
 Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch


 {code:xml}
 2012-11-25 12:56:11,795 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 FINISH_APPLICATION at FINISHED
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
 at java.lang.Thread.run(Thread.java:662)
 2012-11-25 12:56:11,796 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Application application_1353818859056_0004 transitioned from FINISHED to null
 {code}



[jira] [Commented] (YARN-299) Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE

2013-07-11 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706192#comment-13706192
 ] 

Mayank Bansal commented on YARN-299:


Sure [~vinodkv]. I am reopening YARN-820 and closing this one.

Thanks,
Mayank

 Node Manager throws 
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 RESOURCE_FAILED at DONE
 ---

 Key: YARN-299
 URL: https://issues.apache.org/jira/browse/YARN-299
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.0.1-alpha, 2.0.0-alpha
Reporter: Devaraj K
Assignee: Mayank Bansal
 Attachments: YARN-299-trunk-1.patch, YARN-299-trunk-2.patch


 {code:xml}
 2012-12-31 10:36:27,844 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Can't handle this event at current state: Current: [DONE], eventType: 
 [RESOURCE_FAILED]
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 RESOURCE_FAILED at DONE
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
 2012-12-31 10:36:27,845 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1356792558130_0002_01_01 transitioned from DONE to 
 null
 {code}



[jira] [Reopened] (YARN-820) NodeManager has invalid state transition after error in resource localization

2013-07-11 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal reopened YARN-820:



 NodeManager has invalid state transition after error in resource localization
 -

 Key: YARN-820
 URL: https://issues.apache.org/jira/browse/YARN-820
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Mayank Bansal
 Attachments: yarn-user-nodemanager-localhost.localdomain.log






[jira] [Resolved] (YARN-299) Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE

2013-07-11 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal resolved YARN-299.


Resolution: Cannot Reproduce

 Node Manager throws 
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 RESOURCE_FAILED at DONE
 ---

 Key: YARN-299
 URL: https://issues.apache.org/jira/browse/YARN-299
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.0.1-alpha, 2.0.0-alpha
Reporter: Devaraj K
Assignee: Mayank Bansal
 Attachments: YARN-299-trunk-1.patch, YARN-299-trunk-2.patch


 {code:xml}
 2012-12-31 10:36:27,844 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Can't handle this event at current state: Current: [DONE], eventType: 
 [RESOURCE_FAILED]
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 RESOURCE_FAILED at DONE
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
 2012-12-31 10:36:27,845 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1356792558130_0002_01_01 transitioned from DONE to 
 null
 {code}



[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.

2013-07-11 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706197#comment-13706197
 ] 

Omkar Vinit Joshi commented on YARN-744:


[~bikassaha] Sounds reasonable; I will take a look at it again.

 Race condition in ApplicationMasterService.allocate .. It might process same 
 allocate request twice resulting in additional containers getting allocated.
 -

 Key: YARN-744
 URL: https://issues.apache.org/jira/browse/YARN-744
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Omkar Vinit Joshi
 Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744.patch


 Looks like the locking here is broken. It takes a lock on the lastResponse 
 object and then puts a new lastResponse object into the map. At that point a 
 new thread entering this function will get the new lastResponse object, will be 
 able to take its lock, and will enter the critical section. Presumably we want 
 to limit responses to one per app attempt, so the lock could instead be taken 
 on the ApplicationAttemptId key of the response map.
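A minimal sketch of the broken locking and the suggested per-key lock. Names like lastResponse and the allocate methods mirror the report but are hypothetical; this is not the actual ApplicationMasterService code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AllocateLockDemo {
    static final Map<String, Object> lastResponse = new ConcurrentHashMap<>();
    // Stable per-attempt lock objects that survive response updates.
    static final Map<String, Object> attemptLocks = new ConcurrentHashMap<>();

    // Broken pattern: synchronizes on the current response *value*, then
    // replaces it in the map. A second thread arriving after the put() sees
    // the new value, locks that instead, and enters concurrently.
    static void allocateBroken(String attemptId) {
        Object prev = lastResponse.get(attemptId);
        synchronized (prev) {
            lastResponse.put(attemptId, new Object()); // lock object swapped out
        }
    }

    // Suggested fix: lock on an object keyed by the attempt id, which is
    // never swapped out, so all threads for one attempt serialize properly.
    static void allocateFixed(String attemptId) {
        Object lock = attemptLocks.computeIfAbsent(attemptId, k -> new Object());
        synchronized (lock) {
            lastResponse.put(attemptId, new Object()); // safe: lock is stable
        }
    }

    public static void main(String[] args) {
        allocateFixed("appattempt_1");
        allocateFixed("appattempt_1"); // same attempt, same lock object
        System.out.println(lastResponse.size());
    }
}
```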



[jira] [Updated] (YARN-820) NodeManager has invalid state transition after error in resource localization

2013-07-11 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-820:
---

Attachment: YARN-820-trunk-1.patch

Attaching the patch.

Thanks,
Mayank

 NodeManager has invalid state transition after error in resource localization
 -

 Key: YARN-820
 URL: https://issues.apache.org/jira/browse/YARN-820
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Mayank Bansal
 Attachments: YARN-820-trunk-1.patch, 
 yarn-user-nodemanager-localhost.localdomain.log






[jira] [Commented] (YARN-820) NodeManager has invalid state transition after error in resource localization

2013-07-11 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706210#comment-13706210
 ] 

Mayank Bansal commented on YARN-820:


Hi,

I am reopening this and closing YARN-299, as this problem better matches the 
scenario described by [~ojoshi]:

https://issues.apache.org/jira/browse/YARN-299?focusedCommentId=13703820page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13703820

There is one more issue: the call to toString needs to be synchronized when 
fetching the resources. I am fixing that as well as part of this JIRA.

Thanks,
Mayank



 NodeManager has invalid state transition after error in resource localization
 -

 Key: YARN-820
 URL: https://issues.apache.org/jira/browse/YARN-820
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Mayank Bansal
 Attachments: yarn-user-nodemanager-localhost.localdomain.log






[jira] [Commented] (YARN-245) Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED

2013-07-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706214#comment-13706214
 ] 

Hadoop QA commented on YARN-245:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12591902/YARN-245-trunk-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1461//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1461//console

This message is automatically generated.

 Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at 
 FINISHED
 

 Key: YARN-245
 URL: https://issues.apache.org/jira/browse/YARN-245
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K
Assignee: Mayank Bansal
 Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch


 {code:xml}
 2012-11-25 12:56:11,795 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 FINISH_APPLICATION at FINISHED
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
 at java.lang.Thread.run(Thread.java:662)
 2012-11-25 12:56:11,796 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Application application_1353818859056_0004 transitioned from FINISHED to null
 {code}



[jira] [Commented] (YARN-820) NodeManager has invalid state transition after error in resource localization

2013-07-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706229#comment-13706229
 ] 

Hadoop QA commented on YARN-820:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12591906/YARN-820-trunk-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1462//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1462//console

This message is automatically generated.

 NodeManager has invalid state transition after error in resource localization
 -

 Key: YARN-820
 URL: https://issues.apache.org/jira/browse/YARN-820
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Mayank Bansal
 Attachments: YARN-820-trunk-1.patch, 
 yarn-user-nodemanager-localhost.localdomain.log






[jira] [Updated] (YARN-912) Create exceptions package in common/api for yarn and move client facing exceptions to them

2013-07-11 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-912:
---

Attachment: YARN-912-trunk-1.patch

Attaching patch.

Thanks,
Mayank

 Create exceptions package in common/api for yarn and move client facing 
 exceptions to them
 --

 Key: YARN-912
 URL: https://issues.apache.org/jira/browse/YARN-912
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Bikas Saha
Assignee: Mayank Bansal
 Attachments: YARN-912-trunk-1.patch


 Exceptions like InvalidResourceBlacklistRequestException, 
 InvalidResourceRequestException, InvalidApplicationMasterRequestException etc 
 are currently inside ResourceManager and not visible to clients.



[jira] [Commented] (YARN-865) RM webservices can't query on application Types

2013-07-11 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706251#comment-13706251
 ] 

Hitesh Shah commented on YARN-865:
--

[~xgong] The documentation is still not clear. How are multiple types meant to 
be specified? Should one use /apps?appTypes=type1&appTypes=type2 or some other 
format? How does the code handle it if appTypes is defined twice in the query 
params in the URL?

javax.ws.rs.QueryParam supports a [Sorted]Set out of the box. Should we look 
into using that directly instead of playing around with tokenizing based on ','?
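For illustration, a hedged sketch of collecting repeated appTypes parameters into a Set, which is the behavior javax.ws.rs.QueryParam provides automatically when the parameter is declared as Set&lt;String&gt;. The manual parser below is purely hypothetical, not Jersey's implementation:

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class AppTypesParam {
    // Collect every appTypes=... occurrence from a raw query string into a
    // Set, so duplicates collapse and no comma-tokenizing is needed.
    static Set<String> parseAppTypes(String query) {
        Set<String> types = new LinkedHashSet<>();
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].equals("appTypes")) {
                types.add(kv[1]);
            }
        }
        return types;
    }

    public static void main(String[] args) {
        // Repeated parameters, including a duplicate the Set collapses.
        System.out.println(
            parseAppTypes("appTypes=MAPREDUCE&appTypes=TEZ&appTypes=MAPREDUCE"));
    }
}
```

Declaring the JAX-RS parameter as a collection type would push this parsing into the framework and sidestep the comma-delimiter question entirely.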

 RM webservices can't query on application Types
 ---

 Key: YARN-865
 URL: https://issues.apache.org/jira/browse/YARN-865
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, 
 YARN-865.3.patch


 The resource manager web service api to get the list of apps doesn't have a 
 query parameter for appTypes.



[jira] [Commented] (YARN-369) Handle ( or throw a proper error when receiving) status updates from application masters that have not registered

2013-07-11 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706253#comment-13706253
 ] 

Mayank Bansal commented on YARN-369:


Thanks, [~bikassaha], for committing this.
I have updated the patch for YARN-912.

Thanks,
Mayank

 Handle ( or throw a proper error when receiving) status updates from 
 application masters that have not registered
 -

 Key: YARN-369
 URL: https://issues.apache.org/jira/browse/YARN-369
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.3-alpha, trunk-win
Reporter: Hitesh Shah
Assignee: Mayank Bansal
 Fix For: 2.1.0-beta

 Attachments: YARN-369.patch, YARN-369-trunk-1.patch, 
 YARN-369-trunk-2.patch, YARN-369-trunk-3.patch, YARN-369-trunk-4.patch


 Currently, an allocate call from an unregistered application is allowed and 
 the status update for it throws a statemachine error that is silently dropped.
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 STATUS_UPDATE at LAUNCHED
at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588)
at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471)
at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452)
at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
at java.lang.Thread.run(Thread.java:680)
 ApplicationMasterService should likely throw an appropriate error for 
 applications' requests that should not be handled in such cases.
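The guard suggested above can be sketched in isolation. This is a standalone illustration only, not YARN's actual classes; the exception name and the registry are hypothetical:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: reject calls from application masters that never
// registered, instead of letting the event die in the state machine.
public class AllocateGuard {
    static class UnregisteredAMException extends RuntimeException {
        UnregisteredAMException(String id) {
            super("Application master not registered: " + id);
        }
    }

    private final Set<String> registered = ConcurrentHashMap.newKeySet();

    void register(String attemptId) { registered.add(attemptId); }

    void allocate(String attemptId) {
        // Fail loudly at the service boundary rather than dropping the
        // resulting STATUS_UPDATE inside the attempt state machine.
        if (!registered.contains(attemptId)) {
            throw new UnregisteredAMException(attemptId);
        }
        // ... proceed with normal allocation handling ...
    }

    public static void main(String[] args) {
        AllocateGuard g = new AllocateGuard();
        g.register("attempt_1");
        g.allocate("attempt_1"); // registered: accepted
        try {
            g.allocate("attempt_2");
        } catch (UnregisteredAMException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```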



[jira] [Commented] (YARN-333) Schedulers cannot control the queue-name of an application

2013-07-11 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706271#comment-13706271
 ] 

Sandy Ryza commented on YARN-333:
-

Attached rebased patch.

 Schedulers cannot control the queue-name of an application
 --

 Key: YARN-333
 URL: https://issues.apache.org/jira/browse/YARN-333
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-333-1.patch, YARN-333-2.patch, YARN-333-3.patch, 
 YARN-333.patch


 Currently, if an app is submitted without a queue, RMAppManager sets the 
 RMApp's queue to default.
 A scheduler may wish to make its own decision on which queue to place an app 
 in if none is specified. For example, when the fair scheduler 
 user-as-default-queue config option is set to true, and an app is submitted 
 with no queue specified, the fair scheduler should assign the app to a queue 
 with the user's name.
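The placement rule described above can be sketched as follows. This is a minimal standalone illustration; the method and option names are made up, not the fair scheduler's actual API:

```java
// Hypothetical sketch of the fair-scheduler placement rule discussed here:
// if no queue is given, fall back to the user's name instead of "default".
public class QueuePlacement {
    // userAsDefaultQueue mirrors the user-as-default-queue config option.
    static String assignQueue(String requested, String user,
                              boolean userAsDefaultQueue) {
        if (requested == null || requested.isEmpty()) {
            return userAsDefaultQueue ? user : "default";
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(assignQueue(null, "alice", true));   // alice
        System.out.println(assignQueue(null, "alice", false));  // default
        System.out.println(assignQueue("prod", "alice", true)); // prod
    }
}
```

The point of the JIRA is that this decision should live in the scheduler, not in RMAppManager, so each scheduler can apply its own rule.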



[jira] [Updated] (YARN-333) Schedulers cannot control the queue-name of an application

2013-07-11 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-333:


Attachment: YARN-333-3.patch

 Schedulers cannot control the queue-name of an application
 --

 Key: YARN-333
 URL: https://issues.apache.org/jira/browse/YARN-333
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-333-1.patch, YARN-333-2.patch, YARN-333-3.patch, 
 YARN-333.patch


 Currently, if an app is submitted without a queue, RMAppManager sets the 
 RMApp's queue to default.
 A scheduler may wish to make its own decision on which queue to place an app 
 in if none is specified. For example, when the fair scheduler 
 user-as-default-queue config option is set to true, and an app is submitted 
 with no queue specified, the fair scheduler should assign the app to a queue 
 with the user's name.



[jira] [Commented] (YARN-912) Create exceptions package in common/api for yarn and move client facing exceptions to them

2013-07-11 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706273#comment-13706273
 ] 

Sandy Ryza commented on YARN-912:
-

Does it really make sense to put exceptions in their own package?  Is there any 
precedent for this in other well-known Java libraries?  It seems to me that we 
should just put these in the package that is likely to throw them, i.e. 
org.apache.hadoop.yarn.client.api.

A couple documentation nits:
{code}
-   * requested memory/vcore is non-negative and not greater than max
+   * requested memory/vcore is non-negative and not greater than max throws
+   * exception <code>InvalidResourceRequestException</code> when there is
+   * invalid request
{code}
"throws" should be on a separate line as "@throws"

{code}
+  /*
+   * This method will throw <code>InvalidResourceBlacklistRequestException
+   * </code> If the resource is not be able to add to black list.
+   */
{code}
"If the resource is not be able to add to black list." should be "if the 
resource is not able to be added to the blacklist."

 Create exceptions package in common/api for yarn and move client facing 
 exceptions to them
 --

 Key: YARN-912
 URL: https://issues.apache.org/jira/browse/YARN-912
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Bikas Saha
Assignee: Mayank Bansal
 Attachments: YARN-912-trunk-1.patch


 Exceptions like InvalidResourceBlacklistRequestException, 
 InvalidResourceRequestException, InvalidApplicationMasterRequestException etc 
 are currently inside ResourceManager and not visible to clients.



[jira] [Commented] (YARN-333) Schedulers cannot control the queue-name of an application

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706280#comment-13706280
 ] 

Hudson commented on YARN-333:
-

Integrated in Hadoop-trunk-Commit #4074 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4074/])
YARN-333. Schedulers cannot control the queue-name of an application. 
(sandyr via tucu) (Revision 1502374)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502374
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 Schedulers cannot control the queue-name of an application
 --

 Key: YARN-333
 URL: https://issues.apache.org/jira/browse/YARN-333
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.1.0-beta

 Attachments: YARN-333-1.patch, YARN-333-2.patch, YARN-333-3.patch, 
 YARN-333.patch


 Currently, if an app is submitted without a queue, RMAppManager sets the 
 RMApp's queue to default.
 A scheduler may wish to make its own decision on which queue to place an app 
 in if none is specified. For example, when the fair scheduler 
 user-as-default-queue config option is set to true, and an app is submitted 
 with no queue specified, the fair scheduler should assign the app to a queue 
 with the user's name.



[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues

2013-07-11 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706284#comment-13706284
 ] 

Omkar Vinit Joshi commented on YARN-897:


[~dedcode] Thanks for posting the patch... looked at the code.. 

bq. // Can't use childQueues.remove() since the TreeSet might be out of 
order.
any reason for this even after this patch? if we don't see any other issues 
then why not just use childQueues.remove instead of iterating?

* reinsertQueue could be marked synchronized? thoughts? But yeah.. without that 
too it is thread safe as we are locking it at CapacityScheduler.nodeUpdate(), 
but still it is better to mark it.

* LOG.info("Re-sorting queues since queue got completed: " + 
childQueue.getQueuePath() + 
nit: line > 80

* at present we send the container completed event to the leaf queue and then 
keep propagating it till root. why not send the event to root, grab the locks 
from root->leaf and update it? any thoughts?

 CapacityScheduler wrongly sorted queues
 ---

 Key: YARN-897
 URL: https://issues.apache.org/jira/browse/YARN-897
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Djellel Eddine Difallah
 Attachments: TestBugParentQueue.java, YARN-897-1.patch


 The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
 defines the sort order. This ensures the queue with least UsedCapacity to 
 receive resources next. On containerAssignment we correctly update the order, 
 but we miss to do so on container completions. This corrupts the TreeSet 
 structure, and under-capacity queues might starve for resources.



[jira] [Created] (YARN-916) JobContext cache files api are broken

2013-07-11 Thread Omkar Vinit Joshi (JIRA)
Omkar Vinit Joshi created YARN-916:
--

 Summary: JobContext cache files api are broken
 Key: YARN-916
 URL: https://issues.apache.org/jira/browse/YARN-916
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi


I just checked; there are issues with the latest distributed cache API.
* JobContext.getLocalCacheFiles ... is deprecated; it should not have been 
deprecated.
* JobContext.getCacheFiles is broken; it returns null.



[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues

2013-07-11 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706327#comment-13706327
 ] 

Carlo Curino commented on YARN-897:
---

Omkar, thanks for the quick feedback... 

bq. any reason for this even after this patch? if we don't see any other issues 
then why not just use childQueues.remove instead of iterating?
I initially thought the same, but I worried that since the underlying capacity 
attribute has been changed, the TreeSet is already inconsistent? [~dedcode] 
can you check whether this is true or not? Also, can we use some careful 
operation ordering and get away with Omkar's suggestion?

bq. reinsertQueue could be marked synchronized? thoughts? But yeah.. without 
that too it is thread safe as we are locking it at 
CapacitySchedulder.nodeUpdate(). but still it is better to mark it.

We should probably follow your suggestion (especially if this method will be 
reused elsewhere), or at least use the lock annotations properly. (again this 
patch wasn't quite ready)

bq. nit. line  80

will do

bq. at present we send the container completed event to leaf queue and then 
keep propagating it till root. why not send the event to root grab the locks 
from root->leaf and update it? any thoughts?
Lock ordering is somewhat delicate (and I worry not very consistent). In 
general, the idea to lock bottom-up should allow part of the operations 
(updating of two leaf queues) to be concurrent until the recursion meets at some 
common ancestor, at which point we serialize. However, at least for some of the 
operations this is inside a global scheduler lock, so we lose that benefit in 
the first place. It might be interesting to review the locks carefully and see 
whether we can rationalize them further. Although this is delicate, and unless 
we are lock-bound on the scheduler, in practice it would not buy us much.  

We didn't have time to test this through to a level I would be confident PAing 
this. Omkar do you have any cycle to test this? [~acmurthy],[~tgraves] do you 
guys have a moment to review this? 

BTW we are working on a discrete event simulator, which should allow us to 
lock-step/debug the entire RM codebase... that would make for easy testing of 
some of this stuff (more as soon as we get it ready to show it around).

 CapacityScheduler wrongly sorted queues
 ---

 Key: YARN-897
 URL: https://issues.apache.org/jira/browse/YARN-897
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Djellel Eddine Difallah
 Attachments: TestBugParentQueue.java, YARN-897-1.patch


 The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
 defines the sort order. This ensures the queue with least UsedCapacity to 
 receive resources next. On containerAssignment we correctly update the order, 
 but we miss to do so on container completions. This corrupts the TreeSet 
 structure, and under-capacity queues might starve for resources.



[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging

2013-07-11 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706339#comment-13706339
 ] 

Alejandro Abdelnur commented on YARN-366:
-

[~vinodkv], you have been following this one; anything else you think should be 
addressed before committing? I'd like to get this in 2.1-beta if possible.


 Add a tracing async dispatcher to simplify debugging
 

 Key: YARN-366
 URL: https://issues.apache.org/jira/browse/YARN-366
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, 
 YARN-366-4.patch, YARN-366-5.patch, YARN-366-6.patch, YARN-366-7.patch, 
 YARN-366.patch


 Exceptions thrown in YARN/MR code with asynchronous event handling do not 
 contain informative stack traces, as all handle() methods sit directly under 
 the dispatcher thread's loop.
 This makes errors very difficult to debug for those who are not intimately 
 familiar with the code, as it is difficult to see which chain of events 
 caused a particular outcome.
 I propose adding an AsyncDispatcher that instruments events with tracing 
 information.  Whenever an event is dispatched during the handling of another 
 event, the dispatcher would annotate that event with a pointer to its parent. 
  When the dispatcher catches an exception, it could reconstruct a stack 
 trace of the chain of events that led to it, and be able to log something 
 informative.
 This would be an experimental feature, off by default, unless extensive 
 testing showed that it did not have a significant performance impact.
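The parent-pointer idea can be illustrated with a toy example. This is not the patch's actual API, just a sketch of events carrying a reference to the event being handled when they were dispatched:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration: each event remembers its causal parent, so on
// failure the chain of causes can be walked back like a stack trace.
public class TracingEvents {
    static class Event {
        final String name;
        final Event parent; // event being handled when this was dispatched
        Event(String name, Event parent) { this.name = name; this.parent = parent; }
    }

    // Reconstructs the chain of events leading to e, most recent first.
    static List<String> traceOf(Event e) {
        List<String> chain = new ArrayList<>();
        for (Event cur = e; cur != null; cur = cur.parent) {
            chain.add(cur.name);
        }
        return chain;
    }

    public static void main(String[] args) {
        Event submit = new Event("APP_SUBMITTED", null);
        Event alloc = new Event("CONTAINER_ALLOCATED", submit);
        Event launch = new Event("CONTAINER_LAUNCHED", alloc);
        // prints [CONTAINER_LAUNCHED, CONTAINER_ALLOCATED, APP_SUBMITTED]
        System.out.println(traceOf(launch));
    }
}
```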



[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues

2013-07-11 Thread Djellel Eddine Difallah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706348#comment-13706348
 ] 

Djellel Eddine Difallah commented on YARN-897:
--

Omkar, thanks for the feedback
{quote}any reason for this even after this patch? if we don't see any other 
issues then why not just use childQueues.remove instead of iterating?{quote}
The tree is already out of order because of the new usedCapacity, so remove() 
won't work. We have to iterate and add() to fix the order.
{quote}reinsertQueue could be marked synchronized? thoughts? But yeah.. without 
that too it is thread safe as we are locking it at 
CapacitySchedulder.nodeUpdate(). but still it is better to mark it.{quote}
ok, sounds reasonable to put a synchronize there.
{quote}LOG.info("Re-sorting queues since queue got completed: " + 
childQueue.getQueuePath() +
nit: line > 80{quote}
sure
{quote}at present we send the container completed event to leaf queue and then 
keep propagating it till root. why not send the event to root grab the locks 
from root->leaf and update it? any thoughts?{quote}
Because the released container is linked to a leaf queue, and we have to walk 
bottom-up to figure out to which parent to propagate. The assignment phase, 
however, works the way you described.
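That out-of-order behavior is easy to reproduce with a plain TreeSet. The following is a standalone toy demonstration (not YARN code) of why remove() misses an element after its sort key changed in place, and why iterate-and-re-add restores the order:

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.TreeSet;

// Toy model of ParentQueue's childQueues: a TreeSet sorted by a
// mutable usedCapacity field.
public class TreeSetResort {
    static class Queue {
        final String name;
        double usedCapacity;
        Queue(String name, double used) { this.name = name; this.usedCapacity = used; }
    }

    static TreeSet<Queue> newTree() {
        TreeSet<Queue> t = new TreeSet<>(
            Comparator.comparingDouble((Queue q) -> q.usedCapacity)
                      .thenComparing(q -> q.name));
        t.add(new Queue("a", 0.5));
        t.add(new Queue("b", 0.7));
        return t;
    }

    static Queue find(TreeSet<Queue> t, String name) {
        for (Queue q : t) if (q.name.equals(name)) return q;
        return null;
    }

    // After b's sort key drops in place (a container completion),
    // remove(b) searches the wrong side of the tree and misses it.
    static boolean removeAfterKeyChange() {
        TreeSet<Queue> queues = newTree();
        Queue b = find(queues, "b");
        b.usedCapacity = 0.1;
        return queues.remove(b); // false: binary search fails
    }

    // The patch's workaround: locate by identity via iteration
    // (iterator.remove() needs no comparator search), then re-add.
    static String firstAfterReinsert() {
        TreeSet<Queue> queues = newTree();
        Queue b = find(queues, "b");
        b.usedCapacity = 0.1;
        for (Iterator<Queue> it = queues.iterator(); it.hasNext(); ) {
            if (it.next() == b) { it.remove(); break; }
        }
        queues.add(b);
        return queues.first().name; // "b" now correctly sorts first
    }

    public static void main(String[] args) {
        System.out.println(removeAfterKeyChange()); // false
        System.out.println(firstAfterReinsert());   // b
    }
}
```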

 CapacityScheduler wrongly sorted queues
 ---

 Key: YARN-897
 URL: https://issues.apache.org/jira/browse/YARN-897
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Djellel Eddine Difallah
 Attachments: TestBugParentQueue.java, YARN-897-1.patch


 The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
 defines the sort order. This ensures the queue with least UsedCapacity to 
 receive resources next. On containerAssignment we correctly update the order, 
 but we miss to do so on container completions. This corrupts the TreeSet 
 structure, and under-capacity queues might starve for resources.



[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations

2013-07-11 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706382#comment-13706382
 ] 

Bikas Saha commented on YARN-521:
-

I have been extremely caught up today. Will try to get to this later tonight or 
tomorrow.

 Augment AM - RM client module to be able to request containers only at 
 specific locations
 -

 Key: YARN-521
 URL: https://issues.apache.org/jira/browse/YARN-521
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, 
 YARN-521-3.patch, YARN-521.patch


 When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to 
 offer an easy way to access their functionality



[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues

2013-07-11 Thread Djellel Eddine Difallah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Djellel Eddine Difallah updated YARN-897:
-

Attachment: YARN-897-2.patch

Patch reflecting Omkar's comments. 1) add synchronized to reinsertQueue 2) 
reduce line length

 CapacityScheduler wrongly sorted queues
 ---

 Key: YARN-897
 URL: https://issues.apache.org/jira/browse/YARN-897
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Djellel Eddine Difallah
 Attachments: TestBugParentQueue.java, YARN-897-1.patch, 
 YARN-897-2.patch


 The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
 defines the sort order. This ensures the queue with least UsedCapacity to 
 receive resources next. On containerAssignment we correctly update the order, 
 but we miss to do so on container completions. This corrupts the TreeSet 
 structure, and under-capacity queues might starve for resources.



[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

2013-07-11 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706432#comment-13706432
 ] 

Zhijie Shen commented on YARN-292:
--

{code}
  // Acquire the AM container from the scheduler.
  Allocation amContainerAllocation = appAttempt.scheduler.allocate(
  appAttempt.applicationAttemptId, EMPTY_CONTAINER_REQUEST_LIST,
  EMPTY_CONTAINER_RELEASE_LIST, null, null);
{code}
The above code will eventually pull the newly allocated containers from 
newlyAllocatedContainers.

Logically, AMContainerAllocatedTransition happens after RMAppAttempt receives 
CONTAINER_ALLOCATED. CONTAINER_ALLOCATED is sent during 
ContainerStartedTransition, when RMContainer is moving from NEW to ALLOCATED. 
Therefore, pulling newlyAllocatedContainers happens when RMContainer is at 
ALLOCATED. In contrast, RMContainer is added to newlyAllocatedContainers when 
it is still at NEW. In conclusion, one container in the allocation is expected 
in AMContainerAllocatedTransition.

Hinted by [~nemon], the problem may happen at
{code}
FiCaSchedulerApp application = getApplication(applicationAttemptId);
if (application == null) {
  LOG.error("Calling allocate on removed " +
      "or non existant application " + applicationAttemptId);
  return EMPTY_ALLOCATION;
}
{code}
EMPTY_ALLOCATION has 0 containers. Another observation is that there seems to be 
inconsistent synchronization on accessing the application map.

I just became aware that [~djp] has started working on this problem. Please feel 
free to take it over. Thanks! 
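A minimal sketch of the failure mode and a defensive guard (types are simplified stand-ins, not the actual RMAppAttemptImpl code): the transition assumes allocate() returned at least one container, but EMPTY_ALLOCATION carries none, so get(0) throws.

```java
import java.util.Collections;
import java.util.List;

// Sketch: guard the "first allocated container is the AM container"
// assumption instead of letting get(0) throw an
// ArrayIndexOutOfBoundsException deep in the state machine.
public class AllocationGuard {
    static String amContainerId(List<String> allocated) {
        if (allocated.isEmpty()) {
            // Surface a clear error for the empty-allocation case.
            throw new IllegalStateException(
                "Scheduler returned no containers for the AM attempt");
        }
        return allocated.get(0);
    }

    public static void main(String[] args) {
        System.out.println(amContainerId(List.of("container_01"))); // container_01
        try {
            amContainerId(Collections.emptyList());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```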

 ResourceManager throws ArrayIndexOutOfBoundsException while handling 
 CONTAINER_ALLOCATED for application attempt
 

 Key: YARN-292
 URL: https://issues.apache.org/jira/browse/YARN-292
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.1-alpha
Reporter: Devaraj K
Assignee: Zhijie Shen

 {code:xml}
 2012-12-26 08:41:15,030 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: 
 Calling allocate on removed or non existant application 
 appattempt_1356385141279_49525_01
 2012-12-26 08:41:15,031 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type CONTAINER_ALLOCATED for applicationAttempt 
 application_1356385141279_49525
 java.lang.ArrayIndexOutOfBoundsException: 0
   at java.util.Arrays$ArrayList.get(Arrays.java:3381)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
  {code}



[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues

2013-07-11 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706488#comment-13706488
 ] 

Omkar Vinit Joshi commented on YARN-897:


[~dedcode] please do keep older patches... it helps reviewing by sometimes 
diffing against older patches and verifying older comments... Thanks

 CapacityScheduler wrongly sorted queues
 ---

 Key: YARN-897
 URL: https://issues.apache.org/jira/browse/YARN-897
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Djellel Eddine Difallah
 Attachments: TestBugParentQueue.java, YARN-897-2.patch


 The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
 defines the sort order. This ensures the queue with least UsedCapacity to 
 receive resources next. On containerAssignment we correctly update the order, 
 but we miss to do so on container completions. This corrupts the TreeSet 
 structure, and under-capacity queues might starve for resources.



[jira] [Updated] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.

2013-07-11 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-744:
---

Attachment: YARN-744-20130711.1.patch

 Race condition in ApplicationMasterService.allocate .. It might process same 
 allocate request twice resulting in additional containers getting allocated.
 -

 Key: YARN-744
 URL: https://issues.apache.org/jira/browse/YARN-744
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Omkar Vinit Joshi
 Attachments: MAPREDUCE-3899-branch-0.23.patch, 
 YARN-744-20130711.1.patch, YARN-744.patch


 Looks like the lock taken in this is broken. It takes a lock on lastResponse 
 object and then puts a new lastResponse object into the map. At this point a 
 new thread entering this function will get a new lastResponse object and will 
 be able to take its lock and enter the critical section. Presumably we want 
 to limit one response per app attempt. So the lock could be taken on the 
 ApplicationAttemptId key of the response map object.
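One way to get a stable per-attempt lock, as the description suggests, is to key the lock on the attempt id rather than on the response object that gets replaced in the map. A simplified sketch under that assumption (not the actual ApplicationMasterService code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the fix direction: serialize allocate() per attempt by
// locking a stable per-key object, not the response value that is
// swapped out while the lock is held.
public class PerAttemptLock {
    private final Map<String, Object> locks = new ConcurrentHashMap<>();
    private final Map<String, Integer> lastResponse = new ConcurrentHashMap<>();

    int allocate(String attemptId, int responseId) {
        // computeIfAbsent yields one canonical lock object per attempt,
        // so two threads for the same attempt cannot both enter.
        Object lock = locks.computeIfAbsent(attemptId, k -> new Object());
        synchronized (lock) {
            // The broken variant locked lastResponse.get(attemptId) and
            // then put() a *new* object, letting a second thread in.
            lastResponse.put(attemptId, responseId);
            return responseId;
        }
    }

    public static void main(String[] args) {
        PerAttemptLock s = new PerAttemptLock();
        System.out.println(s.allocate("attempt_1", 1)); // 1
    }
}
```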



[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues

2013-07-11 Thread Djellel Eddine Difallah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Djellel Eddine Difallah updated YARN-897:
-

Attachment: YARN-897-1.patch

 CapacityScheduler wrongly sorted queues
 ---

 Key: YARN-897
 URL: https://issues.apache.org/jira/browse/YARN-897
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Djellel Eddine Difallah
 Attachments: TestBugParentQueue.java, YARN-897-1.patch, 
 YARN-897-2.patch


 The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
 defines the sort order. This ensures the queue with least UsedCapacity to 
 receive resources next. On containerAssignment we correctly update the order, 
 but we miss to do so on container completions. This corrupts the TreeSet 
 structure, and under-capacity queues might starve for resources.



[jira] [Commented] (YARN-592) Container logs lost for the application when NM gets restarted

2013-07-11 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706495#comment-13706495
 ] 

Omkar Vinit Joshi commented on YARN-592:


Just to be sure (I might be wrong), I am a bit skeptical about the .tmp file... 
are you sure it contains all the logs? My understanding is that it was still in 
progress and didn't finish with all of them. However, even for completed logs it 
will enqueue them into the deletion service for future deletion, which may or 
may not happen even for a graceful shutdown, as we kill the NM after some 
time... right? thoughts?

bq. This patch is trying to upload logs for the applications which run before 
and after NM restart. If the application gets completed after NM crash and 
before starting NM, atleast logs for the containers ran on that node can get 
from NM local logs dirs.

This seems to be problematic. The time difference between the AM finishing and 
the NM starting can be as low as a second... or as high as hours. We need a 
definite policy for handling logs, because if we don't handle this, logs will 
be lying on the NM waiting for an already-finished app to finish... right? 
thoughts?

 Container logs lost for the application when NM gets restarted
 --

 Key: YARN-592
 URL: https://issues.apache.org/jira/browse/YARN-592
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.1-alpha, 2.0.3-alpha
Reporter: Devaraj K
Assignee: Devaraj K
Priority: Critical
 Attachments: YARN-592.patch


 While running a big job if the NM goes down due to some reason and comes 
 back, it will do the log aggregation for the newly launched containers and 
 deletes all the containers for the application. This case we don't get the 
 container logs from HDFS or local for the containers which are launched 
 before restart and completed.



[jira] [Resolved] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-07-11 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi resolved YARN-541.


Resolution: Invalid

 getAllocatedContainers() is not returning all the allocated containers
 --

 Key: YARN-541
 URL: https://issues.apache.org/jira/browse/YARN-541
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
 Environment: Redhat Linux 64-bit
Reporter: Krishna Kishore Bonagiri
Assignee: Omkar Vinit Joshi
 Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, 
 yarn-dsadm-resourcemanager-isredeng.out


 I am running an application that was written for, and working well with, 
 hadoop-2.0.0-alpha, but when I run the same application against 2.0.3-alpha, 
 the getAllocatedContainers() method called on AMResponse sometimes does not 
 return all the allocated containers. For example, I request 10 containers and 
 this method sometimes gives me only 9, even though the Resource Manager log 
 shows that the 10th container was also allocated. It happens only sometimes, 
 at random, and works fine all other times. If I send the RM one more request 
 for the remaining container after it failed to give them the first time (and 
 before releasing the already acquired ones), it can allocate that container. 
 I am running only one application at a time, but thousands of them one after 
 another.
 My main worry is that even though the RM's log says all 10 requested 
 containers were allocated, getAllocatedContainers() is not returning all of 
 them; it surprisingly returned only 9. I never saw this kind of issue in the 
 previous version, i.e. hadoop-2.0.0-alpha.
 Thanks,
 Kishore
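
For context, allocated containers can legitimately arrive spread over several 
allocate responses, so AMs typically accumulate them across heartbeats. A 
minimal sketch of that loop, with illustrative names (AllocateClient, 
heartbeat) rather than the real AMResponse/AMRMClient API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Sketch of the accumulate-across-heartbeats pattern: each allocate
// response may carry only a subset of the requested containers, so the
// AM keeps polling until it has all of them. Names are illustrative.
class AllocationLoopSketch {
    interface AllocateClient {
        List<String> heartbeat(); // newly allocated container ids in this response
    }

    static List<String> waitForContainers(AllocateClient client, int wanted) {
        List<String> acquired = new ArrayList<>();
        while (acquired.size() < wanted) {
            acquired.addAll(client.heartbeat()); // a response may carry 0..n containers
        }
        return acquired;
    }

    public static void main(String[] args) {
        // Simulated RM: 10 containers trickle in over three responses.
        Iterator<List<String>> batches = Arrays.asList(
                Arrays.asList("c1", "c2", "c3", "c4"),
                Arrays.asList("c5", "c6", "c7", "c8", "c9"),
                Arrays.asList("c10")).iterator();
        List<String> got = waitForContainers(() -> batches.next(), 10);
        System.out.println(got.size()); // 10: the "missing" container arrives later
    }
}
```

Under this model, seeing 9 of 10 in a single response is not by itself a bug; 
the report's concern is that the 10th container never showed up in subsequent 
responses without a fresh request.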
  



[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-07-11 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706506#comment-13706506
 ] 

Omkar Vinit Joshi commented on YARN-541:


I am closing this as invalid... please reopen if you still see the issue.

 getAllocatedContainers() is not returning all the allocated containers
 --

 Key: YARN-541
 URL: https://issues.apache.org/jira/browse/YARN-541
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
 Environment: Redhat Linux 64-bit
Reporter: Krishna Kishore Bonagiri
Assignee: Omkar Vinit Joshi
 Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, 
 yarn-dsadm-resourcemanager-isredeng.out





[jira] [Reopened] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-07-11 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah reopened YARN-541:
--


[~ojoshi] [~write2kishore] I think [~bikassaha] discovered a race condition in 
the AMRMClient that may be causing this.

 getAllocatedContainers() is not returning all the allocated containers
 --

 Key: YARN-541
 URL: https://issues.apache.org/jira/browse/YARN-541
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
 Environment: Redhat Linux 64-bit
Reporter: Krishna Kishore Bonagiri
Assignee: Omkar Vinit Joshi
 Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, 
 yarn-dsadm-resourcemanager-isredeng.out





[jira] [Commented] (YARN-321) Generic application history service

2013-07-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706553#comment-13706553
 ] 

Vinod Kumar Vavilapalli commented on YARN-321:
--

Fundamentally, this JIRA is to track the management of data related to finished 
applications via a new server called ApplicationHistoryService (AHS). Some 
important design points:

h4. Basics
 - ResourceManager will write per-application data to a (hopefully very) thin 
{{HistoryStorage}} layer.
 - ResourceManager will push the data to HistoryStorage, in a separate thread, 
after an application finishes.
 - HistoryStorage is different from the current RMStateStore, so unlike 
JobHistory, HistoryStorage isn't used for state-tracking or as a transaction 
log. ResourceManager will try to publish information about completed apps in a 
best-effort manner, but there will be edge cases during RM restart where some 
data may not get flushed. Making it consistent and complete across an RM 
restart will be a future step.
 - HistoryStorage will expose APIs to publish app info, retrieve app info, and 
list apps, and can have various implementations:
   -- A file-based implementation where the RM writes per-app files to DFS; 
HistoryStorage will take care of file management like we do today in the 
JobHistoryServer (JHS) and serve users by reading the data from the files.
   -- A shared-bus implementation where the RM writes directly to AHS and AHS 
persists the data in a storage that it controls - files/DB etc.
 - To start with, we will have an implementation with a per-app HDFS file.
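
The publish/retrieve/list API surface described above could be sketched 
roughly as follows. All names here (HistoryStorage, AppHistoryData, 
publishApplication, ...) are illustrative guesses, not the actual interface 
from the branch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Rough sketch of the thin HistoryStorage layer: publish app info,
// retrieve app info, and list apps. A real implementation would write
// per-app files to HDFS; this in-memory one just illustrates the contract.
class HistoryStorageSketch {
    static class AppHistoryData {
        final String appId, user, queue;
        AppHistoryData(String appId, String user, String queue) {
            this.appId = appId; this.user = user; this.queue = queue;
        }
    }

    interface HistoryStorage {
        void publishApplication(AppHistoryData app); // RM pushes this after the app finishes
        AppHistoryData getApplication(String appId);
        List<AppHistoryData> listApplications();
    }

    static class InMemoryHistoryStorage implements HistoryStorage {
        private final Map<String, AppHistoryData> store = new ConcurrentHashMap<>();
        public void publishApplication(AppHistoryData app) { store.put(app.appId, app); }
        public AppHistoryData getApplication(String appId) { return store.get(appId); }
        public List<AppHistoryData> listApplications() { return new ArrayList<>(store.values()); }
    }

    public static void main(String[] args) {
        HistoryStorage storage = new InMemoryHistoryStorage();
        storage.publishApplication(
                new AppHistoryData("application_1373500000000_0001", "alice", "default"));
        System.out.println(storage.getApplication("application_1373500000000_0001").user); // alice
        System.out.println(storage.listApplications().size()); // 1
    }
}
```

The file-based and shared-bus implementations mentioned above would both sit 
behind an interface of this shape; only the storage backend changes.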

h4. Miscellaneous

 - *Running as a service*: By default, ApplicationHistoryService will be 
embedded inside the ResourceManager, but it will be independent enough to run 
as a separate service for scaling purposes.

 - *User interfaces*: Command-line and web clients will use RPC, web, and REST 
interfaces to interact with ApplicationHistoryService to get info about 
finished applications. Fundamentally, we'll have these types of interfaces:
-- Per-app info
-- List of all apps
-- Querying the list of apps based on user-name, queue-name etc. To start 
with, we will imitate what JHS does: return the list of all apps and do the 
filtering client-side. But we need a better server-side solution.

 - *Aggregated logs*: Logs will be served, and potentially managed (expiry 
etc.), by ApplicationHistoryService via an abstract LogService component.

 - *Retention*: ApplicationHistoryService will have components to take care of 
retention - expiring very old apps.

 - *Security*: ApplicationHistoryService will have security from the start and 
will use tokens similar to JHS.

h4. Out of scope

 - Hosting/serving per-framework data is out of scope for this JIRA. It is 
related to ApplicationHistoryService, but I am keeping the focus on generic 
data on this JIRA for now and will file a separate ticket for 
ApplicationHistoryService or a related service to work with per-framework or 
app data. I see a transition phase where we would continue to run AHS and JHS 
at the same time till the other JIRA is resolved.

 - *Long running services*: We won't have any special support for long 
running services yet. We should track this with the other work on long 
running services' support.

Feedback appreciated.

I am going to kickstart this right now. I am creating a branch for faster 
progress. 

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) trusted servers 
 (where T is the number of application types and V is the number of 
 application versions) is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as JSON (or binary Avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display JSON 
 as a tree of strings) as well. A specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the JSON for its specific UI and/or analytics.



[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.

2013-07-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706554#comment-13706554
 ] 

Hadoop QA commented on YARN-744:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12591936/YARN-744-20130711.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1465//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/1465//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1465//console

This message is automatically generated.

 Race condition in ApplicationMasterService.allocate .. It might process same 
 allocate request twice resulting in additional containers getting allocated.
 -

 Key: YARN-744
 URL: https://issues.apache.org/jira/browse/YARN-744
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Omkar Vinit Joshi
 Attachments: MAPREDUCE-3899-branch-0.23.patch, 
 YARN-744-20130711.1.patch, YARN-744.patch


 Looks like the locking here is broken. It takes a lock on the lastResponse 
 object and then puts a new lastResponse object into the map. At this point a 
 new thread entering this function will get the new lastResponse object and 
 will be able to take its lock and enter the critical section. Presumably we 
 want to limit it to one response per app attempt, so the lock could be taken 
 on the ApplicationAttemptId key of the response map instead.
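
The broken pattern and the suggested fix can be sketched like this. Class and 
field names below are hypothetical, not from the actual patch; the point is 
only that the lock must be a stable per-attempt object, not the value being 
swapped out:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// The bug: synchronized (lastResponse.get(attemptId)) { lastResponse.put(...) }
// replaces the very object being locked, so the next thread locks the new
// object and enters the critical section concurrently. Locking on a stable
// per-attempt object restores mutual exclusion. Names are illustrative.
class AllocateLockSketch {
    private final ConcurrentMap<String, Object> attemptLocks = new ConcurrentHashMap<>();
    volatile int responsesProcessed = 0; // mutated only under the per-attempt lock

    void allocate(String attemptId) {
        // One lock object per attempt id, created once and never replaced:
        Object lock = attemptLocks.computeIfAbsent(attemptId, k -> new Object());
        synchronized (lock) {
            responsesProcessed++; // critical section: one allocate at a time per attempt
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AllocateLockSketch s = new AllocateLockSketch();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> s.allocate("appattempt_1373500000000_0001_000001"));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(s.responsesProcessed); // 1000: no lost updates
    }
}
```

With the broken pattern, two threads racing through the swap could both enter 
the critical section and process the same allocate request twice, which is 
exactly the duplicate-allocation symptom described in the summary.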



[jira] [Commented] (YARN-701) ApplicationTokens should be used irrespective of kerberos

2013-07-11 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706561#comment-13706561
 ] 

Omkar Vinit Joshi commented on YARN-701:


I have checked the patch; some comments:
* Earlier it was possible, even in a secured environment, to use the AMRMToken 
for appAttemptId1 and request containers for appAttemptId2. This is now fixed 
in the authorize call for both cases.
* The patch works in both secured and unsecured environments.
* It makes sense to remove appAttemptId from the request.. thoughts?? backward 
compatibility?
* However, there is a problem if we restart the node manager on which the AM 
was running during the application run. Attaching logs.

 ApplicationTokens should be used irrespective of kerberos
 -

 Key: YARN-701
 URL: https://issues.apache.org/jira/browse/YARN-701
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, 
 YARN-701-20130710.txt, yarn-ojoshi-resourcemanager-HW10351.local.log


  - Single code path for secure and non-secure cases is useful for testing and 
 coverage.
  - Having this in non-secure mode will help us avoid accidental bugs where 
 AMs DDoS and bring down the RM.



[jira] [Updated] (YARN-701) ApplicationTokens should be used irrespective of kerberos

2013-07-11 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-701:
---

Attachment: yarn-ojoshi-resourcemanager-HW10351.local.log

 ApplicationTokens should be used irrespective of kerberos
 -

 Key: YARN-701
 URL: https://issues.apache.org/jira/browse/YARN-701
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, 
 YARN-701-20130710.txt, yarn-ojoshi-resourcemanager-HW10351.local.log





[jira] [Commented] (YARN-701) ApplicationTokens should be used irrespective of kerberos

2013-07-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706562#comment-13706562
 ] 

Hadoop QA commented on YARN-701:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12591950/yarn-ojoshi-resourcemanager-HW10351.local.log
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1466//console

This message is automatically generated.

 ApplicationTokens should be used irrespective of kerberos
 -

 Key: YARN-701
 URL: https://issues.apache.org/jira/browse/YARN-701
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, 
 YARN-701-20130710.txt, yarn-ojoshi-resourcemanager-HW10351.local.log





[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

2013-07-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706579#comment-13706579
 ] 

Junping Du commented on YARN-292:
-

Hi [~zjshen], I think your work above reveals the root cause of this bug, so 
please feel free to go ahead and fix it. I will also help to review it. Thx! 

 ResourceManager throws ArrayIndexOutOfBoundsException while handling 
 CONTAINER_ALLOCATED for application attempt
 

 Key: YARN-292
 URL: https://issues.apache.org/jira/browse/YARN-292
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.1-alpha
Reporter: Devaraj K
Assignee: Zhijie Shen

 {code:xml}
 2012-12-26 08:41:15,030 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: 
 Calling allocate on removed or non existant application 
 appattempt_1356385141279_49525_01
 2012-12-26 08:41:15,031 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type CONTAINER_ALLOCATED for applicationAttempt 
 application_1356385141279_49525
 java.lang.ArrayIndexOutOfBoundsException: 0
   at java.util.Arrays$ArrayList.get(Arrays.java:3381)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
  {code}
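
The trace above shows AMContainerAllocatedTransition calling get(0) on the 
allocated-container list, which blows up when the scheduler (having already 
removed the attempt) returned an empty allocation. A hedged sketch of the 
guard, with simplified names rather than the real RMContainer types:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of the failure mode: the transition assumed the scheduler handed
// back at least one container and indexed get(0) directly; when allocate()
// ran against a removed or non-existent attempt, the list is empty and
// get(0) throws ArrayIndexOutOfBoundsException, killing the event handler.
// Checking first (and, e.g., staying in the current state to retry) avoids
// crashing the dispatcher.
class AllocatedTransitionSketch {
    static String amContainerOrNull(List<String> allocated) {
        if (allocated.isEmpty()) {
            return null; // caller can remain in the allocating state and retry
        }
        return allocated.get(0);
    }

    public static void main(String[] args) {
        System.out.println(amContainerOrNull(Arrays.asList("container_1_0001_01_000001")));
        System.out.println(amContainerOrNull(Collections.<String>emptyList())); // null, not AIOOBE
    }
}
```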



[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-07-11 Thread Krishna Kishore Bonagiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706589#comment-13706589
 ] 

Krishna Kishore Bonagiri commented on YARN-541:
---

I shall try to get you the logs you needed today or as soon as possible and
reopen it.


 getAllocatedContainers() is not returning all the allocated containers
 --

 Key: YARN-541
 URL: https://issues.apache.org/jira/browse/YARN-541
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
 Environment: Redhat Linux 64-bit
Reporter: Krishna Kishore Bonagiri
Assignee: Omkar Vinit Joshi
 Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, 
 yarn-dsadm-resourcemanager-isredeng.out





[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-07-11 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706601#comment-13706601
 ] 

Hitesh Shah commented on YARN-541:
--

[~write2kishore] if you plan to re-run this to get new logs, could you please 
run the RM and NM with DEBUG log level. Thanks.

 getAllocatedContainers() is not returning all the allocated containers
 --

 Key: YARN-541
 URL: https://issues.apache.org/jira/browse/YARN-541
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
 Environment: Redhat Linux 64-bit
Reporter: Krishna Kishore Bonagiri
Assignee: Omkar Vinit Joshi
 Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, 
 yarn-dsadm-resourcemanager-isredeng.out





[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-07-11 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706602#comment-13706602
 ] 

Hitesh Shah commented on YARN-541:
--

Likewise have the AM also run with the debug log level if possible. 

 getAllocatedContainers() is not returning all the allocated containers
 --

 Key: YARN-541
 URL: https://issues.apache.org/jira/browse/YARN-541
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
 Environment: Redhat Linux 64-bit
Reporter: Krishna Kishore Bonagiri
Assignee: Omkar Vinit Joshi
 Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, 
 yarn-dsadm-resourcemanager-isredeng.out





[jira] [Assigned] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-07-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned YARN-541:


Assignee: Omkar Vinit Joshi  (was: Vinod Kumar Vavilapalli)

 getAllocatedContainers() is not returning all the allocated containers
 --

 Key: YARN-541
 URL: https://issues.apache.org/jira/browse/YARN-541
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
 Environment: Redhat Linux 64-bit
Reporter: Krishna Kishore Bonagiri
Assignee: Omkar Vinit Joshi
 Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, 
 yarn-dsadm-resourcemanager-isredeng.out





[jira] [Assigned] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-07-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned YARN-541:


Assignee: Vinod Kumar Vavilapalli  (was: Omkar Vinit Joshi)

 getAllocatedContainers() is not returning all the allocated containers
 --

 Key: YARN-541
 URL: https://issues.apache.org/jira/browse/YARN-541
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
 Environment: Redhat Linux 64-bit
Reporter: Krishna Kishore Bonagiri
Assignee: Vinod Kumar Vavilapalli
 Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, 
 yarn-dsadm-resourcemanager-isredeng.out





[jira] [Commented] (YARN-321) Generic application history service

2013-07-11 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706645#comment-13706645
 ] 

Hitesh Shah commented on YARN-321:
--

{quote}
To start with, we will have an implementation with per-app HDFS file.
{quote}

[~vinodkv] Based on the above, it seems like this will only support analysing 
one job at a time. With a per-app file, won't it be non-trivial to search for 
applications that match certain criteria? All jobs that ran on a certain day? 
All jobs of a certain type? All jobs that took longer than 10 mins to run? All 
jobs that use over 100 containers? Sure, a directory hierarchy based on dates 
may solve the very basic use-cases, but it looks like anyone needing to do any 
slightly more complex analysis of cluster utilization will need to build an 
indexing layer on top of the file-based store?




 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-07-11 Thread Krishna Kishore Bonagiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706646#comment-13706646
 ] 

Krishna Kishore Bonagiri commented on YARN-541:
---

Hitesh,
  How can I do that?





 getAllocatedContainers() is not returning all the allocated containers
 --

 Key: YARN-541
 URL: https://issues.apache.org/jira/browse/YARN-541
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
 Environment: Redhat Linux 64-bit
Reporter: Krishna Kishore Bonagiri
Assignee: Omkar Vinit Joshi
 Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, 
 yarn-dsadm-resourcemanager-isredeng.out





[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-07-11 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706650#comment-13706650
 ] 

Hitesh Shah commented on YARN-541:
--

export HADOOP_ROOT_LOGGER=DEBUG,RFA
export YARN_ROOT_LOGGER=DEBUG,RFA
when starting the RM and NM. 

For the DSShell, you can use --log_properties and pass in a log4j.properties 
file that has a hardcoded DEBUG level for the root logger. However, based on 
what I can see, running the DS Shell AM at DEBUG level may not be necessary.
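For reference, a minimal sketch of the setup described above. The properties 
file name, appender choice, and layout pattern below are illustrative 
assumptions, not part of the original instructions; the daemon start commands 
are commented out since they need a running cluster:

```shell
# Hypothetical log4j.properties with a hardcoded DEBUG root logger,
# to be passed to the DSShell client via --log_properties.
cat > dsshell-log4j.properties <<'EOF'
log4j.rootLogger=DEBUG,CLA
log4j.appender.CLA=org.apache.log4j.ConsoleAppender
log4j.appender.CLA.layout=org.apache.log4j.PatternLayout
log4j.appender.CLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
EOF

# Daemons pick up DEBUG via the root-logger env vars before (re)starting:
export HADOOP_ROOT_LOGGER=DEBUG,RFA
export YARN_ROOT_LOGGER=DEBUG,RFA
# sbin/yarn-daemon.sh start resourcemanager   # requires a configured cluster
# sbin/yarn-daemon.sh start nodemanager

echo "wrote $(wc -l < dsshell-log4j.properties) lines of log4j config"
```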


 getAllocatedContainers() is not returning all the allocated containers
 --

 Key: YARN-541
 URL: https://issues.apache.org/jira/browse/YARN-541
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
 Environment: Redhat Linux 64-bit
Reporter: Krishna Kishore Bonagiri
Assignee: Omkar Vinit Joshi
 Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, 
 yarn-dsadm-resourcemanager-isredeng.out





[jira] [Commented] (YARN-816) Implement AM recovery for distributed shell

2013-07-11 Thread Abhishek Kapoor (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706653#comment-13706653
 ] 

Abhishek Kapoor commented on YARN-816:
--

Preemption is one case where a container can be killed while the application 
is still running.
We can take inspiration from the CPU scheduling algorithms used in operating 
systems.
If an application is going to be preempted, we can provide a way to let the 
app know in advance, and during recovery we can make the app aware that it was 
preempted.
Probably an event fired to the app, letting it know what is going to happen 
(preempt) and what has happened (preempted).

Sorry if it sounds confusing.
I am open for discussion.


 Implement AM recovery for distributed shell
 ---

 Key: YARN-816
 URL: https://issues.apache.org/jira/browse/YARN-816
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Vinod Kumar Vavilapalli

 Simple recovery to just continue from where it left off is a good start.



[jira] [Commented] (YARN-321) Generic application history service

2013-07-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706654#comment-13706654
 ] 

Vinod Kumar Vavilapalli commented on YARN-321:
--

Like I mentioned:
bq. Querying list of apps based on user-name, queue-name etc. To start with, we 
will imitate what JHS does, throw up list of all apps and do the filtering 
client side. But we need a better server side solution.
So for both the CLI and the web UI, we will start with basic client-side 
filtering, perhaps coupled with paging of the results. More advanced analytics 
needs a more robust server-side solution. I can already imagine file-based 
indices, but a more query-friendly store will be needed - a table view via 
HCat/HBase over HDFS would be a good start.
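As a rough illustration of the client-side filtering and paging described 
above (AppRecord, sample fields, and filterPage are hypothetical names, not 
the real YARN history API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of client-side filtering: the client fetches the full list of app
// records from the history server, then filters and pages them locally.
public class ClientSideAppFilter {
    static class AppRecord {
        final String id, user, queue;
        AppRecord(String id, String user, String queue) {
            this.id = id; this.user = user; this.queue = queue;
        }
    }

    // Keep records matching user/queue (null means "any"), then return one page.
    static List<AppRecord> filterPage(List<AppRecord> all, String user,
                                      String queue, int page, int pageSize) {
        List<AppRecord> matched = new ArrayList<>();
        for (AppRecord a : all) {
            if ((user == null || user.equals(a.user))
                    && (queue == null || queue.equals(a.queue))) {
                matched.add(a);
            }
        }
        int from = Math.min(page * pageSize, matched.size());
        int to = Math.min(from + pageSize, matched.size());
        return matched.subList(from, to);
    }

    public static void main(String[] args) {
        List<AppRecord> apps = new ArrayList<>();
        apps.add(new AppRecord("app_1", "alice", "default"));
        apps.add(new AppRecord("app_2", "bob", "default"));
        apps.add(new AppRecord("app_3", "alice", "prod"));
        // All of alice's apps, first page:
        System.out.println(filterPage(apps, "alice", null, 0, 10).size()); // 2
    }
}
```

This is exactly the shape of solution that breaks down at scale: every query 
ships the whole app list to the client, which is why a server-side store is 
needed for anything beyond basic listing.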

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli




[jira] [Commented] (YARN-816) Implement AM recovery for distributed shell

2013-07-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706662#comment-13706662
 ] 

Vinod Kumar Vavilapalli commented on YARN-816:
--

I originally filed this to make the DistributedShell AM recover when the node 
running the AM crashes. There are two things it can do:
 - Just restart everything from scratch
 - Or remember which nodes are already taken care of and only run the 
remaining ones.

While we do this, we should generally try to design libraries that help other 
framework writers implement state recovery on AM crash, or at least establish 
some conventions.
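The second option could be sketched roughly like this (a hypothetical 
checkpoint helper, not actual DistributedShell code): the AM appends each 
finished unit of work to a file, and a restarted attempt reloads the file and 
skips what is already done.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

// Minimal append-only checkpoint of completed work items for AM recovery.
public class AmCheckpoint {
    private final Path file;
    private final Set<String> done = new HashSet<>();

    public AmCheckpoint(Path file) {
        this.file = file;
        try {
            if (Files.exists(file)) {
                done.addAll(Files.readAllLines(file)); // recover prior progress
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Record a finished work item (e.g. a node already handled by the shell).
    public void markDone(String workItem) {
        if (!done.add(workItem)) return; // already recorded, skip the write
        try {
            Files.write(file,
                    (workItem + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public boolean isDone(String workItem) {
        return done.contains(workItem);
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("am-state", ".ckpt");
        AmCheckpoint first = new AmCheckpoint(p);
        first.markDone("node-1");
        first.markDone("node-2");
        AmCheckpoint restarted = new AmCheckpoint(p); // simulated AM restart
        System.out.println(restarted.isDone("node-1")); // true: skip this node
        System.out.println(restarted.isDone("node-3")); // false: still to run
    }
}
```

A real implementation would put the checkpoint on HDFS so it survives the node 
crash, which is the kind of detail a shared recovery library could own.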

 Implement AM recovery for distributed shell
 ---

 Key: YARN-816
 URL: https://issues.apache.org/jira/browse/YARN-816
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Vinod Kumar Vavilapalli

 Simple recovery to just continue from where it left off is a good start.



[jira] [Commented] (YARN-816) Implement AM recovery for distributed shell

2013-07-11 Thread Abhishek Kapoor (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706667#comment-13706667
 ] 

Abhishek Kapoor commented on YARN-816:
--

Couldn't agree more, [~vinodkv].

We can have the state of the AM communicated to the RM.
When the AM boots up, the RM should communicate the state back to the AM, for 
example whether it is a fresh start or a recovery; if it is a recovery, the RM 
should also tell the AM the state of the nodes the app was running on.

The above use case might require a communication protocol change between the 
AM and the RM.


 Implement AM recovery for distributed shell
 ---

 Key: YARN-816
 URL: https://issues.apache.org/jira/browse/YARN-816
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Vinod Kumar Vavilapalli

 Simple recovery to just continue from where it left off is a good start.
