[jira] [Commented] (YARN-72) NM should handle cleaning up containers when it shuts down ( and kill containers from an earlier instance when it comes back up after an unclean shutdown )

2012-10-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486708#comment-13486708
 ] 

Sandy Ryza commented on YARN-72:


Yeah, I wanted to get feedback before a refresh, but I'll put in the timeout fix 
and tests if others think this is a good approach.

I had thought that cgroups relied on unix process groups, but searching around 
now, I couldn't find a connection, so it seems like interfering with YARN-3 
wouldn't actually be a problem.  The other comments still apply to unix process 
groups.

 NM should handle cleaning up containers when it shuts down ( and kill 
 containers from an earlier instance when it comes back up after an unclean 
 shutdown )
 ---

 Key: YARN-72
 URL: https://issues.apache.org/jira/browse/YARN-72
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Sandy Ryza
 Attachments: YARN-72.patch


 Ideally, when the NM gets a shutdown signal it should wait a limited amount of 
 time for existing containers to complete, and kill the containers (if we pick 
 an aggressive approach) after this time interval. 
 For NMs which come up after an unclean shutdown, the NM should look through 
 its directories for existing container PID files and try to kill any existing 
 containers matching the PIDs found. 
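
A rough sketch of the recovery path described above (this is not the attached 
YARN-72.patch; the pid-file directory layout, file names and helper methods are 
assumptions made only for illustration):

{code}
// Hypothetical sketch: on NM startup after an unclean shutdown, scan a
// directory of per-container pid files and SIGKILL any recorded pids.
// The "*.pid" layout and the class name are illustrative assumptions.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StaleContainerCleaner {

  /** Scans pidFileDir for "*.pid" files and kills each recorded pid. */
  public static void killStaleContainers(Path pidFileDir) throws IOException {
    if (!Files.isDirectory(pidFileDir)) {
      return; // nothing to clean up
    }
    try (DirectoryStream<Path> pidFiles =
        Files.newDirectoryStream(pidFileDir, "*.pid")) {
      for (Path pidFile : pidFiles) {
        String pid = new String(Files.readAllBytes(pidFile),
            StandardCharsets.UTF_8).trim();
        if (!pid.isEmpty()) {
          kill(pid);
        }
        Files.deleteIfExists(pidFile); // forget the container once handled
      }
    }
  }

  /** Best-effort SIGKILL; the pid may already be gone, so errors are ignored. */
  private static void kill(String pid) {
    try {
      new ProcessBuilder("kill", "-9", pid).start().waitFor();
    } catch (IOException e) {
      // ignore: process already exited or "kill" is unavailable
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) throws IOException {
    killStaleContainers(Paths.get(args.length > 0 ? args[0] : "/tmp/nm-pids"));
  }
}
{code}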

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-190) Issues when running distributedShell example in hadoop-2.0.1-alpha cluster

2012-10-30 Thread xiajunluan (JIRA)
xiajunluan created YARN-190:
---

 Summary: Issues when running distributedShell example in 
hadoop-2.0.1-alpha cluster
 Key: YARN-190
 URL: https://issues.apache.org/jira/browse/YARN-190
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.0.1-alpha
 Environment: ubuntu 11.04
Reporter: xiajunluan
Priority: Critical


  I have successfully run the distributedShell example on a single node 
deployed with hadoop-2.0.1-alpha.
  But when I run distributedShell in a cluster environment, it sometimes works 
well and sometimes fails. The following is my detailed configuration:
 
 A: NameNode, ResourceManager
 B: DataNode, NodeManager
 C: DataNode, NodeManager

   I run distributedShell with the command 
“./bin/hadoop jar 
share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.1-alpha.jar
 org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.1-alpha.jar
 -shell_command whoami  -debug”
   
   When the application master is launched, it creates a container to run the 
shell command “whoami”. The application master will run on node B or C, and the 
container will also run on B or C at random. If the application master and the 
shell command container run on the same node (for example, both on node B), the 
above command succeeds; but if they run on different nodes, that is, if the 
application master launches successfully on node B and creates a container that 
runs on node C, I receive the following error message:

…….
12/10/29 19:18:02 INFO distributedshell.Client: Application did finished 
unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring 
loop
12/10/29 19:18:02 ERROR distributedshell.Client: Application failed to complete 
successfully


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-189) deadlock in RM - AMResponse object

2012-10-30 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-189:
---

Attachment: YARN-189.patch

 deadlock in RM - AMResponse object
 --

 Key: YARN-189
 URL: https://issues.apache.org/jira/browse/YARN-189
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.4
Reporter: Thomas Graves
Assignee: Thomas Graves
Priority: Critical
 Attachments: YARN-189.patch


 we ran into a deadlock in the RM.
 =
 1128743461@qtp-1252749669-5201:
   waiting for ownable synchronizer 0x2aabbc87b960, (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
   which is held by AsyncDispatcher event handler
 AsyncDispatcher event handler:
   waiting to lock monitor 0x2ab0bba3a370 (object 0x2aab3d4cd698, a 
 org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl),
   which is held by IPC Server handler 36 on 8030
 IPC Server handler 36 on 8030:
   waiting for ownable synchronizer 0x2aabbc87b960, (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
   which is held by AsyncDispatcher event handler
 Java stack information for the threads listed above:
 ===
 1128743461@qtp-1252749669-5201:
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x2aabbc87b960 (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)  
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
 at 
 java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getFinalApplicationStatus(RMAppAttemptImpl.java:295)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus(RMAppImpl.java:222)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:328)
 at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
 at 
 com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
 at 
 com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaM
 ...
 ...
 ..
   
 AsyncDispatcher event handler:
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.unregisterAttempt(ApplicationMasterService.java:307)
 - waiting to lock 0x2aab3d4cd698 (a 
 org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$BaseFinalTransition.transition(RMAppAttemptImpl.java:647)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:809)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:796)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
 - locked 0x2aabbb673090 (a 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:478)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:81)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:436)
 at 
 

[jira] [Updated] (YARN-188) Coverage fixing for CapacityScheduler

2012-10-30 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-188:
--

Attachment: YARN-188-branch-2.patch
YARN-188-branch-0.23.patch

 Coverage fixing for CapacityScheduler
 -

 Key: YARN-188
 URL: https://issues.apache.org/jira/browse/YARN-188
 Project: Hadoop YARN
  Issue Type: Test
  Components: capacityscheduler
Reporter: Aleksey Gorshkov
 Attachments: YARN-188-branch-0.23.patch, YARN-188-branch-2.patch, 
 YARN-188-trunk.patch


 some tests for CapacityScheduler
 YARN-188-branch-0.23-TestCapacityScheduler.patch patch for branch 0.23
 YARN-188-branch-2-TestCapacityScheduler.patch patch for branch 2
 YARN-188-trunk-TestCapacityScheduler.patch  patch for trunk

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-188) Coverage fixing for CapacityScheduler

2012-10-30 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-188:
--

Description: 
some tests for CapacityScheduler
YARN-188-branch-0.23.patch patch for branch 0.23
YARN-188-branch-2.patch patch for branch 2
YARN-188-trunk.patch  patch for trunk



  was:
some tests for CapacityScheduler
YARN-188-branch-0.23-TestCapacityScheduler.patch patch for branch 0.23
YARN-188-branch-2-TestCapacityScheduler.patch patch for branch 2
YARN-188-trunk-TestCapacityScheduler.patch  patch for trunk




 Coverage fixing for CapacityScheduler
 -

 Key: YARN-188
 URL: https://issues.apache.org/jira/browse/YARN-188
 Project: Hadoop YARN
  Issue Type: Test
  Components: capacityscheduler
Reporter: Aleksey Gorshkov
 Attachments: YARN-188-branch-0.23.patch, YARN-188-branch-2.patch, 
 YARN-188-trunk.patch


 some tests for CapacityScheduler
 YARN-188-branch-0.23.patch patch for branch 0.23
 YARN-188-branch-2.patch patch for branch 2
 YARN-188-trunk.patch  patch for trunk

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-188) Coverage fixing for CapacityScheduler

2012-10-30 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-188:
--

Attachment: YARN-188-trunk.patch

 Coverage fixing for CapacityScheduler
 -

 Key: YARN-188
 URL: https://issues.apache.org/jira/browse/YARN-188
 Project: Hadoop YARN
  Issue Type: Test
  Components: capacityscheduler
Reporter: Aleksey Gorshkov
 Attachments: YARN-188-branch-0.23.patch, YARN-188-branch-2.patch, 
 YARN-188-trunk.patch


 some tests for CapacityScheduler
 YARN-188-branch-0.23-TestCapacityScheduler.patch patch for branch 0.23
 YARN-188-branch-2-TestCapacityScheduler.patch patch for branch 2
 YARN-188-trunk-TestCapacityScheduler.patch  patch for trunk

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-186) Coverage fixing LinuxContainerExecuror

2012-10-30 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-186:
--

Attachment: (was: YARN-186-branch-0.23-LinuxContainerExecuror.patch)

 Coverage fixing LinuxContainerExecuror
 --

 Key: YARN-186
 URL: https://issues.apache.org/jira/browse/YARN-186
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager, scheduler
Reporter: Aleksey Gorshkov
 Attachments: YARN-186-branch-2-LinuxContainerExecuror.patch, 
 YARN-186-trunk-LinuxContainerExecuror.patch


 Added some tests for LinuxContainerExecuror  
 YARN-186-branch-0.23-LinuxContainerExecuror patch for branch-0.23
 YARN-186-branch-2-LinuxContainerExecuror patch for branch-2
 YARN-186-trunk-LinuxContainerExecuror patch for trunk

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-186) Coverage fixing LinuxContainerExecuror

2012-10-30 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-186:
--

Attachment: YARN-186-branch-2.patch
YARN-186-branch-0.23.patch

 Coverage fixing LinuxContainerExecuror
 --

 Key: YARN-186
 URL: https://issues.apache.org/jira/browse/YARN-186
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager, scheduler
Reporter: Aleksey Gorshkov
 Attachments: YARN-186-branch-0.23.patch, YARN-186-branch-2.patch, 
 YARN-186-trunk.patch


 Added some tests for LinuxContainerExecuror  
 YARN-186-branch-0.23-LinuxContainerExecuror patch for branch-0.23
 YARN-186-branch-2-LinuxContainerExecuror patch for branch-2
 YARN-186-trunk-LinuxContainerExecuror patch for trunk

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-186) Coverage fixing LinuxContainerExecuror

2012-10-30 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-186:
--

Description: 
Added some tests for LinuxContainerExecuror  
YARN-186-branch-0.23.patch patch for branch-0.23
YARN-186-branch-2.patch patch for branch-2
YARN-186-trunk.patch patch for trunk


  was:
Added some tests for LinuxContainerExecuror  
YARN-186-branch-0.23-LinuxContainerExecuror patch for branch-0.23
YARN-186-branch-2-LinuxContainerExecuror patch for branch-2
ARN-186-trunk-LinuxContainerExecuror patch for trank



 Coverage fixing LinuxContainerExecuror
 --

 Key: YARN-186
 URL: https://issues.apache.org/jira/browse/YARN-186
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager, scheduler
Reporter: Aleksey Gorshkov
 Attachments: YARN-186-branch-0.23.patch, YARN-186-branch-2.patch, 
 YARN-186-trunk.patch


 Added some tests for LinuxContainerExecuror  
 YARN-186-branch-0.23.patch patch for branch-0.23
 YARN-186-branch-2.patch patch for branch-2
 YARN-186-trunk.patch patch for trunk

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-189) deadlock in RM - AMResponse object

2012-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486893#comment-13486893
 ] 

Hadoop QA commented on YARN-189:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12551348/YARN-189.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/126//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/126//console

This message is automatically generated.

 deadlock in RM - AMResponse object
 --

 Key: YARN-189
 URL: https://issues.apache.org/jira/browse/YARN-189
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.4
Reporter: Thomas Graves
Assignee: Thomas Graves
Priority: Critical
 Attachments: YARN-189.patch


 we ran into a deadlock in the RM.
 =
 1128743461@qtp-1252749669-5201:
   waiting for ownable synchronizer 0x2aabbc87b960, (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
   which is held by AsyncDispatcher event handler
 AsyncDispatcher event handler:
   waiting to lock monitor 0x2ab0bba3a370 (object 0x2aab3d4cd698, a 
 org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl),
   which is held by IPC Server handler 36 on 8030
 IPC Server handler 36 on 8030:
   waiting for ownable synchronizer 0x2aabbc87b960, (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
   which is held by AsyncDispatcher event handler
 Java stack information for the threads listed above:
 ===
 1128743461@qtp-1252749669-5201:
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x2aabbc87b960 (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)  
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
 at 
 java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getFinalApplicationStatus(RMAppAttemptImpl.java:295)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus(RMAppImpl.java:222)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:328)
 at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
 at 
 com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
 at 
 com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaM
 ...
 ...
 ..
   
 AsyncDispatcher event handler:
 at 
 

[jira] [Commented] (YARN-189) deadlock in RM - AMResponse object

2012-10-30 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487064#comment-13487064
 ] 

Robert Joseph Evans commented on YARN-189:
--

The change looks good to me. I don't see any real issues with it, so I am a +1 
for it. I am not going to check it in yet, to give Vinod and others some time to 
comment if they want to.

 deadlock in RM - AMResponse object
 --

 Key: YARN-189
 URL: https://issues.apache.org/jira/browse/YARN-189
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.4
Reporter: Thomas Graves
Assignee: Thomas Graves
Priority: Critical
 Attachments: YARN-189.patch


 we ran into a deadlock in the RM.
 =
 1128743461@qtp-1252749669-5201:
   waiting for ownable synchronizer 0x2aabbc87b960, (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
   which is held by AsyncDispatcher event handler
 AsyncDispatcher event handler:
   waiting to lock monitor 0x2ab0bba3a370 (object 0x2aab3d4cd698, a 
 org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl),
   which is held by IPC Server handler 36 on 8030
 IPC Server handler 36 on 8030:
   waiting for ownable synchronizer 0x2aabbc87b960, (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
   which is held by AsyncDispatcher event handler
 Java stack information for the threads listed above:
 ===
 1128743461@qtp-1252749669-5201:
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x2aabbc87b960 (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)  
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
 at 
 java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getFinalApplicationStatus(RMAppAttemptImpl.java:295)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus(RMAppImpl.java:222)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:328)
 at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
 at 
 com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
 at 
 com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaM
 ...
 ...
 ..
   
 AsyncDispatcher event handler:
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.unregisterAttempt(ApplicationMasterService.java:307)
 - waiting to lock 0x2aab3d4cd698 (a 
 org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$BaseFinalTransition.transition(RMAppAttemptImpl.java:647)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:809)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:796)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
 - locked 0x2aabbb673090 (a 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:478)
 at 
 

[jira] [Commented] (YARN-186) Coverage fixing LinuxContainerExecuror

2012-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487071#comment-13487071
 ] 

Hadoop QA commented on YARN-186:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12551354/YARN-186-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestLinuxContainerExecutorExtension

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/128//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/128//console

This message is automatically generated.

 Coverage fixing LinuxContainerExecuror
 --

 Key: YARN-186
 URL: https://issues.apache.org/jira/browse/YARN-186
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager, scheduler
Reporter: Aleksey Gorshkov
 Attachments: YARN-186-branch-0.23.patch, YARN-186-branch-2.patch, 
 YARN-186-trunk.patch


 Added some tests for LinuxContainerExecuror  
 YARN-186-branch-0.23.patch patch for branch-0.23
 YARN-186-branch-2.patch patch for branch-2
 YARN-186-trunk.patch patch for trunk

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-188) Coverage fixing for CapacityScheduler

2012-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487072#comment-13487072
 ] 

Hadoop QA commented on YARN-188:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12551351/YARN-188-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/129//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/129//console

This message is automatically generated.

 Coverage fixing for CapacityScheduler
 -

 Key: YARN-188
 URL: https://issues.apache.org/jira/browse/YARN-188
 Project: Hadoop YARN
  Issue Type: Test
  Components: capacityscheduler
Reporter: Aleksey Gorshkov
 Attachments: YARN-188-branch-0.23.patch, YARN-188-branch-2.patch, 
 YARN-188-trunk.patch


 some tests for CapacityScheduler
 YARN-188-branch-0.23.patch patch for branch 0.23
 YARN-188-branch-2.patch patch for branch 2
 YARN-188-trunk.patch  patch for trunk

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-189) deadlock in RM - AMResponse object

2012-10-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487115#comment-13487115
 ] 

Vinod Kumar Vavilapalli commented on YARN-189:
--

bq. Thanks for the info Vinod. In an effort to keep changes to a minimum for 
now, we can remove the synchronized from the unregisterAttempt and in allocate 
just check to see if it was null when putting it.
+1 for the idea; clearly I don't want us to hold up this blocker for fixing all 
of these, so let's create a separate ticket.

Will quickly look at the patch.
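
A sketch of the direction quoted above (simplified types, not the attached 
YARN-189.patch; the class and field names here are illustrative only):

{code}
// Sketch only: the deadlock arises because the dispatcher thread synchronizes
// on the AMResponsePBImpl while holding the attempt's write lock, while an IPC
// handler holds that monitor and waits on the attempt's lock. Dropping the
// synchronization in unregisterAttempt breaks the cycle, and allocate() must
// then tolerate the entry disappearing concurrently.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ApplicationMasterServiceSketch {
  // Keyed by attempt id; values stand in for the per-attempt AMResponse.
  private final ConcurrentMap<String, Object> responseMap =
      new ConcurrentHashMap<String, Object>();

  /** Dispatcher-side path: no longer synchronizes on the response object. */
  void unregisterAttempt(String attemptId) {
    responseMap.remove(attemptId);
  }

  /** IPC-side path: checks for a concurrently removed attempt instead. */
  Object allocate(String attemptId) {
    Object response = responseMap.get(attemptId);
    if (response == null) {
      // attempt was unregistered while this call was in flight
      return null;
    }
    return response;
  }
}
{code}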

 deadlock in RM - AMResponse object
 --

 Key: YARN-189
 URL: https://issues.apache.org/jira/browse/YARN-189
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.4
Reporter: Thomas Graves
Assignee: Thomas Graves
Priority: Critical
 Attachments: YARN-189.patch


 we ran into a deadlock in the RM.
 =
 1128743461@qtp-1252749669-5201:
   waiting for ownable synchronizer 0x2aabbc87b960, (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
   which is held by AsyncDispatcher event handler
 AsyncDispatcher event handler:
   waiting to lock monitor 0x2ab0bba3a370 (object 0x2aab3d4cd698, a 
 org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl),
   which is held by IPC Server handler 36 on 8030
 IPC Server handler 36 on 8030:
   waiting for ownable synchronizer 0x2aabbc87b960, (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
   which is held by AsyncDispatcher event handler
 Java stack information for the threads listed above:
 ===
 1128743461@qtp-1252749669-5201:
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x2aabbc87b960 (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)  
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
 at 
 java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getFinalApplicationStatus(RMAppAttemptImpl.java:295)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus(RMAppImpl.java:222)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:328)
 at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
 at 
 com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
 at 
 com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaM
 ...
 ...
 ..
   
 AsyncDispatcher event handler:
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.unregisterAttempt(ApplicationMasterService.java:307)
 - waiting to lock 0x2aab3d4cd698 (a 
 org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$BaseFinalTransition.transition(RMAppAttemptImpl.java:647)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:809)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:796)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
 - locked 0x2aabbb673090 (a 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
 at 
 

[jira] [Assigned] (YARN-151) Browser thinks RM main page JS is taking too long

2012-10-30 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash reassigned YARN-151:
-

Assignee: Ravi Prakash

 Browser thinks RM main page JS is taking too long
 -

 Key: YARN-151
 URL: https://issues.apache.org/jira/browse/YARN-151
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.3
Reporter: Robert Joseph Evans
Assignee: Ravi Prakash

 The main RM page with the default settings of 10,000 applications can cause 
 browsers to think that the JS on the page is stuck and ask you if you want to 
 kill it.  This is a big usability problem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-151) Browser thinks RM main page JS is taking too long

2012-10-30 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487317#comment-13487317
 ] 

Ravi Prakash commented on YARN-151:
---

@Luke: Even if we used deferred rendering, it seems like we would still need to 
ship all that data to the browser, even if the user doesn't require it (e.g. 
the very first page of the JHS).

Wouldn't server-side processing be better? 
http://datatables.net/release-datatables/examples/data_sources/server_side.html 
It seems to be designed for exactly our use case.
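
As a hypothetical sketch of what server-side paging could look like on the RM 
web services (the resource path, the "offset"/"limit" parameters, and the 
AppsPage type are made up for illustration and are not an existing RM API):

{code}
// Hypothetical JAX-RS resource: returns one page of applications at a time so
// the browser never has to render all 10,000 rows. Names are illustrative.
import java.util.ArrayList;
import java.util.List;
import javax.ws.rs.DefaultValue;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;

@Path("/ws/v1/cluster")
public class PagedAppsResource {

  @GET
  @Path("/apps-paged")
  @Produces(MediaType.APPLICATION_JSON)
  public AppsPage getApps(@DefaultValue("0") @QueryParam("offset") int offset,
                          @DefaultValue("50") @QueryParam("limit") int limit) {
    List<String> all = loadAppIds(); // stand-in for the RM's application list
    int from = Math.min(Math.max(offset, 0), all.size());
    int to = Math.min(from + Math.max(limit, 0), all.size());
    return new AppsPage(all.size(), all.subList(from, to));
  }

  private List<String> loadAppIds() {
    return new ArrayList<String>(); // placeholder data source
  }

  /** Minimal response wrapper: total row count plus one page of rows. */
  public static class AppsPage {
    public final int total;
    public final List<String> apps;
    AppsPage(int total, List<String> apps) {
      this.total = total;
      this.apps = apps;
    }
  }
}
{code}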

 Browser thinks RM main page JS is taking too long
 -

 Key: YARN-151
 URL: https://issues.apache.org/jira/browse/YARN-151
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.3
Reporter: Robert Joseph Evans
Assignee: Ravi Prakash

 The main RM page with the default settings of 10,000 applications can cause 
 browsers to think that the JS on the page is stuck and ask you if you want to 
 kill it.  This is a big usability problem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-151) Browser thinks RM main page JS is taking too long

2012-10-30 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487337#comment-13487337
 ] 

Luke Lu commented on YARN-151:
--

10k apps is a small amount of data (a few hundred KB to a few MB), you don't 
need to implement the logic (an API that supports searching/sorting/paging) on 
the server side, and you get better user response time (an instant search 
experience) since the data is on the client side. The server-side approach 
would be better if we need to support more than 100K apps. If you come up with 
a patch, I'm happy to review it :)

 Browser thinks RM main page JS is taking too long
 -

 Key: YARN-151
 URL: https://issues.apache.org/jira/browse/YARN-151
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.3
Reporter: Robert Joseph Evans
Assignee: Ravi Prakash

 The main RM page with the default settings of 10,000 applications can cause 
 browsers to think that the JS on the page is stuck and ask you if you want to 
 kill it.  This is a big usability problem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-165) RM should point tracking URL to RM web page for app when AM fails

2012-10-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487422#comment-13487422
 ] 

Vinod Kumar Vavilapalli commented on YARN-165:
--

Patch looks good. The test too.

Have you done any manual tests? I'd expect:
 - a fresh request to an already-dead AM gets redirected properly.
 - an existing page may get a 404 but should redirect properly on refresh.

 RM should point tracking URL to RM web page for app when AM fails
 -

 Key: YARN-165
 URL: https://issues.apache.org/jira/browse/YARN-165
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
 Attachments: YARN-165.patch


 Currently when an ApplicationMaster fails the ResourceManager is updating the 
 tracking URL to an empty string, see 
 RMAppAttemptImpl.ContainerFinishedTransition.  Unfortunately when the client 
 attempts to follow the proxy URL it results in a web page showing an HTTP 500 
 error and an ugly backtrace because "http://" isn't a very helpful tracking 
 URL.
 It would be much more helpful if the proxy URL redirected to the RM webapp 
 page for the specific application.  That page shows the various AM attempts 
 and pointers to their logs which will be useful for debugging the problems 
 that caused the AM attempts to fail.
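
A hypothetical illustration of the redirect target described above (not the 
attached YARN-165.patch; the helper and the field names in the comment are 
assumptions):

{code}
// Hypothetical sketch: instead of clearing the tracking URL when the AM
// container finishes, point it at the RM web UI page for the application.
class TrackingUrlSketch {

  /** Builds an RM web UI URL such as http://<rm-address>/cluster/app/<appId>. */
  static String rmAppPageUrl(String rmWebAppAddress, String appId) {
    return "http://" + rmWebAppAddress + "/cluster/app/" + appId;
  }

  // Inside something like ContainerFinishedTransition.transition(...), the idea
  // would be roughly:
  //   attempt.trackingUrl = rmAppPageUrl(rmWebAppAddress, appId);
  // rather than setting the tracking URL to an empty string.
  public static void main(String[] args) {
    // Illustrative values only.
    System.out.println(rmAppPageUrl("rm.example.com:8088",
        "application_0000000000000_0001"));
  }
}
{code}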

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira