[jira] [Commented] (YARN-72) NM should handle cleaning up containers when it shuts down (and kill containers from an earlier instance when it comes back up after an unclean shutdown)
[ https://issues.apache.org/jira/browse/YARN-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486708#comment-13486708 ]

Sandy Ryza commented on YARN-72:
--------------------------------

Yeah, I wanted to get feedback before a refresh, but I'll put in the timeout fix and tests if others think this is a good approach. I had thought that cgroups relied on unix process groups, but searching around now, I couldn't find a connection, so it seems like interfering with YARN-3 wouldn't actually be a problem. The other comments still apply to unix process groups.

> NM should handle cleaning up containers when it shuts down (and kill containers from an earlier instance when it comes back up after an unclean shutdown)
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
>     Key: YARN-72
>     URL: https://issues.apache.org/jira/browse/YARN-72
>     Project: Hadoop YARN
>     Issue Type: Bug
>     Components: nodemanager
>     Reporter: Hitesh Shah
>     Assignee: Sandy Ryza
>     Attachments: YARN-72.patch
>
> Ideally, the NM should wait a limited amount of time after it gets a shutdown signal for existing containers to complete, and kill the containers (if we pick an aggressive approach) after this time interval. NMs which come up after an unclean shutdown should look through their directories for existing container pid files and try to kill any running containers matching the pids found.
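A minimal sketch of the restart-recovery half of that proposal follows. It assumes pid files ending in ".pid" under an NM local directory and kills each recorded process group with SIGKILL; the class name and file layout are hypothetical illustrations, not the actual NodeManager code.

// Hypothetical sketch of the recovery step described above: on startup after
// an unclean shutdown, scan the NM's local directories for leftover pid files
// and signal the corresponding processes. File layout and names are
// assumptions, not the real NodeManager implementation.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class StaleContainerReaper {

    /** Scan dir recursively for container pid files and kill the recorded processes. */
    public static void reapStaleContainers(Path nmLocalDir) throws IOException {
        try (Stream<Path> pidFiles = Files.walk(nmLocalDir)
                .filter(p -> p.getFileName().toString().endsWith(".pid"))) {
            pidFiles.forEach(StaleContainerReaper::killRecordedProcess);
        }
    }

    private static void killRecordedProcess(Path pidFile) {
        try {
            String pid = Files.readString(pidFile).trim();
            // Send SIGKILL to the process *group* ("-pid") so children die too;
            // "--" keeps the negative pid from being parsed as an option.
            new ProcessBuilder("kill", "-9", "--", "-" + pid)
                    .inheritIO().start().waitFor();
            Files.deleteIfExists(pidFile);
        } catch (IOException e) {
            // Best effort: the pid file may be unreadable or the process already gone.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}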
[jira] [Created] (YARN-190) Issues when running distributedShell example in hadoop-2.0.1-alpha cluster
xiajunluan created YARN-190:
----------------------------

Summary: Issues when running distributedShell example in hadoop-2.0.1-alpha cluster
Key: YARN-190
URL: https://issues.apache.org/jira/browse/YARN-190
Project: Hadoop YARN
Issue Type: Bug
Components: applications/distributed-shell
Affects Versions: 2.0.1-alpha
Environment: ubuntu 11.04
Reporter: xiajunluan
Priority: Critical

I have successfully run the distributed shell example on a single node deployed with hadoop-2.0.1-alpha. But when I run distributedShell in a cluster environment, it sometimes works and sometimes fails. The following is my detailed configuration:

A: NameNode, ResourceManager
B: DataNode, NodeManager
C: DataNode, NodeManager

I run distributedShell with the command:

  ./bin/hadoop jar share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.1-alpha.jar org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.1-alpha.jar -shell_command whoami -debug

When the application master is launched, it creates a container to run the shell command "whoami". The application master runs on node B or C, and the container is also placed randomly on B or C. If the application master and the shell-command container run on the same node (for example, both on node B), the command runs successfully. But if they land on different nodes (say, the application master launches on node B and creates a container that runs on node C), I receive the error message:

  ...
  12/10/29 19:18:02 INFO distributedshell.Client: Application did finished unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring loop
  12/10/29 19:18:02 ERROR distributedshell.Client: Application failed to complete successfully
[jira] [Updated] (YARN-189) deadlock in RM - AMResponse object
[ https://issues.apache.org/jira/browse/YARN-189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-189:
-------------------------------

Attachment: YARN-189.patch

> deadlock in RM - AMResponse object
> ----------------------------------
>
>     Key: YARN-189
>     URL: https://issues.apache.org/jira/browse/YARN-189
>     Project: Hadoop YARN
>     Issue Type: Bug
>     Components: resourcemanager
>     Affects Versions: 0.23.4
>     Reporter: Thomas Graves
>     Assignee: Thomas Graves
>     Priority: Critical
>     Attachments: YARN-189.patch
>
> We ran into a deadlock in the RM.
>
> Found one Java-level deadlock:
> ==============================
> "1128743461@qtp-1252749669-5201":
>   waiting for ownable synchronizer 0x2aabbc87b960, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "AsyncDispatcher event handler"
> "AsyncDispatcher event handler":
>   waiting to lock monitor 0x2ab0bba3a370 (object 0x2aab3d4cd698, a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl),
>   which is held by "IPC Server handler 36 on 8030"
> "IPC Server handler 36 on 8030":
>   waiting for ownable synchronizer 0x2aabbc87b960, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "AsyncDispatcher event handler"
>
> Java stack information for the threads listed above:
> ===================================================
> "1128743461@qtp-1252749669-5201":
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for 0x2aabbc87b960 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getFinalApplicationStatus(RMAppAttemptImpl.java:295)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus(RMAppImpl.java:222)
>     at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:328)
>     at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>     at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>     at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaM
>     ...
> "AsyncDispatcher event handler":
>     at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.unregisterAttempt(ApplicationMasterService.java:307)
>     - waiting to lock 0x2aab3d4cd698 (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$BaseFinalTransition.transition(RMAppAttemptImpl.java:647)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:809)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:796)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>     - locked 0x2aabbb673090 (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:478)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:81)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:436)
>     at ...
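For readers tracing the cycle: the dispatcher thread holds the attempt's write lock and wants the AMResponsePBImpl monitor, while the IPC handler holds that monitor and wants the read lock. A distilled, runnable reproduction of that lock-ordering cycle (not the RM code itself; names are stand-ins) looks like this:

// Distilled illustration of the lock cycle in the trace above. One thread
// takes the write lock and then tries to enter the response monitor; the
// other holds the monitor and tries to take the read lock. Neither proceeds.
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockCycleDemo {
    private static final ReentrantReadWriteLock attemptLock = new ReentrantReadWriteLock();
    private static final Object amResponseMonitor = new Object();

    public static void main(String[] args) {
        Thread dispatcher = new Thread(() -> {      // like "AsyncDispatcher event handler"
            attemptLock.writeLock().lock();         // state-machine transition holds the write lock
            try {
                sleep(100);
                synchronized (amResponseMonitor) {  // unregisterAttempt: blocks here
                    System.out.println("dispatcher done");
                }
            } finally {
                attemptLock.writeLock().unlock();
            }
        });
        Thread ipcHandler = new Thread(() -> {      // like "IPC Server handler 36 on 8030"
            synchronized (amResponseMonitor) {      // holds the response monitor
                sleep(100);
                attemptLock.readLock().lock();      // getFinalApplicationStatus: blocks here
                try {
                    System.out.println("ipc handler done");
                } finally {
                    attemptLock.readLock().unlock();
                }
            }
        });
        dispatcher.start();
        ipcHandler.start();                         // the program now deadlocks
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}

Running this and taking a jstack shows the same "waiting for ownable synchronizer" / "waiting to lock monitor" pair as in the report above.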
[jira] [Updated] (YARN-188) Coverage fixing for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Gorshkov updated YARN-188:
----------------------------------

Attachment: YARN-188-branch-2.patch
            YARN-188-branch-0.23.patch

> Coverage fixing for CapacityScheduler
> -------------------------------------
>
>     Key: YARN-188
>     URL: https://issues.apache.org/jira/browse/YARN-188
>     Project: Hadoop YARN
>     Issue Type: Test
>     Components: capacityscheduler
>     Reporter: Aleksey Gorshkov
>     Attachments: YARN-188-branch-0.23.patch, YARN-188-branch-2.patch, YARN-188-trunk.patch
>
> Some tests for CapacityScheduler:
> YARN-188-branch-0.23-TestCapacityScheduler.patch - patch for branch 0.23
> YARN-188-branch-2-TestCapacityScheduler.patch - patch for branch 2
> YARN-188-trunk-TestCapacityScheduler.patch - patch for trunk
[jira] [Updated] (YARN-188) Coverage fixing for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Gorshkov updated YARN-188:
----------------------------------

Description:
    Some tests for CapacityScheduler:
    YARN-188-branch-0.23.patch - patch for branch 0.23
    YARN-188-branch-2.patch - patch for branch 2
    YARN-188-trunk.patch - patch for trunk

  was:
    Some tests for CapacityScheduler:
    YARN-188-branch-0.23-TestCapacityScheduler.patch - patch for branch 0.23
    YARN-188-branch-2-TestCapacityScheduler.patch - patch for branch 2
    YARN-188-trunk-TestCapacityScheduler.patch - patch for trunk

> Coverage fixing for CapacityScheduler
> -------------------------------------
>
>     Key: YARN-188
>     URL: https://issues.apache.org/jira/browse/YARN-188
>     Project: Hadoop YARN
>     Issue Type: Test
>     Components: capacityscheduler
>     Reporter: Aleksey Gorshkov
>     Attachments: YARN-188-branch-0.23.patch, YARN-188-branch-2.patch, YARN-188-trunk.patch
>
> (updated description quoted above)
[jira] [Updated] (YARN-188) Coverage fixing for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Gorshkov updated YARN-188:
----------------------------------

Attachment: YARN-188-trunk.patch

> Coverage fixing for CapacityScheduler
> -------------------------------------
>
>     Key: YARN-188
>     URL: https://issues.apache.org/jira/browse/YARN-188
>     Project: Hadoop YARN
>     Issue Type: Test
>     Components: capacityscheduler
>     Reporter: Aleksey Gorshkov
>     Attachments: YARN-188-branch-0.23.patch, YARN-188-branch-2.patch, YARN-188-trunk.patch
>
> (issue description quoted above)
[jira] [Updated] (YARN-186) Coverage fixing LinuxContainerExecuror
[ https://issues.apache.org/jira/browse/YARN-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Gorshkov updated YARN-186:
----------------------------------

Attachment: (was: YARN-186-branch-0.23-LinuxContainerExecuror.patch)

> Coverage fixing LinuxContainerExecuror
> --------------------------------------
>
>     Key: YARN-186
>     URL: https://issues.apache.org/jira/browse/YARN-186
>     Project: Hadoop YARN
>     Issue Type: Test
>     Components: resourcemanager, scheduler
>     Reporter: Aleksey Gorshkov
>     Attachments: YARN-186-branch-2-LinuxContainerExecuror.patch, YARN-186-trunk-LinuxContainerExecuror.patch
>
> Added some tests for LinuxContainerExecutor:
> YARN-186-branch-0.23-LinuxContainerExecuror.patch - patch for branch-0.23
> YARN-186-branch-2-LinuxContainerExecuror.patch - patch for branch-2
> YARN-186-trunk-LinuxContainerExecuror.patch - patch for trunk
[jira] [Updated] (YARN-186) Coverage fixing LinuxContainerExecuror
[ https://issues.apache.org/jira/browse/YARN-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Gorshkov updated YARN-186:
----------------------------------

Attachment: YARN-186-branch-2.patch
            YARN-186-branch-0.23.patch

> Coverage fixing LinuxContainerExecuror
> --------------------------------------
>
>     Key: YARN-186
>     URL: https://issues.apache.org/jira/browse/YARN-186
>     Project: Hadoop YARN
>     Issue Type: Test
>     Components: resourcemanager, scheduler
>     Reporter: Aleksey Gorshkov
>     Attachments: YARN-186-branch-0.23.patch, YARN-186-branch-2.patch, YARN-186-trunk.patch
>
> (issue description quoted above)
[jira] [Updated] (YARN-186) Coverage fixing LinuxContainerExecuror
[ https://issues.apache.org/jira/browse/YARN-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Gorshkov updated YARN-186:
----------------------------------

Description:
    Added some tests for LinuxContainerExecutor:
    YARN-186-branch-0.23.patch - patch for branch-0.23
    YARN-186-branch-2.patch - patch for branch-2
    YARN-186-trunk.patch - patch for trunk

  was:
    Added some tests for LinuxContainerExecutor:
    YARN-186-branch-0.23-LinuxContainerExecuror.patch - patch for branch-0.23
    YARN-186-branch-2-LinuxContainerExecuror.patch - patch for branch-2
    YARN-186-trunk-LinuxContainerExecuror.patch - patch for trunk

> Coverage fixing LinuxContainerExecuror
> --------------------------------------
>
>     Key: YARN-186
>     URL: https://issues.apache.org/jira/browse/YARN-186
>     Project: Hadoop YARN
>     Issue Type: Test
>     Components: resourcemanager, scheduler
>     Reporter: Aleksey Gorshkov
>     Attachments: YARN-186-branch-0.23.patch, YARN-186-branch-2.patch, YARN-186-trunk.patch
>
> (updated description quoted above)
[jira] [Commented] (YARN-189) deadlock in RM - AMResponse object
[ https://issues.apache.org/jira/browse/YARN-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486893#comment-13486893 ]

Hadoop QA commented on YARN-189:
--------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12551348/YARN-189.patch
against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 eclipse:eclipse. The patch built with eclipse:eclipse.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/126//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/126//console

This message is automatically generated.

> deadlock in RM - AMResponse object
> ----------------------------------
>
>     Key: YARN-189
>     URL: https://issues.apache.org/jira/browse/YARN-189
>     Project: Hadoop YARN
>     Issue Type: Bug
>     Components: resourcemanager
>     Affects Versions: 0.23.4
>     Reporter: Thomas Graves
>     Assignee: Thomas Graves
>     Priority: Critical
>     Attachments: YARN-189.patch
>
> We ran into a deadlock in the RM. (Full deadlock trace quoted in the YARN-189 update above.)
[jira] [Commented] (YARN-189) deadlock in RM - AMResponse object
[ https://issues.apache.org/jira/browse/YARN-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487064#comment-13487064 ]

Robert Joseph Evans commented on YARN-189:
------------------------------------------

The change looks good to me; I don't see any real issues with it, so I am a +1 for it. I am holding off on checking it in to give Vinod and others some time to comment if they want to.

> deadlock in RM - AMResponse object
> ----------------------------------
>
>     Key: YARN-189
>     URL: https://issues.apache.org/jira/browse/YARN-189
>     Project: Hadoop YARN
>     Issue Type: Bug
>     Components: resourcemanager
>     Affects Versions: 0.23.4
>     Reporter: Thomas Graves
>     Assignee: Thomas Graves
>     Priority: Critical
>     Attachments: YARN-189.patch
>
> We ran into a deadlock in the RM. (Full deadlock trace quoted in the YARN-189 update above.)
[jira] [Commented] (YARN-186) Coverage fixing LinuxContainerExecuror
[ https://issues.apache.org/jira/browse/YARN-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487071#comment-13487071 ]

Hadoop QA commented on YARN-186:
--------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12551354/YARN-186-trunk.patch
against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 1 new or modified test files.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 eclipse:eclipse. The patch built with eclipse:eclipse.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
       org.apache.hadoop.yarn.server.nodemanager.TestLinuxContainerExecutorExtension

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/128//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/128//console

This message is automatically generated.

> Coverage fixing LinuxContainerExecuror
> --------------------------------------
>
>     Key: YARN-186
>     URL: https://issues.apache.org/jira/browse/YARN-186
>     Project: Hadoop YARN
>     Issue Type: Test
>     Components: resourcemanager, scheduler
>     Reporter: Aleksey Gorshkov
>     Attachments: YARN-186-branch-0.23.patch, YARN-186-branch-2.patch, YARN-186-trunk.patch
>
> (issue description quoted above)
[jira] [Commented] (YARN-188) Coverage fixing for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487072#comment-13487072 ]

Hadoop QA commented on YARN-188:
--------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12551351/YARN-188-trunk.patch
against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 1 new or modified test files.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 eclipse:eclipse. The patch built with eclipse:eclipse.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/129//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/129//console

This message is automatically generated.

> Coverage fixing for CapacityScheduler
> -------------------------------------
>
>     Key: YARN-188
>     URL: https://issues.apache.org/jira/browse/YARN-188
>     Project: Hadoop YARN
>     Issue Type: Test
>     Components: capacityscheduler
>     Reporter: Aleksey Gorshkov
>     Attachments: YARN-188-branch-0.23.patch, YARN-188-branch-2.patch, YARN-188-trunk.patch
>
> (issue description quoted above)
[jira] [Commented] (YARN-189) deadlock in RM - AMResponse object
[ https://issues.apache.org/jira/browse/YARN-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487115#comment-13487115 ]

Vinod Kumar Vavilapalli commented on YARN-189:
----------------------------------------------

bq. Thanks for the info Vinod. In an effort to keep changes to a minimum for now, we can remove the synchronized from the unregisterAttempt and in allocate just check to see if it was null when putting it.

+1 for the idea. Clearly I don't want us to hold this blocker for fixing all of these; let's create a separate ticket. Will quickly look at the patch.

> deadlock in RM - AMResponse object
> ----------------------------------
>
>     Key: YARN-189
>     URL: https://issues.apache.org/jira/browse/YARN-189
>     Project: Hadoop YARN
>     Issue Type: Bug
>     Components: resourcemanager
>     Affects Versions: 0.23.4
>     Reporter: Thomas Graves
>     Assignee: Thomas Graves
>     Priority: Critical
>     Attachments: YARN-189.patch
>
> We ran into a deadlock in the RM. (Full deadlock trace quoted in the YARN-189 update above.)
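A sketch of what that one-sentence fix might translate to is below. The class, map, and signatures here are assumptions modeled on the stack trace, not the actual ApplicationMasterService code: dropping the method-level synchronized means the dispatcher never blocks on the response monitor while holding the attempt's write lock, and allocate tolerates a concurrently removed entry.

// Hypothetical distillation of the fix discussed above.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class AMResponse {}

class ApplicationMasterServiceSketch {
    private final ConcurrentMap<String, AMResponse> responseMap = new ConcurrentHashMap<>();

    // Before: a synchronized method that also locked the per-attempt response
    // object, creating the nested acquisition seen in the trace. After: no
    // method-level lock, just a concurrent-map removal.
    void unregisterAttempt(String attemptId) {
        responseMap.remove(attemptId);
    }

    AMResponse allocate(String attemptId, AMResponse newResponse) {
        // Put the fresh response only if the attempt is still registered;
        // replace() is a no-op (returns null) when unregisterAttempt has
        // already removed the key, which is the null check described above.
        AMResponse previous = responseMap.replace(attemptId, newResponse);
        if (previous == null) {
            throw new IllegalStateException("Attempt " + attemptId + " already unregistered");
        }
        return newResponse;
    }
}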
[jira] [Assigned] (YARN-151) Browser thinks RM main page JS is taking too long
[ https://issues.apache.org/jira/browse/YARN-151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash reassigned YARN-151:
---------------------------------

Assignee: Ravi Prakash

> Browser thinks RM main page JS is taking too long
> -------------------------------------------------
>
>     Key: YARN-151
>     URL: https://issues.apache.org/jira/browse/YARN-151
>     Project: Hadoop YARN
>     Issue Type: Bug
>     Affects Versions: 0.23.3
>     Reporter: Robert Joseph Evans
>     Assignee: Ravi Prakash
>
> The main RM page with the default setting of 10,000 applications can cause browsers to think that the JS on the page is stuck and ask you if you want to kill it. This is a big usability problem.
[jira] [Commented] (YARN-151) Browser thinks RM main page JS is taking too long
[ https://issues.apache.org/jira/browse/YARN-151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487317#comment-13487317 ]

Ravi Prakash commented on YARN-151:
-----------------------------------

@Luke: Even if we used deferred rendering, it seems like we would still need to ship all that data to the browser, even when the user doesn't need it (e.g. for the very first page of the JHS). Wouldn't server side be better?
http://datatables.net/release-datatables/examples/data_sources/server_side.html
It seems to be designed for exactly our use case.

> Browser thinks RM main page JS is taking too long
> -------------------------------------------------
>
>     Key: YARN-151
>     URL: https://issues.apache.org/jira/browse/YARN-151
>     Project: Hadoop YARN
>     Issue Type: Bug
>     Affects Versions: 0.23.3
>     Reporter: Robert Joseph Evans
>     Assignee: Ravi Prakash
>
> (issue description quoted above)
[jira] [Commented] (YARN-151) Browser thinks RM main page JS is taking too long
[ https://issues.apache.org/jira/browse/YARN-151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487337#comment-13487337 ]

Luke Lu commented on YARN-151:
------------------------------

10k apps is a small amount of data (a few hundred KB to a few MB); with client-side handling you don't need to implement the logic (an API that supports searching/sorting/paging) on the server side, and you get better user response time (an instant-search experience) because the data is already at the client. The server-side approach would be better if we needed to support more than 100K apps. If you come up with a patch, I'm happy to review it :)

> Browser thinks RM main page JS is taking too long
> -------------------------------------------------
>
>     Key: YARN-151
>     URL: https://issues.apache.org/jira/browse/YARN-151
>     Project: Hadoop YARN
>     Issue Type: Bug
>     Affects Versions: 0.23.3
>     Reporter: Robert Joseph Evans
>     Assignee: Ravi Prakash
>
> (issue description quoted above)
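For reference, the server-side alternative Ravi links to would amount to an endpoint along the lines of the JAX-RS sketch below. The resource path, AppStore interface, and wiring are hypothetical, not RMWebServices itself; the iDisplayStart/iDisplayLength/sEcho parameters come from DataTables' legacy server-side protocol in the linked example.

// Hypothetical server-side paging endpoint in the JAX-RS style the RM webapp uses.
import java.util.List;
import javax.ws.rs.DefaultValue;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;

@Path("/ws/v1/cluster/apps-paged")
public class PagedAppsResource {

    // Stand-in for the RM's application store; assumed to return rows pre-sorted.
    interface AppStore {
        int totalApps();
        List<String[]> page(int start, int length);
    }

    private final AppStore store;

    public PagedAppsResource(AppStore store) {
        this.store = store;
    }

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public PagedResponse getApps(@QueryParam("iDisplayStart") @DefaultValue("0") int start,
                                 @QueryParam("iDisplayLength") @DefaultValue("50") int length,
                                 @QueryParam("sEcho") String echo) {
        // Only the requested window crosses the wire, so a 100K-app cluster
        // ships ~50 rows per request instead of the whole table.
        PagedResponse r = new PagedResponse();
        r.sEcho = echo;
        r.iTotalRecords = store.totalApps();
        r.iTotalDisplayRecords = r.iTotalRecords;
        r.aaData = store.page(start, length);
        return r;
    }

    // Field names mirror what DataTables' legacy protocol expects back.
    public static class PagedResponse {
        public String sEcho;
        public int iTotalRecords;
        public int iTotalDisplayRecords;
        public List<String[]> aaData;
    }
}

The trade-off both comments circle around: client-side keeps the server simple and search instant for ~10k rows, while a paged endpoint like this caps the payload per request and scales past 100K apps.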
[jira] [Commented] (YARN-165) RM should point tracking URL to RM web page for app when AM fails
[ https://issues.apache.org/jira/browse/YARN-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487422#comment-13487422 ]

Vinod Kumar Vavilapalli commented on YARN-165:
----------------------------------------------

Patch looks good. The test too. Any manual tests you have done? I'd expect:
 - a fresh request to an already dead AM getting directed properly.
 - an existing page may get a 404 but on refresh should redirect properly.

> RM should point tracking URL to RM web page for app when AM fails
> -----------------------------------------------------------------
>
>     Key: YARN-165
>     URL: https://issues.apache.org/jira/browse/YARN-165
>     Project: Hadoop YARN
>     Issue Type: Improvement
>     Components: resourcemanager
>     Affects Versions: 2.0.3-alpha, 0.23.5
>     Reporter: Jason Lowe
>     Assignee: Jason Lowe
>     Priority: Blocker
>     Attachments: YARN-165.patch
>
> Currently when an ApplicationMaster fails, the ResourceManager updates the tracking URL to an empty string; see RMAppAttemptImpl.ContainerFinishedTransition. Unfortunately, when the client attempts to follow the proxy URL it results in a web page showing an HTTP 500 error and an ugly backtrace, because "http://" isn't a very helpful tracking URL. It would be much more helpful if the proxy URL redirected to the RM webapp page for the specific application. That page shows the various AM attempts and pointers to their logs, which will be useful for debugging the problems that caused the AM attempts to fail.
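A minimal sketch of the fallback behavior the issue asks for is below. The helper class, method, and parameters are hypothetical, not the actual RMAppAttemptImpl change: when a failed AM left no tracking URL, fall back to the RM's own app page instead of the empty string.

// Hypothetical illustration of the desired fallback for a missing tracking URL.
public final class TrackingUrls {

    private TrackingUrls() {}

    /** Returns the AM's URL if it reported one, else the RM app page. */
    public static String effectiveTrackingUrl(String amTrackingUrl,
                                              String rmWebAppAddress,
                                              String applicationId) {
        if (amTrackingUrl == null || amTrackingUrl.trim().isEmpty()) {
            // /cluster/app/<appId> is the RM webapp page showing AM attempts and logs.
            return "http://" + rmWebAppAddress + "/cluster/app/" + applicationId;
        }
        return amTrackingUrl;
    }

    public static void main(String[] args) {
        // Example: a failed AM reported no URL, so the proxy would redirect here.
        System.out.println(effectiveTrackingUrl("", "rm-host:8088",
                "application_1351554279328_0001"));
    }
}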