[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[ https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742773#comment-14742773 ]

Karthik Kambatla commented on YARN-3697:

+1

> FairScheduler: ContinuousSchedulingThread can't be shutdown after stop
> sometimes.
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.7.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Priority: Critical
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: ContinuousSchedulingThread can't be shutdown after stop
> sometimes.
> The reason is that the InterruptedException is swallowed in
> continuousSchedulingAttempt:
> {code}
> try {
>   if (node != null && Resources.fitsIn(minimumAllocation,
>       node.getAvailableResource())) {
>     attemptScheduling(node);
>   }
> } catch (Throwable ex) {
>   LOG.error("Error while attempting scheduling for node " + node +
>       ": " + ex.toString(), ex);
> }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN [FairSchedulerContinuousScheduling] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted
> java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>     at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>     at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 available= used=: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMConta
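The failure mode in the issue description can be reproduced outside YARN. The following is a minimal sketch, not the committed YARN-3697 patch: `SwallowedInterruptDemo`, `attemptScheduling`, `schedulingLoop` and `causedByInterrupt` are hypothetical names, `Thread.sleep` stands in for the `LinkedBlockingQueue.put` inside `AsyncDispatcher`, and a plain `RuntimeException` stands in for `YarnRuntimeException`. A latch forces the interrupt to land inside the blocking call, which is the "sometimes" case. Because throwing `InterruptedException` clears the thread's interrupt flag, a `catch (Throwable)` that only logs leaves the loop's `isInterrupted()` check with nothing to see. The "fixed" variant walks the cause chain, restores the flag, and exits.

```java
import java.util.concurrent.CountDownLatch;

/**
 * Sketch with hypothetical names (not the YARN-3697 patch) of how a
 * catch (Throwable) around a dispatcher call can keep a scheduling loop alive.
 */
class SwallowedInterruptDemo {

  /** Returns true if t or any cause in its chain is an InterruptedException. */
  static boolean causedByInterrupt(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof InterruptedException) {
        return true;
      }
    }
    return false;
  }

  /** Stand-in for the dispatcher: wraps the interrupt in an unchecked exception. */
  static void attemptScheduling(CountDownLatch entered) {
    entered.countDown(); // signal that we have reached the blocking call
    try {
      Thread.sleep(10_000); // stand-in for LinkedBlockingQueue.put
    } catch (InterruptedException e) {
      throw new RuntimeException(e); // the interrupt flag is already cleared here
    }
  }

  /** The loop; 'fixed' toggles whether wrapped interrupts are honored. */
  static void schedulingLoop(boolean fixed, CountDownLatch entered) {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        attemptScheduling(entered);
      } catch (Throwable ex) {
        if (fixed && causedByInterrupt(ex)) {
          Thread.currentThread().interrupt(); // restore the flag for observers
          return;                              // and leave the loop
        }
        // Buggy path: log-and-continue swallows the interrupt.
      }
    }
  }

  /** Interrupts the loop inside the blocking call; true if it stopped in time. */
  static boolean stopsAfterInterrupt(boolean fixed, long timeoutMs) {
    CountDownLatch entered = new CountDownLatch(1);
    Thread t = new Thread(() -> schedulingLoop(fixed, entered));
    t.setDaemon(true); // a stuck thread must not keep the JVM alive
    t.start();
    try {
      entered.await();
      t.interrupt();
      t.join(timeoutMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !t.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("buggy stops: " + stopsAfterInterrupt(false, 500));
    System.out.println("fixed stops: " + stopsAfterInterrupt(true, 500));
  }
}
```

Running `main` should print `buggy stops: false` followed by `fixed stops: true`: the log-and-continue loop is still alive after the join timeout, while the variant that inspects the cause chain has exited.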
[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[ https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708310#comment-14708310 ]

zhihai xu commented on YARN-3697:

Hi [~kasha], thanks for the review! That is a good suggestion. I attached a new patch, YARN-3697.001.patch, which addresses your comments and adds two tests. Please review it. Thanks again!
[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[ https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707921#comment-14707921 ]

Karthik Kambatla commented on YARN-3697:

The patch itself looks good. Can we add a unit test to catch any regressions here? For instance, in the future, AsyncDispatcher might stop wrapping InterruptedException in a YarnRuntimeException.
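Following the suggestion above, a regression test would ideally pass regardless of how AsyncDispatcher reports the interrupt. The sketch below is plain Java (no JUnit), and every name in it is hypothetical: `Mode` models the wrapping behavior described in the issue plus one assumed alternative in which the dispatcher only restores the interrupt flag, and the check asserts that the loop thread terminates in both cases.

```java
import java.util.concurrent.CountDownLatch;

/**
 * Hypothetical regression-test sketch: the scheduling loop must stop after an
 * interrupt no matter how the stand-in dispatcher reports it.
 */
class InterruptRegressionSketch {

  /** Two illustrative ways a dispatcher could surface an interrupt. */
  enum Mode { WRAP_IN_RUNTIME_EXCEPTION, RESTORE_FLAG_ONLY }

  static boolean causedByInterrupt(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof InterruptedException) {
        return true;
      }
    }
    return false;
  }

  /** Stand-in for a dispatcher call that blocks on a full event queue. */
  static void dispatch(Mode mode, CountDownLatch entered) {
    entered.countDown();
    try {
      Thread.sleep(10_000);
    } catch (InterruptedException e) {
      if (mode == Mode.WRAP_IN_RUNTIME_EXCEPTION) {
        throw new RuntimeException(e);   // behavior described in the issue
      }
      Thread.currentThread().interrupt(); // assumed alternative: keep the flag
    }
  }

  /** Loop under test: honors both the interrupt flag and wrapped interrupts. */
  static void schedulingLoop(Mode mode, CountDownLatch entered) {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        dispatch(mode, entered);
      } catch (Throwable ex) {
        if (causedByInterrupt(ex)) {
          Thread.currentThread().interrupt();
          return;
        }
        // Any other error would be logged here and the loop would continue.
      }
    }
  }

  /** Interrupts the loop inside dispatch() and reports whether it stopped in time. */
  static boolean stopsAfterInterrupt(Mode mode, long timeoutMs) {
    CountDownLatch entered = new CountDownLatch(1);
    Thread t = new Thread(() -> schedulingLoop(mode, entered));
    t.setDaemon(true);
    t.start();
    try {
      entered.await();
      t.interrupt();
      t.join(timeoutMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !t.isAlive();
  }

  public static void main(String[] args) {
    for (Mode mode : Mode.values()) {
      System.out.println(mode + " stops: " + stopsAfterInterrupt(mode, 2000));
    }
  }
}
```

Using a bounded `join` rather than an unbounded one means a regression shows up as a failed check instead of a hung test run.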
[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[ https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653042#comment-14653042 ]

Hadoop QA commented on YARN-3697:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 31s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 51s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 49s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 47s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 53m 4s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 91m 56s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12748585/YARN-3697.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c3364ca |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8761/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8761/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8761/console |

This message was automatically generated.
[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[ https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579575#comment-14579575 ]

Hadoop QA commented on YARN-3697:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 7s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 45s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 50m 47s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 88m 55s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12738645/YARN-3697.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3c2397c |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8229/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8229/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8229/console |

This message was automatically generated.
[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[ https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579428#comment-14579428 ]

zhihai xu commented on YARN-3697:

Hi [~rohithsharma], thanks for the review. The ContinuousSchedulingThread is a daemon thread, so it won't cause the RM to hang during shutdown. It may cause issues in tests that try to stop the scheduler.
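The distinction drawn above can be illustrated with plain `java.lang.Thread`: `setDaemon(true)` means a stuck thread does not prevent the JVM from exiting, but any code that interrupts the thread and then joins it, as a scheduler stop path or a test teardown might, has to wait for it. In this sketch `DaemonStopSketch` and its methods are hypothetical; the worker deliberately swallows interrupts to mimic the bug, and `stop` uses a bounded `join` so it reports failure instead of hanging.

```java
/**
 * Hypothetical sketch: a daemon thread that ignores interrupts does not block
 * JVM exit, but a stop() that joins it would wait forever without a timeout.
 */
class DaemonStopSketch {

  /** Worker that swallows every interrupt, like the buggy scheduling loop. */
  static Thread startStubbornDaemon() {
    Thread t = new Thread(() -> {
      while (true) {
        try {
          Thread.sleep(10_000);
        } catch (InterruptedException ignored) {
          // Swallowed: the flag is cleared and the loop keeps going.
        }
      }
    }, "stubborn-daemon");
    t.setDaemon(true);
    t.start();
    return t;
  }

  /** Interrupts and waits at most timeoutMs; returns true if the thread ended. */
  static boolean stop(Thread t, long timeoutMs) {
    t.interrupt();
    try {
      t.join(timeoutMs); // a plain t.join() would hang here
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !t.isAlive();
  }

  public static void main(String[] args) {
    Thread t = startStubbornDaemon();
    System.out.println("daemon: " + t.isDaemon() + ", stopped: " + stop(t, 200));
    // main returns and the JVM exits even though t is still running.
  }
}
```

This is why the symptom shows up mainly in tests: production shutdown is not blocked, but a test that stops the scheduler and expects its thread to be gone will either wait or fail.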
[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[ https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578677#comment-14578677 ]

Rohith commented on YARN-3697:

Hi [~zxu], I'm trying to understand the problem. Does it occur when RM shutdown is called, which tries to stop the FS service? Does it cause the RM to hang during shutdown?
[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[ https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578588#comment-14578588 ] Hadoop QA commented on YARN-3697:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 47s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 45s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 51m 7s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 88m 41s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12738526/YARN-3697.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8d0ef31 |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8225/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8225/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8225/console |

This message was automatically generated.

> FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.7.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Priority: Critical
> Attachments: YARN-3697.000.patch
>
> FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
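Hadoop QA flagged the patch for not including tests. One way a regression test for this bug could be shaped is sketched below: start a worker that blocks inside a dispatcher-style put, interrupt it the way stop() would, and use Thread.join(timeout) to check whether it actually terminates. Everything here (SchedulingThreadShutdownCheck, dispatch, stopsAfterInterrupt, the 200 ms and 5 s timeouts) is hypothetical and built from plain JDK classes; it is not taken from the patch, and a real test would drive FairScheduler itself.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

public class SchedulingThreadShutdownCheck {

  // Hypothetical stand-in for an unchecked wrapper such as YarnRuntimeException.
  static class WrappedRuntimeException extends RuntimeException {
    WrappedRuntimeException(Throwable cause) {
      super(cause);
    }
  }

  // Dispatcher-style put: an interrupt surfaces as an unchecked exception.
  static void dispatch(BlockingQueue<String> queue, String event) {
    try {
      queue.put(event);
    } catch (InterruptedException e) {
      throw new WrappedRuntimeException(e);
    }
  }

  // Starts a worker whose every put() blocks on a full queue, interrupts it
  // the way stop() would, and reports whether it died within timeoutMillis.
  static boolean stopsAfterInterrupt(boolean propagate, long timeoutMillis)
      throws InterruptedException {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>(1);
    queue.add("backlog"); // capacity 1, so the queue is now full
    CountDownLatch running = new CountDownLatch(1);

    Thread worker = new Thread(() -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          running.countDown();
          try {
            dispatch(queue, "event");
          } catch (Throwable ex) {
            if (propagate && ex instanceof WrappedRuntimeException
                && ex.getCause() instanceof InterruptedException) {
              throw (InterruptedException) ex.getCause();
            }
            // Otherwise swallowed; put() has already cleared the interrupt flag.
          }
        }
      } catch (InterruptedException e) {
        // The interrupt reached the loop, so run() returns and the thread ends.
      }
    }, "continuous-scheduling-sketch");
    worker.setDaemon(true); // a stuck worker must not keep the JVM alive
    worker.start();

    running.await();
    worker.interrupt();
    worker.join(timeoutMillis);
    return !worker.isAlive();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("swallowing catch stops: " + stopsAfterInterrupt(false, 200));
    System.out.println("propagating catch stops: " + stopsAfterInterrupt(true, 5000));
  }
}
```

Run as is, main should report false for the swallowing variant (the worker is parked in put() again after the timeout) and true for the propagating one. The queue is kept full on purpose so that put() always blocks, which makes the interrupt land inside the dispatcher call rather than at the loop's own isInterrupted() check.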
[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[ https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578449#comment-14578449 ] zhihai xu commented on YARN-3697:

Updated the patch based on the latest code.

> FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.7.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Priority: Critical
> Attachments: YARN-3697.000.patch
>
> FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[ https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555126#comment-14555126 ] Hadoop QA commented on YARN-3697:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 41s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 46s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 49m 59s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 86m 23s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12734631/YARN-3697.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 05e04f3 |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8047/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8047/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8047/console |

This message was automatically generated.

> FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.7.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Priority: Critical
> Attachments: YARN-3697.000.patch
>
> FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[ https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554951#comment-14554951 ] zhihai xu commented on YARN-3697:

I uploaded a patch, YARN-3697.000.patch, for review. I found a good article about InterruptedException: http://www.ibm.com/developerworks/library/j-jtp05236/

> FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.7.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Attachments: YARN-3697.000.patch
>
> FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
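The usual advice on InterruptedException, and the gist of the article linked above, is not to swallow it: either let it propagate or, where a method cannot throw it (for example inside Runnable.run()), restore the interrupt status with Thread.currentThread().interrupt() so that code further up the stack can still see the request to stop. A small Java sketch contrasting the two catch blocks; the class and method names are illustrative only.

```java
public class RestoreInterruptDemo {

  // Anti-pattern: Thread.sleep() cleared the flag when it threw, and this
  // catch block drops the exception, so the interrupt is lost.
  static void pauseSwallowing(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      // swallowed
    }
  }

  // Preferred when rethrowing is impossible: re-assert the interrupt.
  static void pauseRestoring(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Interrupts the current thread, runs body, then reports (and clears)
  // whether the interrupt status survived.
  static boolean flagSurvives(Runnable body) {
    Thread.currentThread().interrupt();
    body.run();
    return Thread.interrupted();
  }

  public static void main(String[] args) {
    System.out.println(flagSurvives(() -> pauseSwallowing(1000))); // false
    System.out.println(flagSurvives(() -> pauseRestoring(1000)));  // true
  }
}
```

Because Thread.sleep() throws immediately when the interrupt flag is already set, neither call actually waits a second.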