[
https://issues.apache.org/jira/browse/YARN-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297244#comment-15297244
]
Wangda Tan commented on YARN-5131:
--
[~hitesh], yes, you're correct: the InterruptedException does not cause the AM
failure. Updating the title and description.
The root cause of this issue is YARN-1902: the YARN scheduler can allocate more
containers to the AM than it requested. If an extra container arrives while the
AM is finishing, the container launch fails because the NMClient thread is
interrupted, which in turn causes the following check to fail:
{code}
if (numFailedContainers.get() == 0 &&
    numCompletedContainers.get() == numTotalContainers) {
  // SUCCESSFUL
}
{code}
Instead, we should deduct the failed containers from the completed containers.
Uploading a patch.
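To illustrate the idea, here is a minimal sketch, not the actual patch. It assumes
that a container whose launch fails is counted in both numCompletedContainers and
numFailedContainers; the class CompletionCheck and the methods recordSuccess,
recordFailure, originalCheck and adjustedCheck are hypothetical names used only
for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; not the real DistributedShell ApplicationMaster code. */
class CompletionCheck {
    final AtomicInteger numCompletedContainers = new AtomicInteger();
    final AtomicInteger numFailedContainers = new AtomicInteger();
    final int numTotalContainers; // number of containers the AM asked for

    CompletionCheck(int numTotalContainers) {
        this.numTotalContainers = numTotalContainers;
    }

    /** A container that ran and finished normally. */
    void recordSuccess() {
        numCompletedContainers.incrementAndGet();
    }

    /** Assumption: a failed launch is counted as completed AND as failed. */
    void recordFailure() {
        numCompletedContainers.incrementAndGet();
        numFailedContainers.incrementAndGet();
    }

    /** The check quoted above: any failure, or any extra container, means "not successful". */
    boolean originalCheck() {
        return numFailedContainers.get() == 0
            && numCompletedContainers.get() == numTotalContainers;
    }

    /** Assumed adjustment: deduct failed containers before comparing with the total. */
    boolean adjustedCheck() {
        int succeeded = numCompletedContainers.get() - numFailedContainers.get();
        return succeeded == numTotalContainers;
    }
}
```

With two requested containers that both succeed, plus one extra container whose
launch fails during shutdown, originalCheck() returns false while adjustedCheck()
returns true under the assumptions above.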
> Distributed shell AM fails because of InterruptedException
> --
>
> Key: YARN-5131
> URL: https://issues.apache.org/jira/browse/YARN-5131
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Sumana Sathish
> Assignee: Wangda Tan
>
> DShell AM fails with the following exception
> {code}
> INFO impl.AMRMClientAsyncImpl: Interrupted while waiting for queue
> java.lang.InterruptedException
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
>   at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)
> End of LogType:AppMaster.stderr
> {code}