[jira] [Commented] (YARN-5131) Distributed shell AM fails because of InterruptedException

2016-05-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297244#comment-15297244
 ] 

Wangda Tan commented on YARN-5131:
--

[~hitesh], yes, you're correct: the InterruptedException does not cause the AM 
failure. Updating the title and description.

The root cause is YARN-1902: the YARN scheduler can allocate more containers 
to the AM than it requested. If an extra container arrives while the AM is 
finishing, the container launch fails because the NMClient thread has already 
been interrupted, and that failure makes the following check fail:
{code}
if (numFailedContainers.get() == 0 &&
    numCompletedContainers.get() == numTotalContainers) {
  // SUCCESSFUL
}
{code}

Instead, we should deduct the failed containers from the completed containers. 
Uploading a patch.
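
To illustrate the difference, here is a minimal sketch that compares the check quoted above with one possible reading of "deduct failed from completed". It is not the uploaded patch. The class name, method names, and the numbers in the scenario are hypothetical, and it assumes that a failed launch is counted in both numCompletedContainers and numFailedContainers.

```java
import java.util.concurrent.atomic.AtomicInteger;

class DShellSuccessCheckSketch {
    // The check as quoted in the comment: any failed container, including an
    // extra one the AM never asked for, makes the run unsuccessful.
    static boolean originalCheck(AtomicInteger numFailedContainers,
                                 AtomicInteger numCompletedContainers,
                                 int numTotalContainers) {
        return numFailedContainers.get() == 0
            && numCompletedContainers.get() == numTotalContainers;
    }

    // Hypothetical revised check: subtract failed containers from completed
    // ones and require that enough containers finished without failing.
    static boolean revisedCheck(AtomicInteger numFailedContainers,
                                AtomicInteger numCompletedContainers,
                                int numTotalContainers) {
        int succeeded = numCompletedContainers.get() - numFailedContainers.get();
        return succeeded >= numTotalContainers;
    }

    public static void main(String[] args) {
        // Assumed scenario: the AM asked for 2 containers, the scheduler
        // allocated 3, and the extra launch failed during AM shutdown.
        int numTotalContainers = 2;
        AtomicInteger numCompletedContainers = new AtomicInteger(3);
        AtomicInteger numFailedContainers = new AtomicInteger(1);

        System.out.println("original=" + originalCheck(
            numFailedContainers, numCompletedContainers, numTotalContainers));
        System.out.println("revised=" + revisedCheck(
            numFailedContainers, numCompletedContainers, numTotalContainers));
        // prints:
        // original=false
        // revised=true
    }
}
```

Under these assumptions the two requested containers finished normally, so only the revised check reports success.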


> Distributed shell AM fails because of InterruptedException
> --
>
> Key: YARN-5131
> URL: https://issues.apache.org/jira/browse/YARN-5131
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Sumana Sathish
> Assignee: Wangda Tan
>
> DShell AM fails with the following exception
> {code}
> INFO impl.AMRMClientAsyncImpl: Interrupted while waiting for queue
> java.lang.InterruptedException
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
>   at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)
> End of LogType:AppMaster.stderr
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5131) Distributed shell AM fails because of InterruptedException

2016-05-23 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297174#comment-15297174
 ] 

Hitesh Shah commented on YARN-5131:
---

The error in the description is not really an error. The thread was 
interrupted. 
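
As a rough illustration of why this trace can be benign, the sketch below uses only the JDK to show that interrupting a thread blocked in LinkedBlockingQueue.take() raises InterruptedException, which is the usual way to stop such a consumer. The class and method names are hypothetical, and this is not the AMRMClientAsyncImpl code.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

class InterruptedTakeSketch {
    // Starts a consumer that waits on an empty queue, interrupts it, and
    // returns true if the consumer observed InterruptedException.
    static boolean interruptBlockedConsumer() {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);

        Thread consumer = new Thread(() -> {
            try {
                queue.take(); // nothing is ever put on the queue
            } catch (InterruptedException e) {
                // Expected on shutdown: record it and let the thread exit.
                sawInterrupt.set(true);
            }
        });

        consumer.start();
        consumer.interrupt(); // stop request, as a shutdown path would issue
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
        return sawInterrupt.get();
    }

    public static void main(String[] args) {
        System.out.println("consumer saw InterruptedException: "
            + interruptBlockedConsumer());
    }
}
```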

