[ https://issues.apache.org/jira/browse/YARN-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856515#comment-13856515 ]
Liyin Liang commented on YARN-1533:
-----------------------------------
During TestDistributedShell.testDSShell(), the ApplicationMaster asks the RM
for only two containers, using the following code:
{code}
// Setup ask for containers from RM
// Send request for containers to RM
// Until we get our fully allocated quota, we keep on polling RM for
// containers
// Keep looping until all the containers are launched and shell script
// executed on them ( regardless of success/failure).
for (int i = 0; i < numTotalContainers; ++i) {
  ContainerRequest containerAsk = setupContainerAskForRM();
  amRMClient.addContainerRequest(containerAsk);
}
{code}
However, the app is sometimes allocated three containers (one, then two more),
as the callback handler log shows:
{code}
2013-12-20 16:44:21,327 INFO [AMRM Callback Handler Thread] distributedshell.ApplicationMaster (ApplicationMaster.java:onContainersAllocated(638)) - Got response from RM for container ask, allocatedCnt=1
2013-12-20 16:44:22,342 INFO [AMRM Callback Handler Thread] distributedshell.ApplicationMaster (ApplicationMaster.java:onContainersCompleted(582)) - Got response from RM for container ask, completedCnt=1
2013-12-20 16:44:22,343 INFO [AMRM Callback Handler Thread] distributedshell.ApplicationMaster (ApplicationMaster.java:onContainersAllocated(638)) - Got response from RM for container ask, allocatedCnt=2
2013-12-20 16:44:23,345 INFO [AMRM Callback Handler Thread] distributedshell.ApplicationMaster (ApplicationMaster.java:onContainersCompleted(582)) - Got response from RM for container ask, completedCnt=2
{code}
In this case, the DistributedShell app runs three containers instead of two,
so it needs more time to finish and the test may time out.
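One possible guard against the extra allocation, sketched below in plain Java
without any Hadoop dependencies: count the containers accepted so far and treat
anything beyond numTotalContainers as surplus. The class AllocationGuard, its
methods, and the container ids "c1".."c3" are hypothetical and are not part of
DistributedShell. In the real ApplicationMaster the surplus ids would
presumably be released back to the RM from onContainersAllocated() instead of
being launched; that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of DistributedShell): caps accepted containers. */
public class AllocationGuard {
    private final int numTotalContainers; // number of containers the AM asked for
    private int numAccepted = 0;          // containers accepted for launch so far

    public AllocationGuard(int numTotalContainers) {
        this.numTotalContainers = numTotalContainers;
    }

    /**
     * Processes one allocation callback. Ids are accepted until the requested
     * total is reached; the remaining ids are returned as surplus so the
     * caller can release them instead of launching them.
     */
    public synchronized List<String> acceptAndReturnSurplus(List<String> allocatedIds) {
        List<String> surplus = new ArrayList<>();
        for (String id : allocatedIds) {
            if (numAccepted < numTotalContainers) {
                numAccepted++;
            } else {
                surplus.add(id);
            }
        }
        return surplus;
    }

    public synchronized int accepted() {
        return numAccepted;
    }

    public static void main(String[] args) {
        // Mirrors the log above: two containers requested, then callbacks
        // delivering one container followed by two more.
        AllocationGuard guard = new AllocationGuard(2);
        System.out.println(guard.acceptAndReturnSurplus(Arrays.asList("c1")));
        System.out.println(guard.acceptAndReturnSurplus(Arrays.asList("c2", "c3")));
    }
}
```

With the allocation pattern from the log, the first callback yields no surplus
and the second marks the third container as surplus, so only two containers
would be launched.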
> TestDistributedShell.testDSShell occasionally fails
> ---------------------------------------------------
>
> Key: YARN-1533
> URL: https://issues.apache.org/jira/browse/YARN-1533
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.2.0
> Reporter: Liyin Liang
>
> TestDistributedShell.testDSShell is occasionally failing with the error:
> {code}
> -------------------------------------------------------------------------------
> Test set: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
> -------------------------------------------------------------------------------
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 114.163 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
> testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)  Time elapsed: 90.009 sec  <<< ERROR!
> java.lang.Exception: test timed out after 90000 milliseconds
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Thread.join(Thread.java:1186)
>     at java.lang.Thread.join(Thread.java:1239)
>     at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:163)
> {code}