[ 
https://issues.apache.org/jira/browse/YARN-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029504#comment-14029504
 ] 

Tsuyoshi OZAWA commented on YARN-2148:
--------------------------------------

{quote}
The race condition I've observed before is that KillTransition is executed, and 
the diagnostics info has been added. However, CLEANUP_CONTAINER is executed 
on another thread. Before it is executed, the container has already exited 
normally, with exit code 0.
{quote}

This was a race condition among the thread that executes CLEANUP_CONTAINER, 
ContainerLauncher, and KillTransition. Since YARN-2091, 
{{ContainerImpl#exitCode}} is set in {{KillTransition}}. Therefore, the exit 
code 0 case no longer occurs, and it is also covered by [~leftnoteasy]'s patch. 
I think it's a consistent change.
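
As a minimal sketch of why the exit code 0 case goes away, assume the guard behaves like the {{hasDefaultExitCode()}} check quoted in the issue description: the first recorded exit code wins, and a later report is ignored. {{ContainerSketch}}, {{recordExit}}, and the sentinel value are hypothetical stand-ins for illustration, not the real {{ContainerImpl}}.

```java
// Hypothetical stand-in for ContainerImpl; only the exit-code guard is modeled.
final class ContainerSketch {
    // Assumption: sentinel meaning "no exit code recorded yet".
    static final int DEFAULT_EXIT_CODE = -1000;
    // Value the issue description reports for "killed by ApplicationMaster".
    static final int KILLED_BY_APPMASTER = -105;

    private int exitCode = DEFAULT_EXIT_CODE;

    synchronized boolean hasDefaultExitCode() {
        return exitCode == DEFAULT_EXIT_CODE;
    }

    // Records the exit code only if none has been recorded before.
    synchronized void recordExit(int code) {
        if (hasDefaultExitCode()) {
            exitCode = code;
        }
    }

    synchronized int getExitCode() {
        return exitCode;
    }

    public static void main(String[] args) {
        ContainerSketch c = new ContainerSketch();
        c.recordExit(KILLED_BY_APPMASTER); // the kill transition records its code first
        c.recordExit(0);                   // a late "normal exit" report is ignored
        System.out.println(c.getExitCode());
    }
}
```

The {{synchronized}} methods are a simplification for the sketch; the real NodeManager state machine may serialize these updates differently.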

{quote}
One more concern: ContainerExitStatus is a public class. YARN-2091 seems to be 
an incompatible change, while the old code has been in use for a while.
{quote}

YARN-2091 introduces new ContainerExitStatus values. If old code uses an old 
jar from before YARN-2091, the new exit codes should be handled as INVALID or 
as unknown exit codes. IMHO, we should announce this to YARN application 
creators at release time. One option is to add documentation that describes it.
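
A minimal sketch of how an application built against an older jar could treat unrecognized exit codes defensively. {{ExitStatusClassifier}}, its {{Kind}} enum, and the set of "known" negative codes are hypothetical and chosen only for illustration; they are not part of the YARN API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for an application compiled against an older jar.
final class ExitStatusClassifier {
    // Assumption: negative framework codes this older application knows about.
    // The values are illustrative placeholders, not taken from this thread.
    private static final Set<Integer> KNOWN_FRAMEWORK_CODES =
        new HashSet<>(Arrays.asList(-100, -101, -102));

    enum Kind { SUCCESS, APPLICATION_FAILURE, KNOWN_FRAMEWORK_EXIT, UNKNOWN }

    static Kind classify(int exitCode) {
        if (exitCode == 0) {
            return Kind.SUCCESS;
        }
        if (exitCode > 0) {
            return Kind.APPLICATION_FAILURE; // e.g. 137 or 143 from the process itself
        }
        // Negative codes the application does not recognize fall through to
        // UNKNOWN instead of causing an error.
        return KNOWN_FRAMEWORK_CODES.contains(exitCode)
            ? Kind.KNOWN_FRAMEWORK_EXIT
            : Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        // -105 is the code this issue reports for "killed by ApplicationMaster".
        System.out.println(classify(-105));
    }
}
```

With this shape, a code added after the application was built lands in the UNKNOWN bucket rather than breaking the caller.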

> TestNMClient failed due more exit code values added and passed to AM
> --------------------------------------------------------------------
>
>                 Key: YARN-2148
>                 URL: https://issues.apache.org/jira/browse/YARN-2148
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 3.0.0, 2.5.0
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>             Fix For: 2.5.0
>
>         Attachments: YARN-2148.patch
>
>
> Currently, TestNMClient fails in trunk; see 
> https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/
> {code}
> java.lang.AssertionError: null
>       at org.junit.Assert.fail(Assert.java:86)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at org.junit.Assert.assertTrue(Assert.java:52)
>       at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385)
>       at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347)
>       at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
> {code}
> Test cases in TestNMClient use the following code to verify the exit code of 
> COMPLETED containers:
> {code}
>           testGetContainerStatus(container, i, ContainerState.COMPLETE,
>               "Container killed by the ApplicationMaster.", Arrays.asList(
>                   new Integer[] {137, 143, 0}));
> {code}
> But YARN-2091 added logic to make the exit code reflect the actual status, so 
> the exit code for "killed by ApplicationMaster" is now -105:
> {code}
>       if (container.hasDefaultExitCode()) {
>         container.exitCode = exitEvent.getExitCode();
>       }
> {code}
> We should update the test case as well.
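
A minimal sketch of what the updated expectation could look like, assuming the test simply accepts -105 alongside the values it already allowed. {{ExpectedExitCodes}} and {{isAccepted}} are hypothetical names, and whether the older values should stay in the list is a decision for the actual patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for the exit codes the test would accept.
final class ExpectedExitCodes {
    // Value reported in this issue for "killed by ApplicationMaster".
    static final int KILLED_BY_APPMASTER = -105;

    // The codes the original assertion accepted, plus the new one.
    static final List<Integer> ACCEPTED =
        Arrays.asList(137, 143, 0, KILLED_BY_APPMASTER);

    static boolean isAccepted(int exitCode) {
        return ACCEPTED.contains(exitCode);
    }

    public static void main(String[] args) {
        System.out.println(isAccepted(-105));
    }
}
```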



--
This message was sent by Atlassian JIRA
(v6.2#6252)
