Zhijie Shen commented on YARN-2148:

bq. This should be a rare case, but it should be possible. In this case, the 
following assertion will fail as well?

Sorry, I made a mistake about that case. The race condition I've observed 
before is that KillTransition is executed and the diagnostics info has been 
added, but CLEANUP_CONTAINER is executed on another thread. Before it is 
executed, the container has already exited normally, with exit code 0.

See the code comment:
          // 0 is possible if CLEANUP_CONTAINER is executed too late
          // 137 is possible if the container is not terminated but killed

bq. What's the difference between 137 and 143?

I didn't look into the patch of YARN-2091. It seems that we still have 137 and 
143 in ExitCode. We need to make sure the container will not exit with these 
two codes here.
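
For reference, a minimal sketch of where these numbers come from. 137 and 143 
follow the usual shell convention of 128 plus the signal number (SIGKILL is 9, 
SIGTERM is 15), while -105 is the value quoted in the issue description for a 
container killed by the ApplicationMaster. The class name, the constant, and 
the acceptedExitCodes() helper are hypothetical and only illustrate one way 
the test expectation could look; this is not the actual patch.

```java
import java.util.Arrays;
import java.util.List;

public class ExitCodeSketch {
    // Hypothetical constant mirroring the -105 value quoted in the issue description.
    static final int KILLED_BY_APPMASTER = -105;

    // Shell convention: a process terminated by a signal reports 128 + signal number.
    static int signalExitCode(int signal) {
        return 128 + signal;
    }

    // Hypothetical helper: exit codes a test might accept for a container
    // killed by the AM. 0 stays in the list because the container may exit
    // normally before CLEANUP_CONTAINER runs; 137 and 143 are left out.
    static List<Integer> acceptedExitCodes() {
        return Arrays.asList(KILLED_BY_APPMASTER, 0);
    }

    public static void main(String[] args) {
        System.out.println("SIGKILL -> " + signalExitCode(9));
        System.out.println("SIGTERM -> " + signalExitCode(15));
        System.out.println("accepted -> " + acceptedExitCodes());
    }
}
```

Under this assumption, an assertion that checks membership in 
acceptedExitCodes() would fail if a container still surfaced 137 or 143.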

> TestNMClient failed due more exit code values added and passed to AM
> --------------------------------------------------------------------
>                 Key: YARN-2148
>                 URL: https://issues.apache.org/jira/browse/YARN-2148
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 3.0.0, 2.5.0
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>             Fix For: 2.5.0
>         Attachments: YARN-2148.patch
> Currently, TestNMClient fails in trunk, see 
> https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/
> {code}
> java.lang.AssertionError: null
>       at org.junit.Assert.fail(Assert.java:86)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at org.junit.Assert.assertTrue(Assert.java:52)
>       at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385)
>       at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347)
>       at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
> {code}
> Test cases in TestNMClient use the following code to verify the exit code of 
> COMPLETED containers:
> {code}
>           testGetContainerStatus(container, i, ContainerState.COMPLETE,
>               "Container killed by the ApplicationMaster.", Arrays.asList(
>                   new Integer[] {137, 143, 0}));
> {code}
> But YARN-2091 added logic to make the exit code reflect the actual status, so 
> the exit code of a container "killed by ApplicationMaster" will be -105:
> {code}
>       if (container.hasDefaultExitCode()) {
>         container.exitCode = exitEvent.getExitCode();
>       }
> {code}
> We should update the test case as well.
