[ https://issues.apache.org/jira/browse/YARN-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147402#comment-16147402 ]
Jason Lowe commented on YARN-6479:
----------------------------------
I saw another case of this. It fails because EntityGroupFSTimelineStore can
move files from the active directory to the done directory before the
application has finished writing its data. This can happen while the
application is in the FINISHING state, which clients see as FINISHED in an app
report. EntityGroupFSTimelineStore sees from the app report that the
application has finished and assumes the entity files are complete, when in
fact the app is still FINISHING and the AM is busy writing the entities out to
HDFS.
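A minimal, self-contained model of the race, assuming a simplified state
machine: the enums below are hypothetical stand-ins (not the real YARN types)
and only mirror the behavior described above, where FINISHING is reported to
clients as FINISHED. A "done" check that looks only at the app report therefore
approves the move while the AM is still writing.

```java
public class AppReportStateSketch {
    // Hypothetical stand-in for the RM's internal application states.
    public enum InternalAppState { RUNNING, FINISHING, FINISHED, FAILED, KILLED }

    // Hypothetical stand-in for the state a client sees in an app report.
    public enum ReportedAppState { RUNNING, FINISHED, FAILED, KILLED }

    // Models the collapse described above: FINISHING is reported as FINISHED.
    public static ReportedAppState toReported(InternalAppState s) {
        switch (s) {
            case FINISHING:
            case FINISHED:
                return ReportedAppState.FINISHED;
            case FAILED:
                return ReportedAppState.FAILED;
            case KILLED:
                return ReportedAppState.KILLED;
            default:
                return ReportedAppState.RUNNING;
        }
    }

    // Naive check that trusts the app report alone.
    public static boolean naiveSafeToMove(InternalAppState actual) {
        ReportedAppState r = toReported(actual);
        return r == ReportedAppState.FINISHED
            || r == ReportedAppState.FAILED
            || r == ReportedAppState.KILLED;
    }

    public static void main(String[] args) {
        // The AM is still writing entities, yet the naive check approves the move.
        System.out.println(naiveSafeToMove(InternalAppState.FINISHING)); // prints "true"
    }
}
```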
The good news is that no data is lost, since HDFS supports renaming files that
are being actively written, but the race can cause this unit test to fail
because the test assumes files in the done directory are complete. Either we
need to fix the test to account for this race, or we need to fix
EntityGroupFSTimelineStore so it does not try to move files for applications
that are still active. Fixing the latter requires changing
EntityGroupFSTimelineStore to fetch an additional app attempt report for the
current attempt and check whether it is in a terminal state (i.e. FINISHED,
FAILED or KILLED, but not FINISHING). If it is not, then the app is still
actively writing entity files, and the store should not move them from active
to done.
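A sketch of the proposed guard, under stated assumptions: `AttemptState` is a
hypothetical stand-in for YARN's attempt-state enum, and `safeToMove` is a
hypothetical helper, not an existing EntityGroupFSTimelineStore method. In a
real patch the attempt state would presumably come from an app attempt report
for the current attempt; here it is simply passed in. Files are moved only when
the app report says the app is done and the current attempt is terminal.

```java
import java.util.EnumSet;

public class DoneDirMoveCheckSketch {
    // Hypothetical stand-in mirroring the attempt states named in the comment.
    public enum AttemptState { RUNNING, FINISHING, FINISHED, FAILED, KILLED }

    // Terminal per the comment: FINISHED, FAILED, KILLED, but not FINISHING.
    private static final EnumSet<AttemptState> TERMINAL =
        EnumSet.of(AttemptState.FINISHED, AttemptState.FAILED, AttemptState.KILLED);

    public static boolean isTerminal(AttemptState s) {
        return TERMINAL.contains(s);
    }

    // appReportSaysDone: what the app report shows (FINISHING already reads as FINISHED).
    // Both signals must agree; a FINISHING attempt means the AM may still be writing.
    public static boolean safeToMove(boolean appReportSaysDone, AttemptState currentAttempt) {
        return appReportSaysDone && isTerminal(currentAttempt);
    }

    public static void main(String[] args) {
        System.out.println(safeToMove(true, AttemptState.FINISHING)); // prints "false": leave in active
        System.out.println(safeToMove(true, AttemptState.FINISHED));  // prints "true": move to done
    }
}
```

The extra attempt-level lookup costs one more RM call per application scan,
which is the trade-off for not racing the AM's final writes.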
> TestDistributedShell.testDSShellWithoutDomainV1_5 fails
> -------------------------------------------------------
>
> Key: YARN-6479
> URL: https://issues.apache.org/jira/browse/YARN-6479
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.8.0
> Reporter: Eric Badger
>
> {noformat}
> java.lang.AssertionError: expected:<2> but was:<0>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:385)
> at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV1_5(TestDistributedShell.java:236)
> {noformat}
> This particular run was in 2.8, but may also be present through trunk.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)