[ https://issues.apache.org/jira/browse/YARN-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147402#comment-16147402 ]
Jason Lowe commented on YARN-6479:
----------------------------------

I saw another case of this. It fails because EntityGroupFSTimelineStore can move files from the active directory to the done directory before the application has finished writing its data. This can occur when the application is in the FINISHING state, which clients see as the FINISHED state in an app report. EntityGroupFSTimelineStore sees from the app report that the application has finished and assumes the entity files are done being written, when in fact the app is in the FINISHING state and the AM is still busy writing out the entities to HDFS.

The good news is that data isn't lost, since HDFS supports renaming files that are being actively written. However, it can cause this unit test to fail, because the test assumes files in the done directory are complete.

Either we need to fix the test to account for this race, or we need to fix EntityGroupFSTimelineStore so it does not try to move files for applications that are still active. Fixing the latter requires changing EntityGroupFSTimelineStore to get an additional app attempt report on the current attempt and check whether it is in a terminal state (i.e. FINISHED, FAILED, or KILLED, and not FINISHING). If it is not, then the app is really still actively writing entity files and the store should not move the files from active to done.
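To illustrate the proposed check, here is a minimal, self-contained sketch. It is not the actual EntityGroupFSTimelineStore code: the AttemptState enum is a simplified stand-in for the attempt states named above, and the class and method names (ActiveToDoneCheck, isTerminal, safeToMoveToDone) are hypothetical. It only models the decision: the store would treat files as safe to move when the app report says the app has finished and the current attempt is in a terminal state.

```java
import java.util.EnumSet;
import java.util.Set;

public class ActiveToDoneCheck {

    /** Simplified stand-in for attempt states; NOT the real YARN enum. */
    enum AttemptState { RUNNING, FINISHING, FINISHED, FAILED, KILLED }

    /** Terminal states as listed in the comment; FINISHING is deliberately excluded. */
    private static final Set<AttemptState> TERMINAL =
            EnumSet.of(AttemptState.FINISHED, AttemptState.FAILED, AttemptState.KILLED);

    static boolean isTerminal(AttemptState state) {
        return TERMINAL.contains(state);
    }

    /**
     * Hypothetical decision helper.
     *
     * @param appReportFinished true when the app report shows the app as finished
     *                          (which, per the comment, also covers FINISHING)
     * @param currentAttempt    state taken from the extra app attempt report
     */
    static boolean safeToMoveToDone(boolean appReportFinished, AttemptState currentAttempt) {
        return appReportFinished && isTerminal(currentAttempt);
    }

    public static void main(String[] args) {
        // App report says finished, but the AM is still flushing entities to HDFS.
        System.out.println(safeToMoveToDone(true, AttemptState.FINISHING));
        // App report says finished and the attempt has really ended.
        System.out.println(safeToMoveToDone(true, AttemptState.FINISHED));
    }
}
```

With this shape, the FINISHING case keeps the files in the active directory until a later scan sees a terminal attempt state.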

> TestDistributedShell.testDSShellWithoutDomainV1_5 fails
> -------------------------------------------------------
>
>                 Key: YARN-6479
>                 URL: https://issues.apache.org/jira/browse/YARN-6479
>             Project: Hadoop YARN
>          Issue Type: Bug
>   Affects Versions: 2.8.0
>            Reporter: Eric Badger
>
> {noformat}
> java.lang.AssertionError: expected:<2> but was:<0>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.junit.Assert.assertEquals(Assert.java:542)
> 	at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:385)
> 	at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV1_5(TestDistributedShell.java:236)
> {noformat}
> This particular run was in 2.8, but may also be present through trunk.