[ 
https://issues.apache.org/jira/browse/YARN-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147402#comment-16147402
 ] 

Jason Lowe commented on YARN-6479:
----------------------------------

I saw another case of this, and it fails because EntityGroupFSTimelineStore can 
move files from the active to the done directory before the application has 
completed writing the data.  This can occur because the application is in the 
FINISHING state which clients read as the FINISHED state in an app report.  
EntityGroupFSTimelineStore sees the application has finished from the app 
report and assumes the entity files are done being written when in fact the app 
is in the FINISHING state and the AM is still busy writing out the entities to 
HDFS.

The good news is that data isn't lost since HDFS supports renaming of files 
being actively written, but it can cause this unit test to fail since the test 
assumes files in the done directory are complete.  Either we need to fix the 
test to account for this race or we need to fix EntityGroupFSTimelineStore so 
it does not try to move files for applications that are still active.  Fixing 
the latter requires changing EntityGroupFSTimelineStore to fetch an additional 
app attempt report for the current attempt and check whether it is in a 
terminal state (i.e., FINISHED, FAILED, or KILLED, and not FINISHING).  If it 
is not, then the app is really still actively writing entity files and the 
store should not move the files from active to done.
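
As a rough illustration of that second option, the check could look something 
like the sketch below.  This is not the actual EntityGroupFSTimelineStore code: 
the enums and the ActiveToDoneCheck class are hypothetical stand-ins defined 
here so the example compiles on its own, and they only model the states named 
above.  The point is that a FINISHED app report alone is not enough; the 
current attempt must also be in a terminal state.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for the state exposed by an app attempt report.
// Unlike the app report, it distinguishes FINISHING from FINISHED.
enum AttemptState { RUNNING, FINISHING, FINISHED, FAILED, KILLED }

// Hypothetical stand-in for the coarser state exposed by an app report,
// where an app that is still FINISHING is already shown as FINISHED.
enum AppReportState { RUNNING, FINISHED, FAILED, KILLED }

final class ActiveToDoneCheck {
    // Attempt states in which the AM can no longer be writing entity files.
    private static final Set<AttemptState> TERMINAL =
        EnumSet.of(AttemptState.FINISHED, AttemptState.FAILED, AttemptState.KILLED);

    private ActiveToDoneCheck() {}

    // True when the app report alone says the application is over.
    static boolean appReportLooksDone(AppReportState app) {
        return app != AppReportState.RUNNING;
    }

    // Move files from active to done only when the app report says the app
    // is over AND the current attempt is really in a terminal state.
    static boolean safeToMove(AppReportState app, AttemptState currentAttempt) {
        return appReportLooksDone(app) && TERMINAL.contains(currentAttempt);
    }
}
```

With this shape, safeToMove(AppReportState.FINISHED, AttemptState.FINISHING) 
returns false, which is the racy case described above, while a FINISHED, 
FAILED, or KILLED attempt allows the move.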


> TestDistributedShell.testDSShellWithoutDomainV1_5 fails
> -------------------------------------------------------
>
>                 Key: YARN-6479
>                 URL: https://issues.apache.org/jira/browse/YARN-6479
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Eric Badger
>
> {noformat}
> java.lang.AssertionError: expected:<2> but was:<0>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:555)
>       at org.junit.Assert.assertEquals(Assert.java:542)
>       at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:385)
>       at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV1_5(TestDistributedShell.java:236)
> {noformat}
> This particular run was on 2.8, but the failure may also be present through trunk.



