sunchao commented on a change in pull request #29959:
URL: https://github.com/apache/spark/pull/29959#discussion_r502012000



##########
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala
##########
@@ -214,9 +214,9 @@ class FileIndexSuite extends SharedSparkSession {
               assert(leafFiles.isEmpty)
             } else {
               assert(raceCondition == classOf[FileDeletionRaceFileSystem])
-              // One of the two leaf files was missing, but we should still list the other:
-              assert(leafFiles.size == 1)
-              assert(leafFiles.head.getPath == nonDeletedLeafFilePath)
+              // listLocatedStatus will fail as a whole because the default impl calls
+              // getFileBlockLocations
+              assert(leafFiles.isEmpty)

Review comment:
       Yes, this test checks the case where a file is deleted after a `listStatus` call but before the subsequent `getFileBlockLocations` call that is made when locality info is needed. With the new impl, we'd call `listLocatedStatus` instead, which calls `getFileBlockLocations` internally, and thus the `listLocatedStatus` call fails as a whole with `FileNotFoundException`.
   
   As explained in the PR description, the behavior will be different when `spark.sql.files.ignoreMissingFiles` is set, although I think we currently don't give any guarantee when there are missing files during listing, so either behavior is acceptable? Anyway, I'm happy to remove this change if there is any concern.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
