KeiichiHirobe commented on a change in pull request #23288:
[SPARK-26339][SQL]Throws better exception when reading files that start with underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r240868108
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##########
 @@ -554,7 +554,8 @@ case class DataSource(
 
       // Sufficient to check head of the globPath seq for non-glob scenario
       // Don't need to check once again if files exist in streaming mode
-      if (checkFilesExist && !fs.exists(globPath.head)) {
+      if (checkFilesExist &&
+          (!fs.exists(globPath.head) || InMemoryFileIndex.shouldFilterOut(globPath.head.getName))) {
 
 Review comment:
   `InMemoryFileIndex.shouldFilterOut` returns true if its argument starts with an underscore, so the check now throws a 'Path does not exist' exception for such paths. I've verified this locally; the exception below was thrown.
   
   ```
   org.apache.spark.sql.AnalysisException: Path does not exist: file:_test.csv;
     at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$checkAndGlobPathIfNecessary$1(DataSource.scala:558)
     at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
     at scala.collection.immutable.List.foreach(List.scala:392)
     at scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
     at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
     at scala.collection.immutable.List.flatMap(List.scala:355)
     at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:545)
     at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
     at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:231)
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219)
     at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:625)
     at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:478)
   ```
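   To illustrate why a file such as `_test.csv` now fails the existence check even though it is present on disk, here is a minimal, self-contained sketch of the patched condition. It is not Spark's code: `isHiddenName` is a hypothetical, simplified stand-in for `InMemoryFileIndex.shouldFilterOut` that only models the leading `_` / `.` rule, a `Set[String]` stands in for the Hadoop `FileSystem`, and `IllegalArgumentException` stands in for Spark's `AnalysisException`.

   ```scala
   // Hypothetical sketch of the patched check; not Spark's implementation.
   object UnderscorePathCheckSketch {

     // Simplified stand-in for InMemoryFileIndex.shouldFilterOut (assumption):
     // names starting with "_" or "." are treated as hidden. The real method
     // has additional cases that are not modeled here.
     def isHiddenName(name: String): Boolean =
       name.startsWith("_") || name.startsWith(".")

     // Last path component, similar to what Hadoop's Path.getName returns.
     def baseName(path: String): String =
       path.substring(path.lastIndexOf('/') + 1)

     // Same shape as the patched condition: fail if the path is missing OR
     // if its file name would be silently skipped when files are listed.
     def checkPath(path: String, existing: Set[String], checkFilesExist: Boolean): Unit = {
       if (checkFilesExist && (!existing.contains(path) || isHiddenName(baseName(path)))) {
         throw new IllegalArgumentException(s"Path does not exist: $path")
       }
     }

     def main(args: Array[String]): Unit = {
       val fs = Set("/data/test.csv", "/data/_test.csv")
       checkPath("/data/test.csv", fs, checkFilesExist = true) // passes silently
       try {
         checkPath("/data/_test.csv", fs, checkFilesExist = true)
       } catch {
         case e: IllegalArgumentException => println(e.getMessage)
       }
     }
   }
   ```

   Running `main` prints `Path does not exist: /data/_test.csv` for the underscore file, while `/data/test.csv` passes. Reusing the same predicate as file listing keeps the early error consistent with what would otherwise be an empty read.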
   
