KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r240868108
 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##########
 @@ -554,7 +554,8 @@ case class DataSource(
        // Sufficient to check head of the globPath seq for non-glob scenario
        // Don't need to check once again if files exist in streaming mode
 -      if (checkFilesExist && !fs.exists(globPath.head)) {
 +      if (checkFilesExist &&
 +        (!fs.exists(globPath.head) || InMemoryFileIndex.shouldFilterOut(globPath.head.getName))) {

 Review comment:
 `InMemoryFileIndex.shouldFilterOut` returns true if its argument starts with an underscore, so a 'Path does not exist' exception is thrown for such a path.
 I've checked, and the exception below was thrown.

 ```
 org.apache.spark.sql.AnalysisException: Path does not exist: file:_test.csv;
   at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$checkAndGlobPathIfNecessary$1(DataSource.scala:558)
   at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
   at scala.collection.immutable.List.foreach(List.scala:392)
   at scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
   at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
   at scala.collection.immutable.List.flatMap(List.scala:355)
   at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:545)
   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
   at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:231)
   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219)
   at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:625)
   at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:478)
 ```
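 For readers who want to try the shape of the proposed condition outside Spark, here is a minimal sketch. It is not Spark's code. `UnderscorePathCheck`, `startsWithUnderscore`, and `shouldReject` are hypothetical names. `startsWithUnderscore` stands in for `InMemoryFileIndex.shouldFilterOut` and models only the leading-underscore rule mentioned in the review comment. `java.nio.file.Files.exists` stands in for `fs.exists`.

 ```scala
 import java.nio.file.{Files, Path}

 object UnderscorePathCheck {
   // Hypothetical, simplified stand-in for InMemoryFileIndex.shouldFilterOut.
   // It models only the rule quoted in the review: a leading underscore.
   def startsWithUnderscore(fileName: String): Boolean =
     fileName.startsWith("_")

   // Same shape as the condition in the diff: when existence checks are
   // enabled, reject a path that is missing OR whose last component would
   // be filtered out later.
   def shouldReject(path: Path, checkFilesExist: Boolean): Boolean = {
     // getFileName returns null for a root path, so wrap it in Option.
     val name = Option(path.getFileName).map(_.toString).getOrElse("")
     checkFilesExist && (!Files.exists(path) || startsWithUnderscore(name))
   }

   def main(args: Array[String]): Unit = {
     val dir = Files.createTempDirectory("underscore-check")
     val hidden = Files.createFile(dir.resolve("_test.csv"))
     val visible = Files.createFile(dir.resolve("test.csv"))

     println(shouldReject(hidden, checkFilesExist = true))   // exists, but underscore-prefixed
     println(shouldReject(visible, checkFilesExist = true))  // exists, ordinary name
     println(shouldReject(dir.resolve("missing.csv"), checkFilesExist = true)) // does not exist
   }
 }
 ```

 In this sketch an existing `_test.csv` is rejected in the same way as a missing file, which matches the 'Path does not exist' message reported above.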