[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore
KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore URL: https://github.com/apache/spark/pull/23288#discussion_r244524420 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ## @@ -559,6 +559,25 @@ case class DataSource( } globPath }.toSeq + +if (checkFilesExist) { Review comment: I have already removed that check at https://github.com/apache/spark/pull/23288/commits/239cfa4792b6bebd386e4a962cdf4b0e9b2471aa#diff-7a6cb188d2ae31eb3347b5629a679cecR563 Or are you refering to `checkFilesExist` at line 557 and suggesting removing argument `checkFilesExist` ? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore
KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore URL: https://github.com/apache/spark/pull/23288#discussion_r244504920 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ## @@ -559,6 +559,25 @@ case class DataSource( } globPath }.toSeq + +if (checkFilesExist) { + val (filtered, filteredOut) = allGlobPath.partition { path => +!InMemoryFileIndex.shouldFilterOut(path.getName) + } + if (filteredOut.nonEmpty) { +if (filtered.isEmpty) { + throw new AnalysisException( +"All path were ignored. The following path were ignored:\n" + + s"${filteredOut.mkString("\n ")}") +} else { + logDebug( +"The following path were ignored:\n" + Review comment: I fixed it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore
KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore URL: https://github.com/apache/spark/pull/23288#discussion_r244504905 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ## @@ -559,6 +559,25 @@ case class DataSource( } globPath }.toSeq + +if (checkFilesExist) { + val (filtered, filteredOut) = allGlobPath.partition { path => Review comment: I fixed it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore
KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore URL: https://github.com/apache/spark/pull/23288#discussion_r244504917 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ## @@ -559,6 +559,25 @@ case class DataSource( } globPath }.toSeq + +if (checkFilesExist) { + val (filtered, filteredOut) = allGlobPath.partition { path => +!InMemoryFileIndex.shouldFilterOut(path.getName) + } + if (filteredOut.nonEmpty) { +if (filtered.isEmpty) { + throw new AnalysisException( +"All path were ignored. The following path were ignored:\n" + Review comment: I fixed it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore
KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore URL: https://github.com/apache/spark/pull/23288#discussion_r244504889 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ## @@ -345,6 +346,47 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te assert(result.schema.fieldNames.size === 1) } + test("SPARK-26339 Debug statement if some of specified paths are filtered out") { +class TestAppender extends AppenderSkeleton { + var events = new java.util.ArrayList[LoggingEvent] + override def close(): Unit = {} + override def requiresLayout: Boolean = false + protected def append(event: LoggingEvent): Unit = events.add(event) +} + +val testAppender1 = new TestAppender +val rootLogger = LogManager.getRootLogger +val origLevel = rootLogger.getLevel +rootLogger.setLevel(Level.DEBUG) +rootLogger.addAppender(testAppender1) +try { + val cars = spark +.read +.format("csv") Review comment: I fixed it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore
KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore URL: https://github.com/apache/spark/pull/23288#discussion_r244504885 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ## @@ -345,6 +346,47 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te assert(result.schema.fieldNames.size === 1) } + test("SPARK-26339 Debug statement if some of specified paths are filtered out") { +class TestAppender extends AppenderSkeleton { Review comment: I fixed it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore
KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore URL: https://github.com/apache/spark/pull/23288#discussion_r244504806 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ## @@ -559,6 +559,25 @@ case class DataSource( } globPath }.toSeq + +if (checkFilesExist) { Review comment: No need, I fixed it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore
KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore URL: https://github.com/apache/spark/pull/23288#discussion_r244502755 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ## @@ -559,6 +559,25 @@ case class DataSource( } globPath }.toSeq + +if (checkFilesExist) { Review comment: I am not sure, but irregular indentations seems to be due to GitHub preview CSS. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore
KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore URL: https://github.com/apache/spark/pull/23288#discussion_r241627197 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ## @@ -554,7 +554,8 @@ case class DataSource( // Sufficient to check head of the globPath seq for non-glob scenario // Don't need to check once again if files exist in streaming mode - if (checkFilesExist && !fs.exists(globPath.head)) { + if (checkFilesExist && + (!fs.exists(globPath.head) || InMemoryFileIndex.shouldFilterOut(globPath.head.getName))) { Review comment: @srowen I pushed now. Could you please check my commit? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore
KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore URL: https://github.com/apache/spark/pull/23288#discussion_r24033 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ## @@ -554,7 +554,8 @@ case class DataSource( // Sufficient to check head of the globPath seq for non-glob scenario // Don't need to check once again if files exist in streaming mode - if (checkFilesExist && !fs.exists(globPath.head)) { + if (checkFilesExist && + (!fs.exists(globPath.head) || InMemoryFileIndex.shouldFilterOut(globPath.head.getName))) { Review comment: Thank you for understanding my proposal. Your suggestion looks better,I’ll push later. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore
KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore URL: https://github.com/apache/spark/pull/23288#discussion_r240868108 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ## @@ -554,7 +554,8 @@ case class DataSource( // Sufficient to check head of the globPath seq for non-glob scenario // Don't need to check once again if files exist in streaming mode - if (checkFilesExist && !fs.exists(globPath.head)) { + if (checkFilesExist && + (!fs.exists(globPath.head) || InMemoryFileIndex.shouldFilterOut(globPath.head.getName))) { Review comment: `InMemoryFileIndex.shouldFilterOut` returns true if argument starts with underscore, so throw a 'Path does not exist' exception. I've checked and exception below was thrown. ``` org.apache.spark.sql.AnalysisException: Path does not exist: file:_test.csv; at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$checkAndGlobPathIfNecessary$1(DataSource.scala:558) at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244) at scala.collection.immutable.List.foreach(List.scala:392) at scala.collection.TraversableLike.flatMap(TraversableLike.scala:244) at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241) at scala.collection.immutable.List.flatMap(List.scala:355) at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:545) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:231) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219) at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:625) at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:478) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org