[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-29 Thread GitBox
KeiichiHirobe commented on a change in pull request #23288: 
[SPARK-26339][SQL]Throws better exception when reading files that start with 
underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r244524420
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -559,6 +559,25 @@ case class DataSource(
   }
   globPath
 }.toSeq
+
+if (checkFilesExist) {
 
 Review comment:
   I have already removed that check at 

https://github.com/apache/spark/pull/23288/commits/239cfa4792b6bebd386e4a962cdf4b0e9b2471aa#diff-7a6cb188d2ae31eb3347b5629a679cecR563
   
   Or are you refering to `checkFilesExist` at  line 557 and suggesting 
removing argument `checkFilesExist` ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-29 Thread GitBox
KeiichiHirobe commented on a change in pull request #23288: 
[SPARK-26339][SQL]Throws better exception when reading files that start with 
underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r244504920
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -559,6 +559,25 @@ case class DataSource(
   }
   globPath
 }.toSeq
+
+if (checkFilesExist) {
+  val (filtered, filteredOut) = allGlobPath.partition { path =>
+!InMemoryFileIndex.shouldFilterOut(path.getName)
+  }
+  if (filteredOut.nonEmpty) {
+if (filtered.isEmpty) {
+  throw new AnalysisException(
+"All path were ignored. The following path were ignored:\n" +
+  s"${filteredOut.mkString("\n  ")}")
+} else {
+  logDebug(
+"The following path were ignored:\n" +
 
 Review comment:
   I fixed it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-29 Thread GitBox
KeiichiHirobe commented on a change in pull request #23288: 
[SPARK-26339][SQL]Throws better exception when reading files that start with 
underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r244504905
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -559,6 +559,25 @@ case class DataSource(
   }
   globPath
 }.toSeq
+
+if (checkFilesExist) {
+  val (filtered, filteredOut) = allGlobPath.partition { path =>
 
 Review comment:
   I fixed it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-29 Thread GitBox
KeiichiHirobe commented on a change in pull request #23288: 
[SPARK-26339][SQL]Throws better exception when reading files that start with 
underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r244504917
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -559,6 +559,25 @@ case class DataSource(
   }
   globPath
 }.toSeq
+
+if (checkFilesExist) {
+  val (filtered, filteredOut) = allGlobPath.partition { path =>
+!InMemoryFileIndex.shouldFilterOut(path.getName)
+  }
+  if (filteredOut.nonEmpty) {
+if (filtered.isEmpty) {
+  throw new AnalysisException(
+"All path were ignored. The following path were ignored:\n" +
 
 Review comment:
   I fixed it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-29 Thread GitBox
KeiichiHirobe commented on a change in pull request #23288: 
[SPARK-26339][SQL]Throws better exception when reading files that start with 
underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r244504889
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ##
 @@ -345,6 +346,47 @@ class CSVSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils with Te
 assert(result.schema.fieldNames.size === 1)
   }
 
+  test("SPARK-26339 Debug statement if some of specified paths are filtered 
out") {
+class TestAppender extends AppenderSkeleton {
+  var events = new java.util.ArrayList[LoggingEvent]
+  override def close(): Unit = {}
+  override def requiresLayout: Boolean = false
+  protected def append(event: LoggingEvent): Unit = events.add(event)
+}
+
+val testAppender1 = new TestAppender
+val rootLogger = LogManager.getRootLogger
+val origLevel = rootLogger.getLevel
+rootLogger.setLevel(Level.DEBUG)
+rootLogger.addAppender(testAppender1)
+try {
+  val cars = spark
+.read
+.format("csv")
 
 Review comment:
   I fixed it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-29 Thread GitBox
KeiichiHirobe commented on a change in pull request #23288: 
[SPARK-26339][SQL]Throws better exception when reading files that start with 
underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r244504885
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ##
 @@ -345,6 +346,47 @@ class CSVSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils with Te
 assert(result.schema.fieldNames.size === 1)
   }
 
+  test("SPARK-26339 Debug statement if some of specified paths are filtered 
out") {
+class TestAppender extends AppenderSkeleton {
 
 Review comment:
   I fixed it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-29 Thread GitBox
KeiichiHirobe commented on a change in pull request #23288: 
[SPARK-26339][SQL]Throws better exception when reading files that start with 
underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r244504806
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -559,6 +559,25 @@ case class DataSource(
   }
   globPath
 }.toSeq
+
+if (checkFilesExist) {
 
 Review comment:
   No need, I fixed it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-29 Thread GitBox
KeiichiHirobe commented on a change in pull request #23288: 
[SPARK-26339][SQL]Throws better exception when reading files that start with 
underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r244502755
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -559,6 +559,25 @@ case class DataSource(
   }
   globPath
 }.toSeq
+
+if (checkFilesExist) {
 
 Review comment:
   I am not sure, but irregular indentations seems to be due to GitHub preview 
CSS. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-13 Thread GitBox
KeiichiHirobe commented on a change in pull request #23288: 
[SPARK-26339][SQL]Throws better exception when reading files that start with 
underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r241627197
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -554,7 +554,8 @@ case class DataSource(
 
   // Sufficient to check head of the globPath seq for non-glob scenario
   // Don't need to check once again if files exist in streaming mode
-  if (checkFilesExist && !fs.exists(globPath.head)) {
+  if (checkFilesExist &&
+  (!fs.exists(globPath.head) || 
InMemoryFileIndex.shouldFilterOut(globPath.head.getName))) {
 
 Review comment:
   @srowen I pushed now.  Could you please check my commit?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-12 Thread GitBox
KeiichiHirobe commented on a change in pull request #23288: 
[SPARK-26339][SQL]Throws better exception when reading files that start with 
underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r24033
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -554,7 +554,8 @@ case class DataSource(
 
   // Sufficient to check head of the globPath seq for non-glob scenario
   // Don't need to check once again if files exist in streaming mode
-  if (checkFilesExist && !fs.exists(globPath.head)) {
+  if (checkFilesExist &&
+  (!fs.exists(globPath.head) || 
InMemoryFileIndex.shouldFilterOut(globPath.head.getName))) {
 
 Review comment:
   Thank you for understanding my proposal.
   Your suggestion looks better,I’ll push later.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-11 Thread GitBox
KeiichiHirobe commented on a change in pull request #23288: 
[SPARK-26339][SQL]Throws better exception when reading files that start with 
underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r240868108
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -554,7 +554,8 @@ case class DataSource(
 
   // Sufficient to check head of the globPath seq for non-glob scenario
   // Don't need to check once again if files exist in streaming mode
-  if (checkFilesExist && !fs.exists(globPath.head)) {
+  if (checkFilesExist &&
+  (!fs.exists(globPath.head) || 
InMemoryFileIndex.shouldFilterOut(globPath.head.getName))) {
 
 Review comment:
   `InMemoryFileIndex.shouldFilterOut` returns true if argument starts with 
underscore, so throw a 'Path does not exist' exception. I've checked and 
exception below was thrown.
   
   ```
   org.apache.spark.sql.AnalysisException: Path does not exist: file:_test.csv;
 at 
org.apache.spark.sql.execution.datasources.DataSource.$anonfun$checkAndGlobPathIfNecessary$1(DataSource.scala:558)
 at 
scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
 at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
 at scala.collection.immutable.List.flatMap(List.scala:355)
 at 
org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:545)
 at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
 at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:231)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219)
 at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:625)
 at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:478)
   ```
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org