[GitHub] srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-30 Thread GitBox
srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws 
better exception when reading files that start with underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r244533360
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -559,6 +559,25 @@ case class DataSource(
   }
   globPath
 }.toSeq
+
+if (checkFilesExist) {
 
 Review comment:
   Yes I mean line 557. I guess we can keep that because, overall, we are 
trying to throw `AnalysisException` in more cases, not fewer. Before, if one of 
several glob paths matched no files at all (underscore or not) it would throw. 
OK, that behavior we can keep, I guess, or at least that's a separate question.
   
   Disregard this; I think it is OK.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-29 Thread GitBox
srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws 
better exception when reading files that start with underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r244523845
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -559,6 +559,25 @@ case class DataSource(
   }
   globPath
 }.toSeq
+
+if (checkFilesExist) {
 
 Review comment:
   What about removing that check entirely?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-29 Thread GitBox
srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws 
better exception when reading files that start with underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r244489953
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -559,6 +559,25 @@ case class DataSource(
   }
   globPath
 }.toSeq
+
+if (checkFilesExist) {
+  val (filtered, filteredOut) = allGlobPath.partition { path =>
+!InMemoryFileIndex.shouldFilterOut(path.getName)
+  }
+  if (filteredOut.nonEmpty) {
+if (filtered.isEmpty) {
+  throw new AnalysisException(
+"All path were ignored. The following path were ignored:\n" +
+  s"${filteredOut.mkString("\n  ")}")
+} else {
+  logDebug(
+"The following path were ignored:\n" +
 
 Review comment:
   Nit: for performance, make this one interpolated string. If the line is too 
long make the variable `filteredOut` something shorter like `out`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-29 Thread GitBox
srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws 
better exception when reading files that start with underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r244489844
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -559,6 +559,25 @@ case class DataSource(
   }
   globPath
 }.toSeq
+
+if (checkFilesExist) {
+  val (filtered, filteredOut) = allGlobPath.partition { path =>
 
 Review comment:
   Nit: I'd call `filtered` as `filteredIn` to avoid ambiguity. It might also 
be very slightly cleaner to avoid the `!` in the expression and flip these two 
values.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-29 Thread GitBox
srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws 
better exception when reading files that start with underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r244489778
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ##
 @@ -345,6 +346,47 @@ class CSVSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils with Te
 assert(result.schema.fieldNames.size === 1)
   }
 
+  test("SPARK-26339 Debug statement if some of specified paths are filtered 
out") {
+class TestAppender extends AppenderSkeleton {
+  var events = new java.util.ArrayList[LoggingEvent]
+  override def close(): Unit = {}
+  override def requiresLayout: Boolean = false
+  protected def append(event: LoggingEvent): Unit = events.add(event)
+}
+
+val testAppender1 = new TestAppender
+val rootLogger = LogManager.getRootLogger
+val origLevel = rootLogger.getLevel
+rootLogger.setLevel(Level.DEBUG)
+rootLogger.addAppender(testAppender1)
+try {
+  val cars = spark
+.read
+.format("csv")
 
 Review comment:
   Nit: you can use `.csv` instead of `format` and `load`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-29 Thread GitBox
srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws 
better exception when reading files that start with underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r244489914
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -559,6 +559,25 @@ case class DataSource(
   }
   globPath
 }.toSeq
+
+if (checkFilesExist) {
+  val (filtered, filteredOut) = allGlobPath.partition { path =>
+!InMemoryFileIndex.shouldFilterOut(path.getName)
+  }
+  if (filteredOut.nonEmpty) {
+if (filtered.isEmpty) {
+  throw new AnalysisException(
+"All path were ignored. The following path were ignored:\n" +
 
 Review comment:
   path -> paths. Also, it seems clearer to say: "All paths were ignored:\n" 
and below, "Some paths were ignored:\n"


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-29 Thread GitBox
srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws 
better exception when reading files that start with underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r244489734
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ##
 @@ -345,6 +346,47 @@ class CSVSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils with Te
 assert(result.schema.fieldNames.size === 1)
   }
 
+  test("SPARK-26339 Debug statement if some of specified paths are filtered 
out") {
+class TestAppender extends AppenderSkeleton {
 
 Review comment:
   I wouldn't bother with this complexity to test if the debug log was printed; 
it's not important compared to the additional binding to log4j.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-29 Thread GitBox
srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws 
better exception when reading files that start with underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r244490086
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -559,6 +559,25 @@ case class DataSource(
   }
   globPath
 }.toSeq
+
+if (checkFilesExist) {
 
 Review comment:
   Do you need to remove the check and exception a few lines above then? It 
would fail if any path didn't have some files. (Also feel free to fix the 
indentation from line 549-558 above)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-12 Thread GitBox
srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws 
better exception when reading files that start with underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r241073780
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -554,7 +554,8 @@ case class DataSource(
 
   // Sufficient to check head of the globPath seq for non-glob scenario
   // Don't need to check once again if files exist in streaming mode
-  if (checkFilesExist && !fs.exists(globPath.head)) {
+  if (checkFilesExist &&
+  (!fs.exists(globPath.head) || 
InMemoryFileIndex.shouldFilterOut(globPath.head.getName))) {
 
 Review comment:
   I see, I didn't read carefully. This is the new desired behavior. I agree it 
would be better to not end up with an odd CSV parsing message. I wonder if we 
can clarify the message further with a different exception for the new case. 
The path does exist; it's just ignored.
   
   ```
   if (checkFilesExist) {
 val firstPath = globPath.head
 if  (!fs.exists(firstPath)) {
   // ... Path does not exist
 } else if (shouldFilterOut...) {
   // ... Path exists but is ignored
 }
   }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore

2018-12-11 Thread GitBox
srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws 
better exception when reading files that start with underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r240781003
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##
 @@ -554,7 +554,8 @@ case class DataSource(
 
   // Sufficient to check head of the globPath seq for non-glob scenario
   // Don't need to check once again if files exist in streaming mode
-  if (checkFilesExist && !fs.exists(globPath.head)) {
+  if (checkFilesExist &&
+  (!fs.exists(globPath.head) || 
InMemoryFileIndex.shouldFilterOut(globPath.head.getName))) {
 
 Review comment:
   I'm probably misunderstanding, but doesn't this still cause it to throw a 
'Path does not exist' exception?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org