srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore
URL: https://github.com/apache/spark/pull/23288#discussion_r241073780
 
 

 ##########
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ##########
 @@ -554,7 +554,8 @@ case class DataSource(
 
       // Sufficient to check head of the globPath seq for non-glob scenario
       // Don't need to check once again if files exist in streaming mode
-      if (checkFilesExist && !fs.exists(globPath.head)) {
+      if (checkFilesExist &&
 +          (!fs.exists(globPath.head) || InMemoryFileIndex.shouldFilterOut(globPath.head.getName))) {
 
 Review comment:
   I see — I didn't read carefully; this is the new desired behavior. I agree it would be better not to end up with an odd CSV parsing error. I wonder if we can clarify the message further by throwing a different exception for the new case: the path does exist; it's just ignored.
   
   ```
   if (checkFilesExist) {
     val firstPath = globPath.head
     if (!fs.exists(firstPath)) {
       // ... Path does not exist
     } else if (InMemoryFileIndex.shouldFilterOut(firstPath.getName)) {
       // ... Path exists but is ignored
     }
   }
   ```
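   For concreteness, a sketch of what those two branches might look like — the exception type and message wording here are assumptions on my part, modeled on the `AnalysisException` the existing code already throws for a missing path:

   ```scala
   if (checkFilesExist) {
     val firstPath = globPath.head
     if (!fs.exists(firstPath)) {
       // Unchanged behavior: the path really is absent
       throw new AnalysisException(s"Path does not exist: $firstPath")
     } else if (InMemoryFileIndex.shouldFilterOut(firstPath.getName)) {
       // New case: the path exists but the file index skips it
       // (e.g. names starting with "_" or "."), so say that explicitly
       throw new AnalysisException(
         s"Path exists but is ignored by the file index: $firstPath")
     }
   }
   ```

   That way a user who points the reader at `_foo.csv` gets told the file was filtered out, rather than a confusing parse-time failure.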

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
