[GitHub] [spark] zsxwing commented on pull request #31638: [SPARK-34526][SS] Skip checking glob path in FileStreamSink.hasMetadata

GitBox Fri, 26 Mar 2021 10:59:53 -0700


zsxwing commented on pull request #31638:
URL: https://github.com/apache/spark/pull/31638#issuecomment-808413955



   > When we provide `/output/*` as a glob path on path, what would we expect?
   
   Currently Spark's behavior is `Don't leverage metadata in /output/b and read 
both directories via listing.` Ideally we should respect the metadata in each 
directory, but I cannot find a simple way to do that.
   
   Ideally, we should provide different APIs for normal paths and glob paths. 
But since we have mixed two different concepts into one parameter, we cannot 
take it back. I'd suggest setting the goal to fixing the regression from 3.0: 
`when a glob path is valid but we cannot call getFileStatus with it, how to 
allow users to access batch output.` and avoiding changes to any other existing 
behavior.
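
   To make the narrowed goal concrete, here is a minimal sketch of the decision, written in Python rather than Spark's Scala and not taken from the PR. It assumes the glob check looks for the characters `{}[]*?\` and that the sink metadata lives in a `_spark_metadata` subdirectory; `has_metadata` and the `dir_exists` callback are hypothetical stand-ins for `FileStreamSink.hasMetadata` and the file system probe.

```python
# Characters assumed to mark a path as a glob pattern.
GLOB_CHARS = frozenset("{}[]*?\\")
# Assumed name of the streaming sink's metadata directory.
METADATA_DIR = "_spark_metadata"


def is_glob_path(path):
    """Return True if the path contains any glob metacharacter."""
    return any(ch in GLOB_CHARS for ch in path)


def has_metadata(paths, dir_exists):
    """Decide whether to read via sink metadata (simplified sketch).

    `dir_exists` is a hypothetical callback that probes the file system;
    it is never called for a glob path, so a pattern that the file system
    cannot stat does not cause a failure here.
    """
    if len(paths) != 1:
        # Multiple paths: fall back to plain listing.
        return False
    path = paths[0]
    if is_glob_path(path):
        # Glob path: skip the metadata check and read via listing.
        return False
    return bool(dir_exists(path.rstrip("/") + "/" + METADATA_DIR))
```

   With this shape, `/output/*` is always read via listing (the existing behavior), while a plain `/output/a` still goes through the metadata check.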


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


