xuanyuanking opened a new pull request #31638:
URL: https://github.com/apache/spark/pull/31638


   ### What changes were proposed in this pull request?
   - Add a flag `spark.sql.streaming.fileSink.formatCheck.enabled` that controls 
the format check for the file streaming sink. Users can turn it off when they 
want to read the directory as a batch output.
   - When checking a glob path throws an exception, assume the user wants to 
read a batch output instead of failing the read.
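
The two rules above can be modeled in a few lines. The following is a minimal, self-contained sketch of the described decision logic, not the code in this PR: the function names, the callback parameters `is_directory` and `has_metadata_dir`, the set of glob characters, and the choice to re-raise errors for non-glob paths are all assumptions made for illustration.

```python
from typing import Callable

# Assumption: characters treated as glob syntax (Hadoop-style patterns).
GLOB_CHARS = set("{}[]*?\\")


def is_glob_path(path: str) -> bool:
    """Return True if the path contains any glob metacharacter."""
    return any(ch in GLOB_CHARS for ch in path)


def looks_like_streaming_sink_output(
    path: str,
    format_check_enabled: bool,
    is_directory: Callable[[str], bool],      # hypothetical storage probe
    has_metadata_dir: Callable[[str], bool],  # hypothetical metadata probe
) -> bool:
    """Decide whether to read `path` as file streaming sink output.

    Returning False means "treat the path as plain batch output".
    """
    # Flag turned off: skip the format check entirely.
    if not format_check_enabled:
        return False
    try:
        return is_directory(path) and has_metadata_dir(path)
    except Exception:
        # The sink does not support glob paths, so a failure while
        # probing a glob path is taken to mean batch output.
        if is_glob_path(path):
            return False
        # Assumption: errors on non-glob paths still propagate.
        raise
```

With the flag off, neither probe is called, so storage errors cannot affect the read; with the flag on, only glob paths swallow probe failures.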
   
   ### Why are the changes needed?
   - Some users read from a very long glob path, and `isDirectory` may fail 
when the path is too long. We should ignore the error when the path is a 
glob path, since the file streaming sink doesn’t support glob paths.
   - Checking whether a directory was written by the file streaming sink may 
fail because of various storage issues. We should add a flag that lets 
users disable the check and read the directory as a batch output.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes.
   
   - A long glob path no longer throws an exception when the file sink 
format is checked.
   - Add a new flag `spark.sql.streaming.fileSink.formatCheck.enabled` to 
control the metadata checking logic.
   
   
   ### How was this patch tested?
   Added new unit tests.




