xuanyuanking opened a new pull request #31638: URL: https://github.com/apache/spark/pull/31638
### What changes were proposed in this pull request?

- Add a flag `spark.sql.streaming.fileSink.formatCheck.enabled` that controls the format check for the file streaming sink. Users can turn it off to skip the check and read the directory as a batch output.
- When checking a glob path throws an exception, assume the user wants to read a batch output.

### Why are the changes needed?

- Some users read from a very long glob path, and `isDirectory` may fail when the path is too long. We should ignore the error when the path is a glob path, since the file streaming sink doesn’t support glob paths.
- Checking whether a directory was written by the file streaming sink may fail because of various storage issues. We should add a flag that lets users disable the checking logic and read the directory as a batch output.

### Does this PR introduce _any_ user-facing change?

Yes.

- A long glob path no longer throws an exception when checking the file sink format.
- A new flag `spark.sql.streaming.fileSink.formatCheck.enabled` controls the metadata checking logic.

### How was this patch tested?

New UT added.
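
### Illustrative sketch of the check

The decision logic described in this pull request can be sketched roughly as below. This is a simplified Python model, not the Spark implementation: the function names, the `is_directory` callback, the set of glob characters, and the `_spark_metadata` directory name are assumptions made for illustration. The sketch also assumes that errors on non-glob paths still propagate, which the description does not state explicitly.

```python
from typing import Callable

# Assumption: characters that mark a path as a glob pattern (Hadoop-style).
GLOB_CHARS = set("{}[]*?\\")


def is_glob_path(path: str) -> bool:
    """Return True if the path contains any glob metacharacter."""
    return any(ch in GLOB_CHARS for ch in path)


def has_sink_metadata(
    path: str,
    format_check_enabled: bool,
    is_directory: Callable[[str], bool],
) -> bool:
    """Decide whether `path` should be read via file streaming sink metadata.

    Returning False means "treat the path as a plain batch output".
    `is_directory` is a hypothetical stand-in for the storage call that may fail.
    """
    # Flag turned off: skip the check and read as batch output.
    if not format_check_enabled:
        return False
    try:
        # Assumption: the sink keeps its log in a `_spark_metadata` subdirectory.
        return is_directory(path.rstrip("/") + "/_spark_metadata")
    except Exception:
        # The sink does not support glob paths, so a failure on a glob path
        # (for example, a path that is too long) means batch output.
        if is_glob_path(path):
            return False
        # Assumption: failures on non-glob paths are still surfaced.
        raise
```

With this model, a glob path whose check fails falls back to a batch read, while a user who hits storage errors on a plain path can avoid the check entirely by passing `format_check_enabled=False`, which mirrors turning off `spark.sql.streaming.fileSink.formatCheck.enabled`.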
