mikedias opened a new pull request #23782: [SPARK-26875][SQL] Add an option on FileStreamSource to include modified files
URL: https://github.com/apache/spark/pull/23782

## What changes were proposed in this pull request?

Currently, only the filename is checked to determine whether a file should be processed. I propose adding an option to also check whether the file's timestamp is greater than the last time it was processed, as an indication that the file has been modified and has different content. This is useful when the source producer eventually overwrites files with new content.

## How was this patch tested?

Added unit tests.
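
To illustrate the proposed check, here is a minimal sketch in Python. It is not the code from this patch: the class `SeenFiles`, the flag `include_modified`, and the method names are hypothetical, and it assumes the "last time it was processed" is tracked as the file's modification timestamp recorded when it was last processed.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SeenFiles:
    """Hypothetical tracker of files already processed by a streaming source."""

    # Hypothetical flag standing in for the proposed option.
    include_modified: bool = False
    # Maps file path -> modification timestamp (ms) recorded at processing time.
    last_seen: Dict[str, int] = field(default_factory=dict)

    def should_process(self, path: str, mtime_ms: int) -> bool:
        recorded = self.last_seen.get(path)
        if recorded is None:
            # Filename never seen before: always process.
            return True
        # Filename already seen: process again only if the option is on
        # and the file's timestamp is newer than the recorded one.
        return self.include_modified and mtime_ms > recorded

    def mark_processed(self, path: str, mtime_ms: int) -> None:
        # Keep the newest timestamp observed for this path.
        recorded = self.last_seen.get(path)
        if recorded is None or mtime_ms > recorded:
            self.last_seen[path] = mtime_ms
```

With the flag off, a file whose name was already seen is skipped even if it was rewritten; with the flag on, the same file is picked up again once its timestamp moves forward.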
