mikedias opened a new pull request #23782: [SPARK-26875][SQL] Add an option on 
FileStreamSource to include modified files
URL: https://github.com/apache/spark/pull/23782
 
 
   ## What changes were proposed in this pull request?
   
   Currently, only the filename is checked to determine whether a file 
should be processed. I propose adding an option that also tests whether the 
file's timestamp is greater than the last time the file was processed, as an 
indication that it has been modified and has different content. 
   
   This is useful when the source producer eventually overwrites files with 
new content.
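
The check described above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in this patch: the `SeenFiles` class, the `include_modified` flag, and the choice to compare against the modification time recorded when the file was last processed are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class SeenFiles:
    """Hypothetical tracker of processed files (not Spark's actual class)."""

    # Hypothetical flag standing in for the proposed option.
    include_modified: bool = False
    # Maps file path -> modification time (ms) recorded when it was processed.
    _last_seen: Dict[str, int] = field(default_factory=dict)

    def is_new(self, path: str, mtime_ms: int) -> bool:
        last = self._last_seen.get(path)
        if last is None:
            # Never seen this filename: always process it.
            return True
        # Seen before: reprocess only if the option is on and the file
        # carries a newer timestamp than the one recorded last time.
        return self.include_modified and mtime_ms > last

    def select(self, listing: Iterable[Tuple[str, int]]) -> List[str]:
        """Return the paths to process from a (path, mtime_ms) listing."""
        batch = []
        for path, mtime_ms in listing:
            if self.is_new(path, mtime_ms):
                batch.append(path)
                self._last_seen[path] = mtime_ms
        return batch
```

With the flag off, a file whose name has already been seen is skipped even if its timestamp changed, which matches the filename-only behavior described above. With the flag on, the same file is selected again once its timestamp advances.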
   
   ## How was this patch tested?
   
   Added unit tests.
   

