HeartSaVioR commented on issue #23782: [SPARK-26875][SQL] Add an option on 
FileStreamSource to include modified files
URL: https://github.com/apache/spark/pull/23782#issuecomment-463456066
 
 
   First of all, I agree this is a valid use case.
   
   I'm just thinking out loud about an edge case (maybe this is why Spark 
restricts it): whenever a file's timestamp changes for any reason (content 
appended, some unintended modification, etc.), the entire content of the file 
is reprocessed (the UT in this patch relies on this). That breaks not only 
`end-to-end exactly-once` but also `stateful exactly-once`, because the state 
is not rolled back. So in such cases the option falls back to "at-least-once" 
semantics, whereas end users would expect at least stateful exactly-once. 
Users need to be warned about this.
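
To make the concern concrete, here is a minimal sketch in Python. It is a toy model only, not Spark's `FileStreamSource`: the function `run_batches`, the flag `include_modified`, and the snapshot layout are all hypothetical names made up for illustration. It assumes the source tracks files by path by default and by (path, modification time) when the option is on, and that the running state is never rolled back.

```python
from collections import Counter


def run_batches(snapshots, include_modified):
    """Toy model of a file source feeding a stateful count.

    snapshots: one directory listing per micro-batch; each listing maps
        path -> (mtime, list_of_lines).
    include_modified: hypothetical stand-in for the option discussed here.
    """
    seen = set()       # keys of files that were already processed
    state = Counter()  # running count per line; never rolled back
    for listing in snapshots:
        for path, (mtime, lines) in sorted(listing.items()):
            # With the option on, a new mtime makes the file look unseen.
            key = (path, mtime) if include_modified else path
            if key in seen:
                continue
            seen.add(key)
            # The whole file is read, including lines counted earlier.
            state.update(lines)
    return state


snapshots = [
    {"a.txt": (100, ["x", "y"])},
    {"a.txt": (200, ["x", "y", "z"])},  # "z" appended, mtime bumped
]

default_counts = run_batches(snapshots, include_modified=False)
modified_counts = run_batches(snapshots, include_modified=True)

print(dict(default_counts))   # {'x': 1, 'y': 1}
print(dict(modified_counts))  # {'x': 2, 'y': 2, 'z': 1}
```

In this sketch the appended line `z` is picked up only when the option is on, but `x` and `y` are then counted twice, which is the at-least-once behaviour described above.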

