HeartSaVioR commented on issue #23782: [SPARK-26875][SQL] Add an option on FileStreamSource to include modified files
URL: https://github.com/apache/spark/pull/23782#issuecomment-463456066

First of all, I agree this is a valid use case. I'm just thinking out loud about an edge case (maybe this is why Spark restricts it): whenever a file's timestamp changes for any reason (content being appended, some unintended modification, etc.), the entire content of the file is reprocessed (the UT in this patch relies on this). That breaks not only `end-to-end exactly-once` but also `stateful exactly-once`, because the state will not be rolled back. So in such cases the option falls back to "at-least-once" semantics, whereas end users would expect at least stateful exactly-once. This needs to be called out with a warning.
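
To make the concern concrete, here is a toy simulation in plain Python. It is only a sketch under assumptions: `ToyFileSource`, `include_modified`, and `run` are hypothetical names for illustration, not Spark's FileStreamSource or the option name in this patch, and the running word count stands in for any stateful aggregation whose state is never rolled back.

```python
from collections import Counter


class ToyFileSource:
    """Toy stand-in for a file-based streaming source (not Spark code)."""

    def __init__(self, include_modified):
        self.include_modified = include_modified
        self.seen = {}  # path -> modification timestamp last processed

    def next_batch(self, files):
        """files maps path -> (mtime, lines). Returns the lines to process."""
        batch = []
        for path, (mtime, lines) in sorted(files.items()):
            last = self.seen.get(path)
            is_new = last is None
            is_modified = (not is_new) and mtime > last
            if is_new or (self.include_modified and is_modified):
                # The whole file is re-read; the source cannot tell which
                # lines were already processed in an earlier batch.
                batch.extend(lines)
                self.seen[path] = mtime
        return batch


def run(include_modified):
    source = ToyFileSource(include_modified)
    state = Counter()  # running word count; earlier updates are never undone

    files = {"a.txt": (1, ["spark", "spark"])}
    state.update(source.next_batch(files))

    # One line is appended, which bumps the modification timestamp.
    files = {"a.txt": (2, ["spark", "spark", "flink"])}
    state.update(source.next_batch(files))
    return dict(state)


default_counts = run(include_modified=False)
option_counts = run(include_modified=True)
print(default_counts)  # {'spark': 2}
print(option_counts)   # {'spark': 4, 'flink': 1}
```

With the option off, the appended line is never seen. With it on, the appended line is picked up, but the two old lines are counted again, so the state ends up at-least-once rather than exactly-once.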
