HeartSaVioR commented on PR #23782: URL: https://github.com/apache/spark/pull/23782#issuecomment-2039266260
https://github.com/apache/spark/pull/23782#issuecomment-555210613 This comment explains everything. Also, I do not agree that spark.sql.files.ignoreCorruptFiles is a rescue, as I commented above. If you ever require Spark to provide at-least-once fault tolerance, the source must never change on replay. If the input file is somehow overwritten between a batch failure and the reprocessing of that batch, fault tolerance is broken. This is a hard problem, not a trivial one.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
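The failure mode described above can be sketched concretely. The following is a minimal plain-Python simulation, not Spark code: `run_batch`, `files`, and the file names are all hypothetical. It shows how replaying a batch against an input file that was overwritten after the failure silently drops the originally observed records, violating at-least-once delivery.

```python
# Hypothetical simulation of batch replay; none of these names are Spark APIs.

def run_batch(source_files, committed_output):
    """Process one batch: read every record from the planned input files."""
    batch = []
    for path in source_files:
        batch.extend(files[path])  # reads the *current* contents of each file
    committed_output.extend(batch)
    return batch

# A batch is planned against this input file.
files = {"part-0000": ["a", "b", "c"]}
planned = ["part-0000"]

output = []
first_attempt = run_batch(planned, output)  # suppose this attempt fails downstream

# The input file is overwritten between the failure and the replay.
files["part-0000"] = ["x", "y"]

output = []  # the failed attempt's output is discarded
replayed = run_batch(planned, output)

# Records "a", "b", "c" were read once but are gone after replay,
# so at-least-once fault tolerance is broken.
print(sorted(set(first_attempt) - set(replayed)))  # → ['a', 'b', 'c']
```

The point of the sketch: replay is only safe if the source returns exactly the same data for the same planned batch, which an overwritten file cannot guarantee.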
