zsxwing commented on issue #23782: [SPARK-26875][SS] Add an option on FileStreamSource to include modified files
URL: https://github.com/apache/spark/pull/23782#issuecomment-555210613

@mikedias Thanks for describing the scenario. Yep, I understand this could be helpful. However, I would like to understand more about the use case. How does a user upload `lastest_sales.csv` to the source folder?

- Writing directly. This is not recommended, as it can break the streaming query if the query sees a partial file.
- Renaming into the source folder. Since a rename operation is already involved, why not add a uuid to the file name so that overwriting a file is never necessary?

As I mentioned previously, overwriting a file makes everything complicated. The user has to work out when it is safe to overwrite a file. Adding this option may make the user think Spark can handle file overwriting correctly; however, we don't, and we cannot handle race conditions internally in Spark.
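
A rough sketch of the rename-with-uuid approach on the producer side, outside of Spark. The function name `publish_file`, the `staging_dir`/`source_dir` parameters, and the `sales` base name are hypothetical, and the sketch assumes a local POSIX filesystem with both directories on the same volume, so that the rename is atomic. Object stores and other filesystems may not give the same guarantee.

```python
import os
import uuid
from pathlib import Path


def publish_file(data: bytes, staging_dir: str, source_dir: str,
                 base_name: str = "sales") -> Path:
    """Write data outside the watched folder, then rename it in under a unique name.

    Hypothetical helper: the streaming source folder only ever sees complete
    files, and no existing file is overwritten.
    """
    staging = Path(staging_dir)
    source = Path(source_dir)
    staging.mkdir(parents=True, exist_ok=True)
    source.mkdir(parents=True, exist_ok=True)

    # A uuid in the name means every upload is a new file, never an overwrite.
    unique_name = f"{base_name}-{uuid.uuid4().hex}.csv"

    # Step 1: write the full content in the staging directory.
    tmp_path = staging / unique_name
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # Step 2: move the finished file into the source folder in one rename.
    # Assumption: same filesystem, so os.replace is an atomic rename on POSIX.
    final_path = source / unique_name
    os.replace(tmp_path, final_path)
    return final_path
```

With this pattern, two uploads of the "latest" sales data land as two distinct files in the source folder, so the question of when it is safe to overwrite never comes up.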
