zsxwing commented on issue #23782: [SPARK-26875][SS] Add an option on 
FileStreamSource to include modified files
URL: https://github.com/apache/spark/pull/23782#issuecomment-555210613
 
 
   @mikedias Thanks for describing the scenario. Yep, I understand this could be 
helpful. However, I would like to understand the use case better. How does a 
user upload `lastest_sales.csv` to the source folder?
   
   - Writing it directly. This is not recommended, as the streaming query can 
break if it picks up a partially written file.
   - Renaming it into the source folder. Since a rename operation is already 
involved, why not add a UUID to the file name so that overwriting a file is not 
necessary at all?
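   A minimal sketch of the "write elsewhere, then rename in with a UUID" pattern, 
in Python purely for illustration. The function name `publish_file`, the staging 
directory, and the `sales-<uuid>.csv` naming scheme are hypothetical. It assumes 
the staging and source folders are on the same local POSIX file system, where 
`os.replace` is atomic; other file systems (for example, object stores such as 
S3) may not offer an atomic rename, so the pattern does not carry over as-is.

```python
import os
import tempfile
import uuid


def publish_file(data: bytes, staging_dir: str, source_dir: str, prefix: str = "sales") -> str:
    """Hypothetical helper: publish `data` into `source_dir` without overwriting anything.

    Assumes staging_dir and source_dir share one local POSIX file system, so the
    final os.replace is atomic and a reader never observes a partial file.
    """
    # 1. Write the complete contents to a temp file outside the watched folder.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # flush bytes to disk before publishing
        # 2. Use a fresh UUID-based name so no existing file is ever replaced.
        final_path = os.path.join(source_dir, f"{prefix}-{uuid.uuid4()}.csv")
        # 3. Atomically move the finished file into the source folder.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

   Each upload becomes a new file, so the streaming query only ever sees complete, 
never-modified files and no overwrite handling is needed on the Spark side.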
   
   As I mentioned previously, overwriting a file makes everything complicated. 
The user has to figure out when it is safe to overwrite a file. Adding this 
option may make users think Spark handles file overwriting correctly; however, 
we don't, and cannot, handle these race conditions internally in Spark.
