HeartSaVioR commented on issue #23782: [SPARK-26875][SQL] Add an option on 
FileStreamSource to include modified files
URL: https://github.com/apache/spark/pull/23782#issuecomment-463475727
 
 
   The patch looks simple and clear: it seems to be just a matter of policy - 
whether to allow or disallow possibly unsafe behavior via an option.
   
   Personally I'm a bit worried that there might be cases where the last 
modified timestamp of a file is changed unintentionally, and then things get 
messed up. Even if end users enable this option intentionally, they might 
complain when they encounter files being reprocessed, or semantics being 
broken, for an unintended reason. I thought briefly about ways to mitigate 
this, mostly based on file offset, but a newly overwritten file could have the 
same file length, and we would also need to store the file offset, so it 
doesn't seem to be a good option as of now.
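
   To make the concern concrete, here is a minimal sketch in plain Python. It 
is not Spark's FileStreamSource code: the `SeenFiles` class, its 
`include_modified` flag, and the `length_changed` helper are hypothetical names 
used only for illustration. It assumes the source remembers a (timestamp, 
length) pair per path and, with the option on, reprocesses a file whenever its 
timestamp moves forward.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class SeenFiles:
    """Hypothetical tracker; NOT Spark's FileStreamSource implementation."""

    # Assumed stand-in for the proposed option: reprocess modified files.
    include_modified: bool = False
    # path -> (last modified timestamp, file length) recorded when processed
    entries: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def is_new(self, path: str, mtime: int, length: int) -> bool:
        """Return True if the file should be (re)processed, and record it."""
        previous = self.entries.get(path)
        if previous is None:
            # First time this path is seen: always process it.
            self.entries[path] = (mtime, length)
            return True
        if self.include_modified and mtime > previous[0]:
            # Timestamp moved forward: reprocess, even if content is unchanged.
            self.entries[path] = (mtime, length)
            return True
        return False


def length_changed(old_length: int, new_length: int) -> bool:
    """Length-only check; it misses overwrites that keep the same size."""
    return old_length != new_length
```

   With `include_modified=True`, a file whose timestamp changes from 100 to 200 
is reprocessed even when its content is identical, which is the unintended 
reprocessing described above. A length comparison does not help either, since 
`length_changed(10, 10)` is false for an overwrite that keeps the same size.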
   
   Maybe I'm worrying too much, so I'd like to hear other voices as well. (I'm 
just one of the contributors anyway, and the decision will be made by 
committers, so it's just my 2 cents.)
