[GitHub] [spark] HeartSaVioR commented on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files

GitBox Sat, 13 Jun 2020 18:43:06 -0700


HeartSaVioR commented on pull request #28422:
URL: https://github.com/apache/spark/pull/28422#issuecomment-643705743



   I can even tolerate the fact that maxFileAge is derived from the latest 
timestamp among the paths. If we don't trust a node's wall-clock time (though I 
suspect other logic would still work well in that case), then yes, it might 
serve as the source of truth across nodes.
   
   I feel all the confusion comes from the behavior of `latestFirst`. Yes, we 
would like to read from the latest in some cases, if we're only interested in 
the latest files. But then should we really leave open the possibility of 
tracing back to older files? Could we simply do what we do with Kafka's "latest" 
option, which only affects the first batch and is a no-op in further batches?
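
   To make the question concrete, here is a minimal sketch of the "latest 
only affects the first batch" idea applied to a file listing. It is a toy 
model in plain Python, not Spark's `FileStreamSource`; the class 
`LatestOnFirstBatchSource`, the `InputFile` type, and the `low_watermark_ms` 
field are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InputFile:
    path: str
    mtime_ms: int  # modification time in milliseconds


class LatestOnFirstBatchSource:
    """Hypothetical model of a file source where 'latest' only shapes batch 1."""

    def __init__(self, max_files_per_batch: int):
        self.max_files_per_batch = max_files_per_batch
        self.first_batch_done = False
        self.low_watermark_ms = -1  # files at or below this are never read
        self.seen = set()

    def next_batch(self, listing: List[InputFile]) -> List[str]:
        if not self.first_batch_done:
            # First batch: take the newest files, newest first.
            newest = sorted(listing, key=lambda f: f.mtime_ms, reverse=True)
            picked = newest[: self.max_files_per_batch]
            if picked:
                # Anything older than the oldest picked file is skipped for good.
                self.low_watermark_ms = min(f.mtime_ms for f in picked) - 1
            self.first_batch_done = True
        else:
            # Later batches: no "latest first" ordering and no tracing back.
            candidates = [
                f for f in listing
                if f.mtime_ms > self.low_watermark_ms and f.path not in self.seen
            ]
            candidates.sort(key=lambda f: f.mtime_ms)
            picked = candidates[: self.max_files_per_batch]
        self.seen.update(f.path for f in picked)
        return [f.path for f in picked]


source = LatestOnFirstBatchSource(max_files_per_batch=2)
listing = [InputFile("a", 100), InputFile("b", 200), InputFile("c", 300)]
first = source.next_batch(listing)    # ["c", "b"]; "a" is never revisited

listing += [InputFile("d", 400), InputFile("e", 250)]
second = source.next_batch(listing)   # ["e", "d"], oldest first
third = source.next_batch(listing)    # []
```

   In this model the older file "a" is dropped once the first batch is planned, 
so there is no way to trace back to it later, which is the behavior the 
question above is asking about.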


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



