HeartSaVioR commented on pull request #28422: URL: https://github.com/apache/spark/pull/28422#issuecomment-643705743
I can even tolerate the fact that `maxFileAge` is derived from the latest timestamp among the paths. If we don't trust the node's wall-clock time (though I suspect other logic works fine in such a case), then yes, it might be the source of truth across nodes. I feel all the confusion comes from the behavior of `latestFirst`. Yes, in some cases we would like to read from the latest files, if we're only interested in the latest ones. But then should we really leave open the possibility of going back to older files? Couldn't we simply do what we do with Kafka's "latest" option, which only affects the first batch and is a no-op in later batches?
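To make the question concrete, here is a rough sketch in plain Python of one possible reading of the idea. It is not Spark code: `age_cutoff`, `plan_batch`, and all parameter names are hypothetical and exist only for illustration. It assumes the age cutoff is taken relative to the newest file timestamp, and that `latestFirst` only decides where the first batch starts, after which files are read oldest first and never traced back.

```python
from typing import Dict, List, Set


def age_cutoff(mtimes: Dict[str, int], max_file_age_ms: int) -> int:
    # Assumption: the cutoff is relative to the newest file timestamp,
    # not to the node's wall-clock time.
    return max(mtimes.values()) - max_file_age_ms


def plan_batch(mtimes: Dict[str, int], seen: Set[str], max_file_age_ms: int,
               max_files: int, latest_first: bool, first_batch: bool) -> List[str]:
    """Hypothetical planner: pick the files for one micro-batch."""
    if not mtimes:
        return []
    cutoff = age_cutoff(mtimes, max_file_age_ms)
    fresh = [p for p in mtimes if p not in seen and mtimes[p] >= cutoff]
    if latest_first and first_batch:
        # First batch only: start from the newest files.
        fresh.sort(key=lambda p: mtimes[p], reverse=True)
        batch = fresh[:max_files]
        # Treat everything listed so far as consumed, so older files are
        # never traced back (analogous to starting from "latest").
        seen.update(mtimes)
    else:
        # Later batches: the option is a no-op, read oldest first.
        fresh.sort(key=lambda p: mtimes[p])
        batch = fresh[:max_files]
        seen.update(batch)
    return batch


files = {"a": 100, "b": 200, "c": 300, "d": 400}
seen: Set[str] = set()
first = plan_batch(files, seen, 250, 2, latest_first=True, first_batch=True)
files.update({"e": 500, "f": 600})
second = plan_batch(files, seen, 250, 2, latest_first=True, first_batch=False)
print(first, second)
```

In this toy run the first batch takes the two newest files, and the second batch picks up only files that arrived afterwards; file `b`, which was skipped in the first batch, is never revisited. Whether skipped files should be dropped like this is exactly the open design question.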
