HeartSaVioR commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-653966315
I still feel the date option can be ranged via two symmetric options; while this won't help on SS case, this can help filtering out files in the range of date which are out of interest - while this patch only filters on the left side, we can do that for the right side as well. There was similar review comment from other contributor. For `filesModifiedBefore` vs `fromModifiedDate`, I think former sounds more natural and even `modifiedBefore` sounds enough (as we know it deals with files), but let's hear others' opinions as well. Before looking at the code diff, some points on PR description (note that PR title and description become commit message): > It's also compatible with structured streaming requiring just a handful of small additions to move forward there. Looking to complete that in a separate PR. The part is better to be just left to PR comment rather than the part of the commit message. Let's drop it from there. > This PR introduces an option that can be used with Spark file data sources similar to the latestFirst option in structured streaming. latestFirst option doesn't work like this, as you may already know. (If latestFirst works like this I won't have a headache.) It'd be better to explain the new option there. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
