[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL][SS] Provide option to load files after a specified date when reading from a folder path

2020-07-02 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-652860425 I appreciate your thoughtful feedback. I will open the issue as requested and remove the streaming references in this PR.

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL][SS] Provide option to load files after a specified date when reading from a folder path

2020-07-02 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-652853468 > @cchighman Thanks for reading through the huge wall of text! > > I agree the option can be provided to batch query only, and consider how to apply the option to

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL][SS] Provide option to load files after a specified date when reading from a folder path

2020-07-02 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-652843996 @HeartSaVioR I still think implementing this at the _PartitioningAwareFileIndex_ level makes a lot of sense and bypasses all the complexities you mentioned above. There can

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL][SS] Provide option to load files after a specified date when reading from a folder path

2020-07-02 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-652807186 @HeartSaVioR Thank you for your detailed comments. I've been digging into the PR you mentioned along with the associated Kafka Batch sources, etc. I'm leaning towards

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL][SS] Provide option to load files after a specified date when reading from a folder path

2020-06-28 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-650900831 > > I wonder though if structured streaming always implied an event source, particularly when streaming from a file source? > > Ideally it should be. It's not 100%

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL][SS] Provide option to load files after a specified date when reading from a folder path

2020-06-28 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-650899177 @HeartSaVioR It's in effect no different than a path glob filter except that instead of my wildcard specifying a file extension, it's a wildcard on other

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL][SS] Provide option to load files after a specified date when reading from a folder path

2020-06-28 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-650728661 @HeartSaVioR With _startingOffsetByTimestamp_, you have the ability to indicate start/end offsets per topic such as TopicA or TopicB. If this concept were applied to a

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL][SS] Provide option to load files after a specified date when reading from a folder path

2020-06-28 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-650726701 > Please take a look at how Kafka data source options apply to both batch and streaming queries. The semantics of the option should be applied differently. > >

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL][SS] Provide option to load files after a specified date when reading from a folder path

2020-06-27 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-650518946 @HeartSaVioR The three files that contained only indentation changes have been corrected and are now removed from this PR.

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL][SS] Provide option to load files after a specified date when reading from a folder path

2020-06-26 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-650478832 > The PR has lots of changed lines which are actually not changed (indentation). Indentation is one of the style guide rules, and they didn't seem to violate the guide (that said,