[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL][SS] Provide option to load files after a specified date when reading from a folder path

GitBox Sun, 28 Jun 2020 03:08:11 -0700


cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-650728661



   @HeartSaVioR 
   With _startingOffsetsByTimestamp_, you can specify start/end offsets per 
topic, such as TopicA or TopicB.  If this concept were applied to a file data 
source, with the underlying intent that each file name represents a topic, 
problems begin to emerge.  For example, multiple files necessarily have 
different file names, and each different file name may imply a new topic.
   
   This would mean a naming convention would have to be followed when reading 
from a file data source by _path_, since that path could contain different 
file names (or topics), and you couldn't treat the whole path as one stream.
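
   To make the mismatch concrete, here is a rough Python sketch.  The first 
value follows the JSON shape the Kafka source documents for 
`startingOffsetsByTimestamp` (topic, then partition, then a timestamp in 
milliseconds).  The `per_file_timestamps` helper, the file names, and the 
timestamps are hypothetical, made up purely for illustration; nothing like 
this exists in the PR or in the file source.

```python
import json

# Kafka source: topic -> partition -> timestamp (epoch milliseconds).
# The topic names and timestamps here are example values only.
kafka_option = json.dumps({
    "topicA": {"0": 1593302400000, "1": 1593302400000},
    "topicB": {"0": 1593388800000},
})


def per_file_timestamps(file_names, timestamp_ms):
    """Hypothetical helper: treat every file name as its own 'topic'.

    A file has no partitions, so a single placeholder partition "0" is used.
    """
    return {name: {"0": timestamp_ms} for name in file_names}


# Hypothetical files found under one folder path.
files = ["part-0000.json", "part-0001.json", "part-0002.json"]
file_option = per_file_timestamps(files, 1593302400000)

print(kafka_option)
print(json.dumps(file_option, indent=2))
# One "topic" entry per file: the option grows with every new file,
# and the path can no longer be described as a single stream.
print(len(file_option))
```

   In this sketch every new file dropped into the folder adds another key, 
which is why the per-topic model only works if file names follow some 
agreed convention.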
   
   
   Thoughts?
   
   


