viirya commented on pull request #32747:
URL: https://github.com/apache/spark/pull/32747#issuecomment-853411545


   Let me check whether I understand this correctly: it sounds like when the start offset for a given timestamp cannot be found on Kafka, Spark will fall back to reading from the latest offset instead. I think it makes sense to give end users an option to avoid query failure in that case. Just wondering, is the latest offset the best choice? If the start offset timestamp is far before the latest offset, does it still make sense to jump to the latest offset?
   
   For example, suppose partition 1 can find a record matching timestamp 1, but the lookup on partition 2 returns no matching offset. With this option, we retrieve the latest offset from partition 2 instead. But that latest offset could carry timestamp 1000, even though partition 2 might still have records between timestamp 1 and timestamp 1000 that would be skipped, right?
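
   To make the behavior concrete, here is a minimal sketch of the lookup-then-fallback logic as I understand it. This is my own illustration built on Kafka's consumer API (`offsetsForTimes` / `endOffsets`), not the PR's actual code, and `resolveStartOffsets` is a hypothetical helper name:

```scala
import java.util.{Map => JMap}
import scala.jdk.CollectionConverters._

import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

object OffsetFallbackSketch {
  // Hypothetical helper (not from the PR): resolve a start offset per
  // partition from a timestamp, falling back to that partition's latest
  // offset when Kafka reports no match.
  def resolveStartOffsets(
      consumer: KafkaConsumer[_, _],
      partitions: Seq[TopicPartition],
      startTimestampMs: Long): Map[TopicPartition, Long] = {
    val query: JMap[TopicPartition, java.lang.Long] =
      partitions.map(tp => tp -> java.lang.Long.valueOf(startTimestampMs)).toMap.asJava
    // offsetsForTimes maps a partition to null when no record has a
    // timestamp >= startTimestampMs (the "unmatched offset" case above).
    val byTimestamp = consumer.offsetsForTimes(query)
    val latest = consumer.endOffsets(partitions.asJava)
    partitions.map { tp =>
      val matched = Option(byTimestamp.get(tp)).map(_.offset())
      // The fallback in question: jump to the latest offset, even though
      // its record timestamp may be far later than the requested one.
      tp -> matched.getOrElse(latest.get(tp).longValue())
    }.toMap
  }
}
```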
   

