HeartSaVioR commented on a change in pull request #32609: URL: https://github.com/apache/spark/pull/32609#discussion_r638336763
##########
File path: docs/structured-streaming-kafka-integration.md
##########
@@ -512,6 +526,17 @@ The following configurations are optional:
 </tr>
 </table>
 
+### Details on timestamp offset options
+
+The returned offset for each partition is the earliest offset whose timestamp is greater than or equal to the given timestamp in the corresponding partition.
+The behavior varies across options if the matched offset doesn't exist - check the description of each option.
+
+Spark simply passes the timestamp information to <code>KafkaConsumer.offsetsForTimes</code>, and doesn't interpret or reason about the value.
+For more details on <code>KafkaConsumer.offsetsForTimes</code>, please refer <a href="https://kafka.apache.org/21/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#offsetsForTimes-java.util.Map-">javadoc</a> for details.

Review comment:
If I understand correctly, there's no notion of a "latest" javadoc URL, so we picked the version we were using at that time. (Worth noting that this content was added when we added the timestamp offset options.) I'm OK with either raising the version to the one we use in 3.2 or lowering it to the minimum.
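
For readers following the thread, the lookup rule quoted in the diff ("the earliest offset whose timestamp is greater than or equal to the given timestamp") can be sketched in plain Python. This is only an illustrative model, not Spark or Kafka code: the function name `earliest_offset_at_or_after` and the `(offset, timestamp)` tuple layout are assumptions made for this example, and returning `None` stands in for the "matched offset doesn't exist" case that each option handles differently.

```python
from typing import Optional, Sequence, Tuple


def earliest_offset_at_or_after(
    records: Sequence[Tuple[int, int]], target_ts: int
) -> Optional[int]:
    """Return the smallest offset whose timestamp is >= target_ts.

    `records` is a hypothetical stand-in for one partition: a sequence of
    (offset, timestamp_ms) pairs. Timestamps are not assumed to be
    monotonic, so the records are scanned in offset order.
    """
    for offset, ts in sorted(records):
        if ts >= target_ts:
            return offset
    # No record at or after the requested timestamp.
    return None


# A made-up partition with four records.
partition = [(0, 1000), (1, 1500), (2, 1500), (3, 2200)]

print(earliest_offset_at_or_after(partition, 1500))  # exact match on two records
print(earliest_offset_at_or_after(partition, 1501))  # falls between records
print(earliest_offset_at_or_after(partition, 3000))  # later than every record
```

The linear scan is chosen for clarity; it says nothing about how the broker actually resolves the lookup.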
