[GitHub] [spark] gaborgsomogyi commented on a change in pull request #32609: [SPARK-29223][SQL][SS] New option to specify timestamp on all subscribing topic-partitions in Kafka source

GitBox Mon, 24 May 2021 00:00:14 -0700


gaborgsomogyi commented on a change in pull request #32609:
URL: https://github.com/apache/spark/pull/32609#discussion_r637727449




##########
File path: docs/structured-streaming-kafka-integration.md
##########
@@ -512,6 +526,17 @@ The following configurations are optional:
 </tr>
 </table>
 
+### Details on timestamp offset options
+
+The returned offset for each partition is the earliest offset whose timestamp is greater than or equal to the given timestamp in the corresponding partition.
+The behavior varies across options if the matched offset doesn't exist - check the description of each option.
+
+Spark simply passes the timestamp information to <code>KafkaConsumer.offsetsForTimes</code>, and doesn't interpret or reason about the value.
+For more details on <code>KafkaConsumer.offsetsForTimes</code>, please refer to the <a href="https://kafka.apache.org/21/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#offsetsForTimes-java.util.Map-">javadoc</a>.

Review comment:
       Putting exact version information into the doc needs attention from time to time:
   `https://kafka.apache.org/21/...`
   If we assume Kafka doesn't break the API, we could use `latest` instead of `21`, though I'm not sure Kafka provides such a link.
   BTW, why `21`? We're on `<kafka.version>2.8.0</kafka.version>`, and the feature requires at least `0.10.1.0`.
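The lookup rule in the doc change (per partition, the earliest offset whose timestamp is greater than or equal to the given timestamp) can be modelled without a broker. The Java sketch below is a hypothetical, in-memory illustration only: `TimestampOffsetLookup`, `StoredRecord`, `earliestOffsetAtOrAfter` and `lookupAll` are made-up names, not Spark or Kafka APIs. It assumes, as the `KafkaConsumer.offsetsForTimes` javadoc describes, that a partition with no matching record yields `null`; what Spark does in that case depends on the option, as the doc text says.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, in-memory model of the timestamp-to-offset rule; it does not talk to Kafka.
public class TimestampOffsetLookup {

    // Hypothetical stand-in for a stored Kafka record: only offset and timestamp matter here.
    static final class StoredRecord {
        final long offset;
        final long timestamp;

        StoredRecord(long offset, long timestamp) {
            this.offset = offset;
            this.timestamp = timestamp;
        }
    }

    // Earliest offset whose timestamp is >= targetTimestamp, or null if no record qualifies.
    // Timestamps need not increase with offsets, so every record is checked.
    static Long earliestOffsetAtOrAfter(List<StoredRecord> partition, long targetTimestamp) {
        Long earliest = null;
        for (StoredRecord r : partition) {
            if (r.timestamp >= targetTimestamp && (earliest == null || r.offset < earliest)) {
                earliest = r.offset;
            }
        }
        return earliest;
    }

    // Applies one timestamp to every partition; a null value marks "no matching offset".
    static Map<Integer, Long> lookupAll(Map<Integer, List<StoredRecord>> partitions, long targetTimestamp) {
        Map<Integer, Long> result = new HashMap<>();
        for (Map.Entry<Integer, List<StoredRecord>> p : partitions.entrySet()) {
            result.put(p.getKey(), earliestOffsetAtOrAfter(p.getValue(), targetTimestamp));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, List<StoredRecord>> partitions = new HashMap<>();
        // Partition 0: timestamps are not monotonic in offset order.
        partitions.put(0, List.of(
                new StoredRecord(0, 100),
                new StoredRecord(1, 200),
                new StoredRecord(2, 150),
                new StoredRecord(3, 300)));
        // Partition 1: every record is newer than the target.
        partitions.put(1, List.of(new StoredRecord(0, 500), new StoredRecord(1, 600)));

        System.out.println(lookupAll(partitions, 150)); // offset 1 for partition 0, offset 0 for partition 1
        System.out.println(lookupAll(partitions, 700)); // no match anywhere: null for both partitions
    }
}
```

Partition 0 shows why the rule says "earliest offset" rather than "closest timestamp": offset 2 carries exactly timestamp 150, but offset 1 (timestamp 200) comes first and is the one returned.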




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.