HeartSaVioR commented on a change in pull request #23747: [SPARK-26848][SQL] 
Introduce new option to Kafka source: offset by timestamp (starting/ending)
URL: https://github.com/apache/spark/pull/23747#discussion_r256219279
 
 

 ##########
 File path: docs/structured-streaming-kafka-integration.md
 ##########
 @@ -310,6 +310,23 @@ The following configurations are optional:
 
 <table class="table">
 <tr><th>Option</th><th>value</th><th>default</th><th>query 
type</th><th>meaning</th></tr>
+<tr>
 
 Review comment:
   For 1), given that this is a Spark document I would say it's better to not 
too concerned it and explain Kafka stuff deeply. Kafka should have specific 
documentation regarding this (even KIP) and end users should rely on that to 
understand how it works. 
   
   We could refer the doc if we find any (I'll try to find one hopefully not 
KIP), but even there's no doc, we could mention this feature leverage 
`KafkaConsumer.offsetsForTimes ` and refer to javadoc on Kafka API.
   
   For 2), I agree it's ideal to mention timestamp offset here is related to 
when message is ingested, but I also feel it may be a bad signal that they are 
not understanding either Kafka's `offsetsForTimes` or Spark's processing time 
vs event time if end users don't know this before mentioned it. Given both have 
docs for each so might be redundant if we mention this here again. But that's 
just my 2 cents and if we can explain it as one-liner or two it may not be too 
redundant.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to