sunchao commented on a change in pull request #31089:
URL: https://github.com/apache/spark/pull/31089#discussion_r553778269
##########
File path: docs/structured-streaming-kafka-integration.md
##########
@@ -878,7 +878,14 @@ group id, however, please read warnings for this option
and use it with caution.
where to start instead. Structured Streaming manages which offsets are
consumed internally, rather
than rely on the kafka Consumer to do it. This will ensure that no data is
missed when new
topics/partitions are dynamically subscribed. Note that `startingOffsets`
only applies when a new
- streaming query is started, and that resuming will always pick up from where
the query left off.
+ streaming query is started, and that resuming will always pick up from where
the query left off. Note
+ that when the offsets consumed by a streaming application is not in Kafka
(e.g., topics are deleted,
+ offsets are out of range, or offsets are removed after offset retention
period), the offsets
Review comment:
"offset retention period": not sure whether "offset" is redundant here.
Also, perhaps "the offsets are not reset" -> "they will not be reset".
##########
File path: docs/structured-streaming-kafka-integration.md
##########
@@ -878,7 +878,14 @@ group id, however, please read warnings for this option
and use it with caution.
where to start instead. Structured Streaming manages which offsets are
consumed internally, rather
than rely on the kafka Consumer to do it. This will ensure that no data is
missed when new
topics/partitions are dynamically subscribed. Note that `startingOffsets`
only applies when a new
- streaming query is started, and that resuming will always pick up from where
the query left off.
+ streaming query is started, and that resuming will always pick up from where
the query left off. Note
+ that when the offsets consumed by a streaming application is not in Kafka
(e.g., topics are deleted,
+ offsets are out of range, or offsets are removed after offset retention
period), the offsets
+ are not reset and the streaming application will see data lost. In extreme
cases, for example the
Review comment:
"see data lost" -> "see data loss"
##########
File path: docs/structured-streaming-kafka-integration.md
##########
@@ -878,7 +878,14 @@ group id, however, please read warnings for this option
and use it with caution.
where to start instead. Structured Streaming manages which offsets are
consumed internally, rather
than rely on the kafka Consumer to do it. This will ensure that no data is
missed when new
topics/partitions are dynamically subscribed. Note that `startingOffsets`
only applies when a new
- streaming query is started, and that resuming will always pick up from where
the query left off.
+ streaming query is started, and that resuming will always pick up from where
the query left off. Note
+ that when the offsets consumed by a streaming application is not in Kafka
(e.g., topics are deleted,
Review comment:
"is not in" -> "are not in"