Hi All,
In document of Flink 1.6, it says that "Before 0.9 Kafka did not
provide any mechanisms to guarantee at-least-once or exactly-once semantics”
I read the source code of FlinkKafkaConsumer08, and the comment says:
“Please note that Flink snapshots the offsets internally as part of its
distributed checkpoints. The offsets
* committed to Kafka / ZooKeeper are only to bring the outside view of
progress in sync with Flink's view
* of the progress. That way, monitoring and other jobs can get a view of how
far the Flink Kafka consumer
* has consumed a topic"
Obviously, the kafka partition offsets are checkpointed periodically.
And when some error happens, the data are read from kafka, continued from the
checkpointed offset. Then source and other operator states restart from the
same checkpoint. Then why does the document say “Before 0.9 Kafka did not
provide any mechanisms to guarantee at-least-once or exactly-once semantics” ?
Thanks a lot.
Best
Henry