From my understanding:
- ENABLE_AUTO_COMMIT_CONFIG tells the Kafka consumer (used inside KafkaIO to read messages) to commit offsets periodically in the background;
- on the other hand, if commitOffsetsInFinalize() is used, then Beam's checkpoint mechanism is leveraged, so the pipeline can restart from checkpoints in case of failures. It doesn't need to wait for the pipeline to finish, though it's up to the runner to decide when and how often to save checkpoints.
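
To make the two options a bit more concrete, here is a minimal sketch of the consumer settings each one would typically involve. The class and helper names (OffsetCommitSketch, autoCommitConfig, finalizeCommitConfig), the group id and the interval value are made up for illustration; the property keys are the plain Kafka consumer property names. The KafkaIO calls appear only in comments, since they need Beam on the classpath, and the group.id requirement for commitOffsetsInFinalize() is my understanding rather than something I have verified against every Beam version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only (not part of Beam or Kafka).
final class OffsetCommitSketch {

    // Plain Kafka consumer property names (the string values behind the
    // ConsumerConfig.*_CONFIG constants).
    static final String GROUP_ID = "group.id";
    static final String ENABLE_AUTO_COMMIT = "enable.auto.commit";
    static final String AUTO_COMMIT_INTERVAL_MS = "auto.commit.interval.ms";

    // Option 1: the Kafka consumer itself commits offsets periodically in the
    // background, independently of what happens downstream in the pipeline.
    static Map<String, Object> autoCommitConfig(String groupId, int intervalMs) {
        Map<String, Object> config = new HashMap<>();
        config.put(GROUP_ID, groupId);
        config.put(ENABLE_AUTO_COMMIT, true);
        config.put(AUTO_COMMIT_INTERVAL_MS, intervalMs);
        return config;
    }

    // Option 2: auto commit stays off; offsets are committed when the runner
    // finalizes a checkpoint. A group.id is still set, because (as far as I
    // know) committing offsets needs a consumer group.
    static Map<String, Object> finalizeCommitConfig(String groupId) {
        Map<String, Object> config = new HashMap<>();
        config.put(GROUP_ID, groupId);
        config.put(ENABLE_AUTO_COMMIT, false);
        return config;
    }

    public static void main(String[] args) {
        // With Beam on the classpath, the maps would be passed roughly like this:
        //
        //   KafkaIO.<Long, String>read()
        //       ...
        //       .withConsumerConfigUpdates(autoCommitConfig("my-group", 5000));
        //
        //   KafkaIO.<Long, String>read()
        //       ...
        //       .withConsumerConfigUpdates(finalizeCommitConfig("my-group"))
        //       .commitOffsetsInFinalize();
        System.out.println(autoCommitConfig("my-group", 5000));
        System.out.println(finalizeCommitConfig("my-group"));
    }
}
```

In the second variant, how often offsets actually get committed depends on how often the runner finalizes checkpoints, not on any Kafka interval setting.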

In KafkaIO, only one of the two options should be used for the same transform: either ENABLE_AUTO_COMMIT_CONFIG or commitOffsetsInFinalize().

> On 6 Sep 2020, at 07:24, Apple <[email protected]> wrote:
>
> Hi everyone,
>
> I have a question on KafkaIO.
> What is the difference between setting AUTO_COMMIT_CONFIG and
> commitOffsetsInFinalize()? My understanding is that:
>
> 1. AUTO_COMMIT_CONFIG commits Kafka records as soon as
> KafkaIO.read() outputs messages, but I am not sure how would this be helpful,
> for e.g. if a consumer transform after KafkaIO.read() fails , the messages
> would be lost (which sounds like at-most once semantics)
>
> 2. commitOffsetsFinalize() commits when the pipeline is finished.
> But when does the pipeline end? In other words, when is PipelineResult.State
> = Done in a streaming scenario?
>
> Thanks!
