Re: KafkaIO committing semantics

Alexey Romanenko Mon, 07 Sep 2020 10:54:48 -0700

From my understanding:
- ENABLE_AUTO_COMMIT_CONFIG tells the Kafka consumer (used inside KafkaIO to 
read messages) to commit offsets periodically in the background;
- on the other hand, if commitOffsetsInFinalize() is used, offsets are 
committed back to Kafka each time the runner finalizes a checkpoint, so the 
Beam checkpoint mechanism is leveraged to restart from checkpoints in case of 
failures. It does not wait for the pipeline to finish, though it is up to the 
runner to decide when and how often to save checkpoints.
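
To illustrate the difference, here is a toy model in plain Java. It is not 
Beam or Kafka code: the class, the method and its parameters are made up for 
this sketch. It assumes the worst case for auto-commit (the offset is 
committed as soon as the record is read) and a restart that resumes only from 
the committed offset.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitSemanticsSketch {

    // Toy model (hypothetical, not the Beam or Kafka API).
    // Reads offsets 0..total-1, fails once when it first reaches crashAt,
    // then restarts from the last committed offset.
    // Returns the offsets the downstream step actually processed, in order.
    static List<Integer> run(int total, int crashAt, boolean autoCommit, int checkpointEvery) {
        List<Integer> processed = new ArrayList<>();
        int committed = 0;        // offset to resume from after a restart
        boolean crashed = false;
        int offset = 0;
        while (offset < total) {
            if (autoCommit) {
                committed = offset + 1;   // committed as soon as the record is read
            }
            if (!crashed && offset == crashAt) {
                crashed = true;           // downstream fails before processing this record
                offset = committed;       // restart from the committed position
                continue;
            }
            processed.add(offset);
            offset++;
            if (!autoCommit && offset % checkpointEvery == 0) {
                committed = offset;       // committed only after a "finalized checkpoint"
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println("auto-commit:        " + run(6, 3, true, 2));
        System.out.println("commit on finalize: " + run(6, 3, false, 2));
    }
}
```

With these inputs the auto-commit run processes [0, 1, 2, 4, 5], so record 3 
is lost, while the commit-on-finalize run processes [0, 1, 2, 2, 3, 4, 5], so 
record 2 is read twice and nothing is lost. A real runner that restores from 
its own checkpoint may behave differently.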


In KafkaIO, only one of these options can be used for the same transform: 
either ENABLE_AUTO_COMMIT_CONFIG or commitOffsetsInFinalize().
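
As a rough sketch, the two alternatives could be configured like this. The 
broker address, topic name, group ids and key/value types are placeholders I 
made up, and it assumes the Beam Kafka IO module, the Kafka client and a 
runner are on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaCommitOptions {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Option 1: let the Kafka consumer commit offsets periodically in the background.
    Map<String, Object> autoCommit = new HashMap<>();
    autoCommit.put(ConsumerConfig.GROUP_ID_CONFIG, "group-auto");      // placeholder group id
    autoCommit.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);

    p.apply("ReadWithAutoCommit",
        KafkaIO.<Long, String>read()
            .withBootstrapServers("localhost:9092")                    // placeholder broker
            .withTopic("events")                                       // placeholder topic
            .withKeyDeserializer(LongDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withConsumerConfigUpdates(autoCommit)
            .withoutMetadata());

    // Option 2: commit offsets when the runner finalizes a checkpoint.
    // A group.id is still needed so that there is a group to commit offsets for.
    Map<String, Object> finalizeCommit = new HashMap<>();
    finalizeCommit.put(ConsumerConfig.GROUP_ID_CONFIG, "group-finalize"); // placeholder group id
    finalizeCommit.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);

    p.apply("ReadWithCommitInFinalize",
        KafkaIO.<Long, String>read()
            .withBootstrapServers("localhost:9092")
            .withTopic("events")
            .withKeyDeserializer(LongDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withConsumerConfigUpdates(finalizeCommit)
            .commitOffsetsInFinalize()
            .withoutMetadata());

    p.run().waitUntilFinish();
  }
}
```

Both reads are in one pipeline only to show them side by side; in practice 
you would pick one per transform.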


> On 6 Sep 2020, at 07:24, Apple <[email protected]> wrote:
> 
> Hi everyone, 
> 
> I have a question on KafkaIO.
> What is the difference between setting AUTO_COMMIT_CONFIG and 
> commitOffsetsInFinalize()? My understanding is that:
>  
> 1. AUTO_COMMIT_CONFIG commits Kafka records as soon as 
> KafkaIO.read() outputs messages, but I am not sure how this would be helpful. 
> E.g. if a consumer transform after KafkaIO.read() fails, the messages 
> would be lost (which sounds like at-most-once semantics).
>  
> 2. commitOffsetsInFinalize() commits when the pipeline is finished. 
> But when does the pipeline end? In other words, when is PipelineResult.State 
> = DONE in a streaming scenario?
>  
> Thanks!
