Re: [Question] Using KafkaIO without a data loss

2022-09-27 Thread Yomal de Silva
But still, if a new deployment is rolled out and we can't recover the state from the previous snapshot/savepoint, there is a possibility of data loss here, right? This is considering the case where we modify the existing operators or add/delete operators in such a way that the operator states

Re: [Question] Using KafkaIO without a data loss

2022-09-25 Thread Reuven Lax via user
If you are using an exactly-once runner, it will guarantee that every message is consumed once (though the mechanism might not be obvious). Generally, the messages are consumed into the system in order. However, if you have downstream ParDos, there is no guarantee that they process

Re: [Question] Using KafkaIO without a data loss

2022-09-25 Thread Yomal de Silva
Hi Reuven, Thanks for those clarifications. For the 4th question that I raised, if A fails and B is committed, will those messages (A) be consumed again from Kafka, or will they be recovered from the checkpoint and retried in that specific operator? On Sun, Sep 25, 2022 at 10:45 PM

Re: [Question] Using KafkaIO without a data loss

2022-09-25 Thread Reuven Lax via user
On Sun, Sep 25, 2022 at 4:56 AM Yomal de Silva wrote:
> Hi all,
>
> I have started using KafkaIO to read a data stream and have the following
> questions. Appreciate it if you could provide a few clarifications on the
> following.
>
> 1. Does KafkaIO ignore the offset stored in the broker and

[Question] Using KafkaIO without a data loss

2022-09-25 Thread Yomal de Silva
Hi all,
I have started using KafkaIO to read a data stream and have the following questions. I would appreciate a few clarifications on the following.
1. Does KafkaIO ignore the offset stored in the broker and use the offset stored during checkpointing when consuming messages?
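To make question 1 concrete, here is a toy model of how a checkpointing source might pick its starting offset for one partition. It is only a sketch and not KafkaIO code: the function `resolve_start_offset` and all of its parameters are hypothetical names introduced for illustration. It assumes the commonly described precedence, where an offset restored from a checkpoint wins, the offset committed to the broker is used only when no checkpoint state exists, and the consumer's `auto.offset.reset` policy applies when neither is available.

```python
from typing import Optional


def resolve_start_offset(
    checkpoint_offset: Optional[int],
    broker_committed_offset: Optional[int],
    earliest: int,
    latest: int,
    auto_offset_reset: str = "latest",
) -> int:
    """Pick the offset to resume from for a single partition.

    Hypothetical helper: models the assumed precedence
    checkpoint > broker-committed offset > auto.offset.reset.
    """
    # State restored from a checkpoint/savepoint takes priority.
    if checkpoint_offset is not None:
        return checkpoint_offset
    # With no checkpoint state, fall back to the group's committed offset.
    if broker_committed_offset is not None:
        return broker_committed_offset
    # With neither, apply the reset policy.
    if auto_offset_reset == "earliest":
        return earliest
    if auto_offset_reset == "latest":
        return latest
    raise ValueError(f"unsupported auto.offset.reset: {auto_offset_reset!r}")


if __name__ == "__main__":
    # Restored from a checkpoint: the broker's committed offset is ignored.
    print(resolve_start_offset(120, 150, earliest=0, latest=200))
    # Fresh deployment without restorable state: the committed offset is used.
    print(resolve_start_offset(None, 150, earliest=0, latest=200))
```

Under this model, a redeployment that cannot restore state only avoids skipping or re-reading messages if offsets were also committed back to the broker, which is the scenario raised later in the thread.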