On Sun, Sep 25, 2022 at 4:56 AM Yomal de Silva <yomal.prav...@gmail.com>
wrote:

> Hi all,
>
> I have started using KafkaIO to read a data stream and have the following
> questions. Appreciate it if you could provide a few clarifications on the
> following.
>
>
1. Does KafkaIO ignore the offset stored in the broker and uses the offset
> stored during checkpointing when consuming messages?
>

Generally yes, as that's the only way to guarantee consistency (we can't
atomically commit to the runner and to Kafka). However when starting a new
pipeline, you should be able to start reading at the broker checkpoint.


> 2. How many threads will be used by the Kafka consumer?
>

This depends somewhat on the runner, but you can expect one thread per
partition.


> 3. If the consumer polls a set of messages A and then later B while A is
> still being processed, is there a possibility of set B finishing before A?
> Does parallelism control this?
>

yes. Beam doesn't currently have any notion of ordering. All messages are
independent and can be processed at different times (the source also
reserves the right to process different ranges of a single Kafka partition
on different threads, though it doesn't currently do this).


> 4. In the above scenario if B is committed back to the broker and somehow
> A failed, upon a restart is there any way we can consume A again without
> losing data?
>

Data should never be lost. If B is processed, then you can assume that the
A data is checkpointed inside the Beam runner and will be processed to.



>
> Thank you.
>
>
>

Reply via email to