Demin,

1) KafkaIO does not depend on the consumer group.id at all, so you can have multiple parallel pipelines reading from the same topic. Note that consumer.assign(partitions) does not need a group id; only consumer.subscribe() does.
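To illustrate the assign()/subscribe() distinction, here is a minimal sketch using the plain Kafka consumer client (the broker address and topic name are placeholders). With assign() the consumer bypasses the group coordinator entirely, so no group.id is needed:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AssignWithoutGroupId {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        // Note: no group.id is set -- assign() does not require one.
        // subscribe() would throw without a group.id.

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Explicitly assign a partition instead of subscribing; Kafka's
            // group rebalancing never gets involved.
            consumer.assign(Collections.singletonList(new TopicPartition("my-topic", 0)));
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.offset() + ": " + record.value());
            }
        }
    }
}
```

This is why two KafkaIO pipelines reading the same topic do not interfere with each other the way two subscribe()-based consumers in one group would.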
Sometimes users might want to set a group.id anyway, e.g. if you want to monitor consumption for a specific group.id externally. You can set one if you want.

2) Beam applications store the consumption state internally in their checkpoint (much like Kafka storing offsets for a consumer group). That said, 'restart' support might differ a bit between runners (Flink, Spark, etc.). Please describe your setup and we can confirm.

Raghu.

On Tue, Nov 15, 2016 at 4:19 AM, Demin Alexey <[email protected]> wrote:
> Hi,
>
> I have some questions about the Kafka connector:
>
> 1) In the code I can see *consumer.assign(source.assignedPartitions);* so
> two clients with the same group name can't use Kafka's built-in load
> balancing.
>
> 2) Another problem: how do we resume reading from Kafka after a restart?
> In the past we relied on Kafka's own mechanism for storing offsets in
> Kafka/ZooKeeper, but with KafkaIO I didn't find how to get this behavior.
>
> An additional clarification about our processing: sometimes we need two
> separate jobs with the same group.id for HA, so that if one job is killed,
> the second job can start handling messages after Kafka rebalances.
>
> Could you help us build this requirement with Beam?
>
> Thanks,
> Alexey Diomin
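As a sketch of the group.id point, here is roughly what setting it on a KafkaIO read looks like. The broker address, topic, and group name are made up, and the exact method names vary across Beam versions (older releases used updateConsumerProperties instead of withConsumerConfigUpdates):

```java
import java.util.Collections;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaIOWithGroupId {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply(KafkaIO.<String, String>read()
            .withBootstrapServers("localhost:9092")   // placeholder broker
            .withTopic("my-topic")                    // placeholder topic
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // group.id is optional for KafkaIO itself, since it tracks
            // offsets in the runner's checkpoint; set it only if external
            // tools should see consumption attributed to this group.
            .withConsumerConfigUpdates(
                Collections.singletonMap(ConsumerConfig.GROUP_ID_CONFIG, "monitoring-group"))
            .withoutMetadata());

        p.run().waitUntilFinish();
    }
}
```

Because offsets live in the checkpoint rather than in Kafka, resuming after a restart depends on the runner restoring that checkpoint (e.g. restarting a Flink job from a savepoint), not on Kafka's committed offsets.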
