Hi, Raghu

Thanks for the clarification.

But how can I implement a solution for HA?
The idea is 2 separate jobs with the same code and the same group.id: if one job
is killed, the second job can start handling messages after a Kafka rebalance.

Thanks
Alexey Diomin




2016-11-15 21:31 GMT+04:00 Raghu Angadi <[email protected]>:

> Demin,
>
> 1) KafkaIO does not depend on the consumer group.id at all, so you can have
> multiple parallel pipelines reading from the same topic. Note
> that consumer.assign(partitions) does not need a group.id, only
> consumer.subscribe() does.
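>
> To make the difference concrete, here is a minimal sketch with the plain
> Kafka consumer API (the broker, topic and group names are just
> placeholders):
>
>   import java.util.Arrays;
>   import java.util.Properties;
>   import org.apache.kafka.clients.consumer.KafkaConsumer;
>   import org.apache.kafka.common.TopicPartition;
>
>   public class AssignVsSubscribe {
>     public static void main(String[] args) {
>       Properties props = new Properties();
>       props.put("bootstrap.servers", "localhost:9092");
>       props.put("key.deserializer",
>           "org.apache.kafka.common.serialization.ByteArrayDeserializer");
>       props.put("value.deserializer",
>           "org.apache.kafka.common.serialization.ByteArrayDeserializer");
>
>       // What KafkaIO does: explicit partition assignment. No group.id is
>       // needed and Kafka's group rebalancing never kicks in.
>       KafkaConsumer<byte[], byte[]> assigned = new KafkaConsumer<>(props);
>       assigned.assign(Arrays.asList(
>           new TopicPartition("my-topic", 0),
>           new TopicPartition("my-topic", 1)));
>
>       // Group-managed consumption: subscribe() needs a group.id, and Kafka
>       // balances the partitions across all consumers in that group.
>       props.put("group.id", "my-group");
>       KafkaConsumer<byte[], byte[]> subscribed = new KafkaConsumer<>(props);
>       subscribed.subscribe(Arrays.asList("my-topic"));
>
>       assigned.close();
>       subscribed.close();
>     }
>   }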
>
> Sometimes users might want to set a group.id for various reasons, e.g. if
> you want to monitor consumption for a specific group.id externally. You
> can set a group.id if you want.
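> For example (just a sketch; the KafkaIO method names below follow later Beam
> releases and may differ in your version, and the broker/topic/group names are
> placeholders):
>
>   import java.util.Arrays;
>   import java.util.Collections;
>   import org.apache.beam.sdk.io.kafka.KafkaIO;
>   import org.apache.kafka.common.serialization.ByteArrayDeserializer;
>
>   KafkaIO.Read<byte[], byte[]> read = KafkaIO.<byte[], byte[]>read()
>       .withBootstrapServers("broker1:9092")
>       .withTopics(Arrays.asList("my-topic"))
>       .withKeyDeserializer(ByteArrayDeserializer.class)
>       .withValueDeserializer(ByteArrayDeserializer.class)
>       // Optional: only useful for external monitoring/lag reporting. KafkaIO
>       // still uses assign(), so this does not turn on group rebalancing.
>       .updateConsumerProperties(
>           Collections.<String, Object>singletonMap("group.id", "my-beam-pipeline"));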
>
> 2) Beam applications store the consumption state internally in their
> checkpoint (sort of like Kafka storing offsets for a consumer group). That
> said, 'restart' support from different runners (Flink, Spark etc.) might be
> a bit different. Please specify your setup and we can confirm.
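>
> As one illustration, with the Flink runner a checkpointed streaming pipeline
> could be set up roughly like this (a sketch; the option names follow the
> Flink runner in later Beam releases and may differ in your version):
>
>   import org.apache.beam.runners.flink.FlinkPipelineOptions;
>   import org.apache.beam.runners.flink.FlinkRunner;
>   import org.apache.beam.sdk.Pipeline;
>   import org.apache.beam.sdk.options.PipelineOptionsFactory;
>
>   public class CheckpointedKafkaPipeline {
>     public static void main(String[] args) {
>       FlinkPipelineOptions options =
>           PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
>       options.setRunner(FlinkRunner.class);
>       options.setStreaming(true);
>       // Periodic checkpoints capture KafkaIO's reader state
>       // (topic, partition, offset), so a restarted job can resume from it.
>       options.setCheckpointingInterval(60_000L);
>
>       Pipeline pipeline = Pipeline.create(options);
>       // ... apply KafkaIO.read() and the rest of the pipeline here ...
>       pipeline.run();
>     }
>   }
>
> For a planned restart (rather than automatic failure recovery), resuming from
> that state goes through the runner's own mechanism, e.g. Flink savepoints
> ('flink run -s <savepoint-path> ...').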
>
> Raghu.
>
>
>
> On Tue, Nov 15, 2016 at 4:19 AM, Demin Alexey <[email protected]> wrote:
>
>> Hi
>>
>> I have some questions about the Kafka connector:
>> 1) In the code I can see *consumer.assign(source.assignedPartitions);* and as
>> a result, 2 clients with the same group name can't use Kafka's internal load
>> balancing.
>>
>> 2) The other problem is how to resume reading from Kafka after a restart. In
>> the past we depended on Kafka's internal mechanism for storing offsets in
>> Kafka/ZooKeeper, but with KafkaIO I couldn't find how to get this behavior.
>>
>>
>> Additional clarification about our processing:
>> sometimes we need 2 separate jobs with the same group.id for HA; if one job is
>> killed, the second job can start handling messages after a Kafka rebalance.
>>
>> Could you help me understand how we can meet this requirement with Beam?
>>
>> Thanks
>> Alexey Diomin
>>
>>
>
