Demin,

1) KafkaIO does not depend on the consumer group.id at all, so you can have multiple parallel pipelines reading from the same topic. Note that consumer.assign(partitions) does not need a group id; only consumer.subscribe() does.
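To illustrate the assign()/subscribe() distinction, here is a minimal sketch using the plain Kafka consumer client (the broker address and topic name are placeholders). With assign() the consumer bypasses the group coordinator entirely, so no group.id is needed:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AssignWithoutGroupId {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        // Note: no group.id is set -- assign() does not require one.
        // subscribe() would throw without a group.id.

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Explicitly assign a partition instead of subscribing; Kafka's
            // group rebalancing never gets involved.
            consumer.assign(Collections.singletonList(new TopicPartition("my-topic", 0)));
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.offset() + ": " + record.value());
            }
        }
    }
}
```

This is why two KafkaIO pipelines reading the same topic do not interfere with each other the way two subscribe()-based consumers in one group would.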
Sometimes users might want to set a group.id anyway, e.g. if you want to monitor consumption for a specific group.id externally. You can set one if you want.

2) Beam applications store the consumption state internally in their checkpoint (much like Kafka storing offsets for a consumer group). That said, 'restart' support might differ a bit between runners (Flink, Spark, etc.). Please describe your setup and we can confirm.

Raghu.

On Tue, Nov 15, 2016 at 4:19 AM, Demin Alexey <[email protected]> wrote:
> Hi,
>
> I have some questions about the Kafka connector:
>
> 1) In the code I can see *consumer.assign(source.assignedPartitions);* so
> two clients with the same group name can't use Kafka's built-in load
> balancing.
>
> 2) Another problem: how do we resume reading from Kafka after a restart?
> In the past we relied on Kafka's own mechanism for storing offsets in
> Kafka/ZooKeeper, but with KafkaIO I didn't find how to get this behavior.
>
> An additional clarification about our processing: sometimes we need two
> separate jobs with the same group.id for HA, so that if one job is killed,
> the second job can start handling messages after Kafka rebalances.
>
> Could you help us build this requirement with Beam?
>
> Thanks,
> Alexey Diomin
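As a sketch of the group.id point, here is roughly what setting it on a KafkaIO read looks like. The broker address, topic, and group name are made up, and the exact method names vary across Beam versions (older releases used updateConsumerProperties instead of withConsumerConfigUpdates):

```java
import java.util.Collections;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaIOWithGroupId {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply(KafkaIO.<String, String>read()
            .withBootstrapServers("localhost:9092")   // placeholder broker
            .withTopic("my-topic")                    // placeholder topic
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // group.id is optional for KafkaIO itself, since it tracks
            // offsets in the runner's checkpoint; set it only if external
            // tools should see consumption attributed to this group.
            .withConsumerConfigUpdates(
                Collections.singletonMap(ConsumerConfig.GROUP_ID_CONFIG, "monitoring-group"))
            .withoutMetadata());

        p.run().waitUntilFinish();
    }
}
```

Because offsets live in the checkpoint rather than in Kafka, resuming after a restart depends on the runner restoring that checkpoint (e.g. restarting a Flink job from a savepoint), not on Kafka's committed offsets.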
