I see. HA in Beam, as in generic distributed processing platforms like Hadoop, is typically provided within a job; users shouldn't have to run two jobs.
A simple Kafka consumer app, a Kafka Streams app, and generic distributed streaming apps (Spark, Flink, etc.) all provide different abstractions, with different trade-offs. E.g. a Beam app running on Dataflow would claim to provide this level of HA without requiring you to run multiple instances of it: it handles machine failures, upscaling or downscaling as the load changes, etc.

Raghu.

On Tue, Nov 15, 2016 at 2:46 PM, Demin Alexey <[email protected]> wrote:

> I have a kafka topic with 10 partitions (for example) and 2 jobs with the
> same group.id; kafka balances 5 partitions to each reader.
>
> If one job is killed, kafka rebalances all 10 partitions to the second
> reader and processing keeps working without downtime. After the first job
> restarts, each reader goes back to reading 5 partitions.
>
> A very simple but stable way to get HA in production.
>
> p.s. kafka handles rebalancing, heartbeat checks, and connection
> management internally, without client application magic.
>
> 2016-11-16 1:56 GMT+04:00 Raghu Angadi <[email protected]>:
>
>> On Tue, Nov 15, 2016 at 1:50 PM, Demin Alexey <[email protected]> wrote:
>>
>>> 2 separate jobs with the same code and same group.id; if one job is
>>> killed, the second job can start handling messages after a rebalance
>>> by kafka.
>>
>> Do you want to use the same group.id for both? So you don't want the
>> second job to consume until the first one fails?
>>
>> Can you explain how this works generally (outside the Beam context). If
>> you have two running processes using the same group.id, I would think
>> both of them read part of the stream, right? I mean, what is stopping
>> the second job until the first one is killed?
>>
>> Raghu.
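[Editor's note: the failover pattern Alexey describes relies only on Kafka's consumer-group protocol and is independent of Beam. Below is a minimal sketch of one such consumer instance, assuming kafka-clients 2.0+; the broker address, topic name, and group.id are hypothetical placeholders, not anything from the thread.]

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupHaConsumer {
  public static void main(String[] args) {
    Properties props = new Properties();
    // Hypothetical broker address, for illustration only.
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    // Every instance uses the same group.id; Kafka splits the topic's
    // partitions among whichever instances are currently alive.
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-ha-group");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
        StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
        StringDeserializer.class.getName());

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("my-topic"));
      while (true) {
        // Staying in the poll loop keeps this instance in the group; if it
        // stops polling (or the process dies), Kafka rebalances its
        // partitions to the remaining members of the group.
        ConsumerRecords<String, String> records =
            consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
          System.out.printf("partition=%d offset=%d value=%s%n",
              record.partition(), record.offset(), record.value());
        }
      }
    }
  }
}

With two copies of this process running against a 10-partition topic, each typically gets 5 partitions; kill one and the survivor is reassigned all 10, which is exactly the failover behavior described above.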
