Re: How to achieve distributed processing and high availability simultaneously in Kafka?

sumit jain Sun, 21 Jun 2015 06:35:10 -0700

The re-balancing that you speak of doesn't happen. As already noted in the
question, if one consumer is listening on all partitions and a second
consumer then starts, the second one will not receive any messages at all
until the first one fails.


Is there any reference link for this re-balancing behaviour?

On Wed, May 6, 2015 at 6:18 PM, Jason Rosenberg <j...@squareup.com> wrote:

> A consumer thread can consume multiple partitions.  This is not unusual, in
> practice.
>
> In the example you gave, if multiple high-level consumers are using the
> same group id, they will automatically rebalance the partition assignment
> between them as consumers dynamically join and leave the group.  So, in
> your example, if process 1 dies, then process 2 will assume ownership for
> all the n partitions (and if it has n/2 threads, each thread will own 2 of
> the partitions).
>
> In my experience though, it's generally fine to have fewer threads than
> partitions.  It depends on the volume of data incoming to each partition, of
> course, and how long the consumer takes to process each message.
>
> Jason
>
> On Wed, May 6, 2015 at 1:57 AM, sumit jain <sumitjai...@gmail.com> wrote:
>
> > I have a topic consisting of n partitions. To have distributed processing I
> > create two processes running on different machines. They subscribe to the
> > topic with the same group id and allocate n/2 threads, each of which
> > processes a single stream (n/2 partitions per process).
> >
> > With this I will have achieved load distribution, but now if process 1
> > crashes, then process 2 cannot consume messages from partitions allocated
> > to process 1, as it listened only on n/2 streams at the start.
> >
> > Or else, if I configure for HA and start n threads/streams on both
> > processes, then when one node fails, all partitions will be processed by
> > other node. But here, we have compromised distribution, as all partitions
> > will be processed by a single node at a time.
> >
> > Is there a way to achieve both simultaneously and how?
> > Note: Already asked this on stackoverflow:
> > http://stackoverflow.com/questions/30060261/how-to-achieve-distributed-processing-and-high-availability-simultaneously-in-ka
> > --
> > Thanks & Regards,
> > Sumit Jain
> >
>



-- 
Thanks & Regards,
Sumit Jain
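
For illustration, both behaviours discussed in this thread can be reproduced
with a toy model. This is only a sketch, not Kafka client code: it assumes a
range-style assignment in which all consumer threads in the group are sorted
by id and each is handed a contiguous block of partitions. The thread ids
("p1-0", "p2-1", ...) and the helper names are hypothetical.

```python
def range_assign(num_partitions, thread_ids):
    """Toy range-style assignment (an assumption, not the Kafka API):
    sort the thread ids, then give each thread a contiguous block."""
    threads = sorted(thread_ids)
    per_thread, extra = divmod(num_partitions, len(threads))
    assignment = {}
    for pos, tid in enumerate(threads):
        # The first `extra` threads each take one additional partition.
        start = per_thread * pos + min(pos, extra)
        count = per_thread + (1 if pos < extra else 0)
        assignment[tid] = list(range(start, start + count))
    return assignment


def threads_for(process, count):
    # Hypothetical thread ids of the form "<process>-<index>".
    return [f"{process}-{i}" for i in range(count)]


n = 4  # number of partitions in the topic

# Load-sharing setup: two processes with n/2 threads each.
both = range_assign(n, threads_for("p1", n // 2) + threads_for("p2", n // 2))

# Process 1 dies; assignment is recomputed over the surviving threads only.
survivor = range_assign(n, threads_for("p2", n // 2))

# HA setup: n threads on each process, so 2n threads compete for n partitions.
oversubscribed = range_assign(n, threads_for("p1", n) + threads_for("p2", n))

for name, result in [("both", both), ("survivor", survivor),
                     ("oversubscribed", oversubscribed)]:
    print(name, result)
```

Under these assumptions, the n/2-threads-per-process setup spreads the
partitions across both processes and the survivor picks up all n partitions
(2 per thread) after a failure. With n threads on each process, the threads
that sort first take one partition each and the remaining threads sit idle,
which matches the "second consumer receives nothing" observation.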
