Re: partitions stealing & balancing consumer threads across servers

2014-10-31 Thread Joel Koshy
In your instance if you have four JVMs (i.e., consumer processes), six threads per consumer process and 12 partitions, then each thread would only get one partition but the first two processes will get all the partitions and the last two processes would be idle. We could tweak the assignment strate
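The scenario in this reply can be reproduced with a small simulation. This is only a sketch of the round-robin idea described in the thread, not Kafka's actual assignor: the thread ids (`jvm1-thread0`, ...) are hypothetical, and ordering by a plain lexical sort is an assumption made for illustration.

```python
from collections import defaultdict
from itertools import cycle


def round_robin_assign(thread_ids, partitions):
    """Hand out partitions one at a time while cycling over sorted thread ids."""
    assignment = defaultdict(list)
    threads = cycle(sorted(thread_ids))  # assumption: lexical order of ids
    for partition in sorted(partitions):
        assignment[next(threads)].append(partition)
    return dict(assignment)


# Hypothetical ids: 4 JVMs (consumer processes) with 6 threads each.
thread_ids = [f"jvm{j}-thread{t}" for j in range(1, 5) for t in range(6)]
assignment = round_robin_assign(thread_ids, range(12))

# Count how many partitions each JVM ends up owning.
partitions_per_jvm = defaultdict(int)
for thread_id, owned in assignment.items():
    partitions_per_jvm[thread_id.split("-")[0]] += len(owned)

print(dict(partitions_per_jvm))
```

With 24 threads and 12 partitions, the iterator never gets past the first 12 threads in the ordering, so under this ordering only the first two JVMs own any partitions.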

Re: partitions stealing & balancing consumer threads across servers

2014-10-30 Thread Bhavesh Mistry
Hi Joel, Yes, I am on the Kafka trunk branch. In my scenario, do back-up threads affect the allocation? If I have 24 threads (6 threads in each of 4 JVMs) in the above example, do the partitions get evenly distributed (3 per JVM)? Is this a supported use case?

Re: partitions stealing & balancing consumer threads across servers

2014-10-30 Thread Joel Koshy
BTW, "roundrobin" was a recent addition so you would need to be on trunk to use that. The partition assignor will lay out all the available consumer threads and all the available partitions in a deterministic order (based on a hashcode); it then uses a circular iterator over the consumers and the

Re: partitions stealing & balancing consumer threads across servers

2014-10-30 Thread Bhavesh Mistry
Hi Joel, Correction to my previous question: What is the expected behavior of the *roundrobin* policy in the above scenario? Thanks, Bhavesh On Thu, Oct 30, 2014 at 1:39 PM, Bhavesh Mistry wrote: > Hi Joel, > > I have similar issue. I have tried *partition.assignment.strategy=* > *"roundrobin"*, but ho

Re: partitions stealing & balancing consumer threads across servers

2014-10-30 Thread Bhavesh Mistry
Hi Joel, I have a similar issue. I have tried *partition.assignment.strategy=* *"roundrobin"*, but how do you expect this to work? We have a topic with 32 partitions and 4 JVMs with 10 threads each (8 are backups in case one of the JVMs goes down). The roundrobin does not select all the JVMs, only 3 J

Re: partitions stealing & balancing consumer threads across servers

2014-10-30 Thread Joel Koshy
> example: launching 4 processes on 4 different machines with 4 threads per > process on a 12-partition topic will have each machine with 3 assigned > threads and one doing nothing. Moreover, no matter what number of threads > each process has, as long as it is bigger than 3, the end result >

Re: partitions stealing & balancing consumer threads across servers

2014-10-29 Thread Shlomi Hazan
Jun, Joel, The issue here is exactly which threads are left out, and which threads are assigned partitions. Maybe I am missing something, but what I want is to balance consuming threads across machines/processes, regardless of the number of threads the machine launches (side effect: this way if you

Re: partitions stealing & balancing consumer threads across servers

2014-10-29 Thread Joel Koshy
Shlomi, If you are on trunk, and your consumer subscriptions are identical then you can try a slightly different partition assignment strategy. Try setting partition.assignment.strategy="roundrobin" in your consumer config. Thanks, Joel On Wed, Oct 29, 2014 at 06:29:30PM -0700, Jun Rao wrote: >

Re: partitions stealing & balancing consumer threads across servers

2014-10-29 Thread Jun Rao
By consumer, I actually mean consumer threads (the thread # you used when creating consumer streams). So, if you have 4 consumers, each with 4 threads, 4 of the threads will not get any data with 12 partitions. It sounds like that's not what you get? What's the output of the ConsumerOffsetChecker
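The arithmetic in this reply (16 threads, 12 partitions, 4 threads left without data) can be checked with a short sketch. It assumes a range-style split over lexically sorted thread ids, in the spirit of the "consumer rebalancing algorithm" referenced in this thread; the ids and the function name `range_assign` are hypothetical, and this is not Kafka's actual code.

```python
def range_assign(thread_ids, num_partitions):
    """Give each sorted thread a contiguous block of partitions.

    The first `extra` threads receive one more partition than the rest;
    threads beyond the partition count receive an empty list.
    """
    threads = sorted(thread_ids)  # assumption: lexical order of ids
    base, extra = divmod(num_partitions, len(threads))
    assignment = {}
    start = 0
    for index, thread_id in enumerate(threads):
        count = base + (1 if index < extra else 0)
        assignment[thread_id] = list(range(start, start + count))
        start += count
    return assignment


# Hypothetical ids: 4 consumers with 4 threads each, topic with 12 partitions.
thread_ids = [f"consumer{c}-thread{t}" for c in range(1, 5) for t in range(4)]
assignment = range_assign(thread_ids, 12)

idle_threads = [t for t, owned in assignment.items() if not owned]
print(len(idle_threads))
```

Which threads end up idle depends entirely on the ordering of the thread ids; the sketch only confirms how many are idle.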

Re: partitions stealing & balancing consumer threads across servers

2014-10-28 Thread Shlomi Hazan
Jun, I hear you say "partitions are evenly distributed among all consumers in the same group", yet I did bump into a case where launching a process with X high level consumer API threads took over all partitions, sending existing consumers to be unemployed. According to the claim above, and if I

Re: partitions stealing & balancing consumer threads across servers

2014-10-27 Thread Jun Rao
You can take a look at the "consumer rebalancing algorithm" part in http://kafka.apache.org/documentation.html. Basically, partitions are evenly distributed among all consumers in the same group. If there are more consumers in a group than partitions, some consumers will never get any data. Thanks

partitions stealing & balancing consumer threads across servers

2014-10-27 Thread Shlomi Hazan
Hi All, Using Kafka's high-level consumer API I have bumped into a situation where launching a consumer process P with X consuming threads on a topic with X partitions kicks out all other existing consumer threads that were consuming prior to launching process P. That is, consumer process P is stealing al