From the FAQ: "To reduce # of open sockets, in 0.8.0
(https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning
key is not specified or null, a producer will pick a random partition and
stick to it for some time (default is 10 mins) before switching to another
one. So, if there are fewer producers than partitions, at a given point of
time, some partitions may not receive any data. To alleviate this problem,
one can either reduce the metadata refresh interval or specify a message
key and a customized random partitioner. For more detail see this thread
http://mail-archives.apache.org/mod_mbox/kafka-dev/201310.mbox/%3CCAFbh0Q0aVh%2Bvqxfy7H-%2BMnRFBt6BnyoZk1LWBoMspwSmTqUKMg%40mail.gmail.com%3E"
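
In other words, with a null (or absent) key the 0.8 producer is "sticky
random", not per-message random. If you want every message to land on a
genuinely random partition, the FAQ's second suggestion looks roughly like
the sketch below. This is an untested sketch against the old (Scala)
producer API; RandomPartitioner, the com.example package, and the
topic/broker names are placeholders I made up:

    import java.util.Random;

    import kafka.producer.Partitioner;
    import kafka.utils.VerifiableProperties;

    // Illustrative only: picks a uniformly random partition on every
    // send, instead of the sticky behaviour used for null keys.
    public class RandomPartitioner implements Partitioner {
        private final Random random = new Random();

        // The old producer instantiates the partitioner reflectively and
        // hands it the config, so this constructor signature is required.
        public RandomPartitioner(VerifiableProperties props) {}

        @Override
        public int partition(Object key, int numPartitions) {
            return random.nextInt(numPartitions);
        }
    }

Registering it and sending keyed messages would then look something like:

    Properties props = new Properties();
    props.put("metadata.broker.list", "localhost:9092"); // placeholder
    props.put("serializer.class", "kafka.serializer.StringEncoder");
    props.put("partitioner.class", "com.example.RandomPartitioner");

    Producer<String, String> producer =
        new Producer<String, String>(new ProducerConfig(props));
    // The key must be non-null, otherwise the partitioner is bypassed
    // and the sticky behaviour described above kicks in.
    producer.send(new KeyedMessage<String, String>("mytopic", "key", "payload"));

Alternatively, topic.metadata.refresh.interval.ms (600000 ms, i.e. 10
minutes, by default) is the knob to turn if you would rather just let the
sticky partition rotate more often.
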
On Wed, Jul 15, 2015 at 4:13 PM, Stefan Miklosovic <mikloso...@gmail.com> wrote:
> Maybe there is some reason why the producer sticks with a partition for
> some period of time - mostly performance related. I can imagine that
> constantly switching between partitions could be slow, in the sense that
> the producer has to "refocus" on another partition to send a message to,
> and this switching may cost something, so it happens only sporadically.
>
> On the other hand, I would never have expected the behaviour I
> encountered. If it is advertised as "random", I expect it to be really
> random and not "random but ... not random every time". This kind of
> information is hard to dig up; the only way seems to be to try all the
> other solutions, and the most awkward one, which you would never expect
> to work, is actually the proper one ...
>
> On Thu, Jul 16, 2015 at 12:53 AM, JIEFU GONG <jg...@berkeley.edu> wrote:
> > This is a total shot in the dark here, so please ignore this if it
> > fails to make sense, but I remember that in some previous
> > implementation of the producer, prior to when round-robin was enabled,
> > producers would send messages to only one of the partitions for a set
> > period of time (configurable, I believe) before moving on to the next
> > one. This caused me a similar grievance, as I would notice that only a
> > few of my consumers would get data while others were completely idle.
> >
> > Sounds similar, so check if that's a possibility at all?
> >
> > On Wed, Jul 15, 2015 at 3:04 PM, Jagbir Hooda <jho...@gmail.com> wrote:
> >
> >> Hi Stefan,
> >>
> >> Have you looked at the following output for message distribution
> >> across the topic-partitions and which topic-partition is consumed by
> >> which consumer thread?
> >>
> >> kafka-server/bin>./kafka-run-class.sh
> >> kafka.tools.ConsumerOffsetChecker --zkconnect localhost:2181 --group
> >> <consumer_group_name>
> >>
> >> Jagbir
> >>
> >> On Wed, Jul 15, 2015 at 12:50 PM, Stefan Miklosovic
> >> <mikloso...@gmail.com> wrote:
> >> > I have the following problem, and I have tried almost everything I
> >> > could think of, but without any luck.
> >> >
> >> > All I want to do is to have 1 producer, 1 topic, 10 partitions and
> >> > 10 consumers.
> >> >
> >> > All I want is to send 1M messages via the producer to these 10
> >> > consumers.
> >> >
> >> > I am using Kafka 0.8.3 built from current upstream, so I have
> >> > bleeding-edge stuff. It does not work on the 0.8.1.1 or the 0.8.2
> >> > streams either.
> >> >
> >> > The problem is that I expect all consumers to be busy when I send
> >> > 1 million messages via that producer. In other words, if each
> >> > message is sent to a random partition (round-robin / range), I
> >> > expect all 10 consumers to process about 100k messages each,
> >> > because the producer sends to a random partition out of these 10.
> >> >
> >> > But I have never achieved such an outcome.
> >> >
> >> > I tried these combinations:
> >> >
> >> > 1) old Scala producer vs old Scala consumer
> >> >
> >> > Consumers were created by Consumer.createJavaConsumerConnector()
> >> > ten times. Every consumer runs in a separate thread.
> >> >
> >> > 2) old Scala producer vs new Java consumer
> >> >
> >> > The new consumer was used such that I have 10 consumers listening
> >> > on a topic, each subscribed to exactly 1 partition (consumer 1 -
> >> > partition 1, consumer 2 - partition 2, and so on).
> >> >
> >> > 3) old Scala producer with a custom partitioner
> >> >
> >> > I even tried to use my own partitioner. I just generated a random
> >> > number from 0 to 9, so I expected that messages would be sent
> >> > randomly to the partition with that number.
> >> >
> >> > All I see is that only a couple of these 10 consumers are utilized.
> >> > Even though I am sending 1M messages, all I get from the debugging
> >> > output is some preselected set of consumers, which appear to be
> >> > chosen randomly.
> >> >
> >> > Do you have ANY hint why all consumers are not utilized even though
> >> > partitions are selected randomly?
> >> >
> >> > My initial suspicion was that rebalancing went wrong. The thing
> >> > was, I was creating the old consumers in a loop, quickly one after
> >> > another, and I can imagine that the rebalancing algorithm got mad.
> >> >
> >> > So I abandoned that approach and thought: let's just subscribe
> >> > these consumers one by one, each to its own partition, so that I
> >> > have 1 consumer subscribed to exactly 1 partition and there is no
> >> > rebalancing at all.
> >> >
> >> > Oh my, how wrong I was ... nothing changed.
> >> >
> >> > So I was thinking that if I have 10 consumers, each subscribed to
> >> > 1 partition, maybe the producer is just sending messages to some
> >> > subset of partitions and that's it. I was not sure how that could
> >> > be possible, so to be super sure that messages are spread evenly
> >> > across partitions, I used a custom partitioner class in the old
> >> > producer, so that the partition each message is sent to is truly
> >> > random.
> >> >
> >> > But that does not seem to work either.
> >> >
> >> > Please people, help me.
> >> >
> >> > --
> >> > Stefan Miklosovic
> >> >
> >
> >
> >
> > --
> >
> > Jiefu Gong
> > University of California, Berkeley | Class of 2017
> > B.A Computer Science | College of Letters and Sciences
> >
> > jg...@berkeley.edu <elise...@berkeley.edu> | (925) 400-3427
>
>
> --
> Stefan Miklosovic
>
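
Regarding the one-consumer-per-partition experiment above: with the new
consumer you can take group management (and therefore rebalancing) out of
the picture entirely by assigning partitions by hand. A minimal, untested
sketch; note that the manual-assignment method was still being renamed on
trunk around 0.8.3 (it has appeared as both subscribe(TopicPartition...)
and assign(...)), so check your exact build. Topic name and broker address
are placeholders:

    import java.util.Arrays;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // placeholder
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    // No group.id: with manual assignment there is no group coordination,
    // so disable auto-commit as well.
    props.put("enable.auto.commit", "false");

    KafkaConsumer<String, String> consumer =
        new KafkaConsumer<String, String>(props);
    // Thread N would pass partition N here; no rebalancing can occur.
    consumer.assign(Arrays.asList(new TopicPartition("mytopic", 0)));

    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records)
            System.out.printf("partition=%d offset=%d value=%s%n",
                record.partition(), record.offset(), record.value());
    }

With explicit assignment each of the 10 threads owns exactly one
partition, so if some consumers still sit idle, the skew has to be on the
producer side, which brings it back to the sticky-partition behaviour
quoted at the top.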