Also worth mentioning: the new producer doesn't have this behavior -- it
round-robins over the available partitions for records without keys.
"Available" means the partition currently has a leader -- under normal
conditions this means records are distributed evenly across all
partitions, but if a partition is temporarily down it is simply skipped.
It's highly recommended you use the new producer anyway, since it comes
with a lot of other improvements as well.
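
For reference, a minimal sketch of what that looks like with the new Java
producer (the broker address, topic name, and class name are placeholders):

  import java.util.Properties;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerRecord;

  public class KeylessProducerSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
          props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
          props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

          KafkaProducer<String, String> producer = new KafkaProducer<>(props);
          for (int i = 0; i < 1000000; i++) {
              // No key is set, so the new producer round-robins these records
              // over the partitions that currently have a leader.
              producer.send(new ProducerRecord<String, String>("my-topic", "message-" + i));
          }
          producer.close();
      }
  }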

-Ewen

On Wed, Jul 15, 2015 at 4:57 PM, Lance Laursen <llaur...@rubiconproject.com>
wrote:

> From the FAQ:
>
> "To reduce # of open sockets, in 0.8.0 (
> https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning
> key is not specified or null, a producer will pick a random partition and
> stick to it for some time (default is 10 mins) before switching to another
> one. So, if there are fewer producers than partitions, at a given point of
> time, some partitions may not receive any data. To alleviate this problem,
> one can either reduce the metadata refresh interval or specify a message
> key and a customized random partitioner. For more detail see this thread
>
> http://mail-archives.apache.org/mod_mbox/kafka-dev/201310.mbox/%3CCAFbh0Q0aVh%2Bvqxfy7H-%2BMnRFBt6BnyoZk1LWBoMspwSmTqUKMg%40mail.gmail.com%3E
> "
>
>
> On Wed, Jul 15, 2015 at 4:13 PM, Stefan Miklosovic <mikloso...@gmail.com>
> wrote:
>
> > Maybe there is some reason why the producer sticks with a partition for
> > some period of time - mostly performance related. I can imagine that
> > constantly switching between partitions could be slow, in the sense that
> > the producer has to "refocus" on another partition to send a message to,
> > and since this switching costs something, it only happens sporadically.
> >
> > On the other hand, I would never have expected the behaviour I
> > encountered. If it is advertised as "random", I expect it to be really
> > random, and not "random but ... not random every time". It is hard to
> > find this information; the only way seems to be to try all the other
> > solutions until the most awkward one, which you would never expect to
> > work, turns out to be the proper one ...
> >
> > On Thu, Jul 16, 2015 at 12:53 AM, JIEFU GONG <jg...@berkeley.edu> wrote:
> > > This is a total shot in the dark here, so please ignore this if it
> > > fails to make sense, but I remember that in a previous implementation
> > > of the producer, before round-robin was enabled, producers would send
> > > messages to only one of the partitions for a set period of time
> > > (configurable, I believe) before moving on to the next one. This caused
> > > me similar grief, as I would notice that only a few of my consumers
> > > would get data while others were completely idle.
> > >
> > > Sounds similar, so check if that's a possibility at all?
> > >
> > > On Wed, Jul 15, 2015 at 3:04 PM, Jagbir Hooda <jho...@gmail.com> wrote:
> > >
> > >> Hi Stefan,
> > >>
> > >> Have you looked at the output of the following command to check the
> > >> message distribution across the topic-partitions and which
> > >> topic-partition is consumed by which consumer thread?
> > >>
> > >> kafka-server/bin>./kafka-run-class.sh
> > >> kafka.tools.ConsumerOffsetChecker --zkconnect localhost:2181 --group
> > >> <consumer_group_name>
> > >>
> > >> Jagbir
> > >>
> > >> On Wed, Jul 15, 2015 at 12:50 PM, Stefan Miklosovic
> > >> <mikloso...@gmail.com> wrote:
> > >> > I have the following problem; I have tried almost everything I
> > >> > could, but without any luck.
> > >> >
> > >> > All I want to do is to have 1 producer, 1 topic, 10 partitions and
> > >> > 10 consumers.
> > >> >
> > >> > All I want is to send 1M messages via the producer to these 10
> > >> > consumers.
> > >> >
> > >> > I am using Kafka 0.8.3 built from current upstream, so I have
> > >> > bleeding-edge stuff. It does not work on 0.8.1.1 or on the 0.8.2
> > >> > stream either.
> > >> >
> > >> > The problem I have is that I expect that when I send 1 million
> > >> > messages via that producer, all consumers will be kept busy. In
> > >> > other words, if each message sent via the producer goes to a
> > >> > partition chosen randomly (round-robin / range), I expect that all
> > >> > 10 consumers will process about 100k messages each, because the
> > >> > producer sends to a random partition out of these 10.
> > >> >
> > >> > But I have never achieved such an outcome.
> > >> >
> > >> > I have tried these combinations:
> > >> >
> > >> > 1) old scala producer vs old scala consumer
> > >> >
> > >> > Consumers were created by Consumers.createJavaConsumer() ten times.
> > >> > Each consumer is running in a separate thread.
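> > >> >
> > >> > (For reference, a rough sketch of the old high-level consumer driven
> > >> > by 10 threads, assuming the usual
> > >> > kafka.consumer.Consumer.createJavaConsumerConnector entry point and
> > >> > placeholder ZooKeeper / group / topic names:)
> > >> >
> > >> >   import java.util.Collections;
> > >> >   import java.util.List;
> > >> >   import java.util.Properties;
> > >> >   import kafka.consumer.Consumer;
> > >> >   import kafka.consumer.ConsumerConfig;
> > >> >   import kafka.consumer.ConsumerIterator;
> > >> >   import kafka.consumer.KafkaStream;
> > >> >   import kafka.javaapi.consumer.ConsumerConnector;
> > >> >
> > >> >   public class OldConsumerSketch {
> > >> >       public static void main(String[] args) {
> > >> >           Properties props = new Properties();
> > >> >           props.put("zookeeper.connect", "localhost:2181"); // placeholder
> > >> >           props.put("group.id", "my-group");                // placeholder
> > >> >
> > >> >           // One connector asking for 10 streams (instead of 10 separate
> > >> >           // connectors) still gives 10 consumer threads but triggers
> > >> >           // far fewer group rebalances.
> > >> >           ConsumerConnector connector =
> > >> >               Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
> > >> >           List<KafkaStream<byte[], byte[]>> streams = connector
> > >> >               .createMessageStreams(Collections.singletonMap("my-topic", 10))
> > >> >               .get("my-topic");
> > >> >
> > >> >           for (final KafkaStream<byte[], byte[]> stream : streams) {
> > >> >               new Thread(new Runnable() {
> > >> >                   public void run() {
> > >> >                       ConsumerIterator<byte[], byte[]> it = stream.iterator();
> > >> >                       while (it.hasNext())
> > >> >                           System.out.println(new String(it.next().message()));
> > >> >                   }
> > >> >               }).start();
> > >> >           }
> > >> >       }
> > >> >   }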
> > >> >
> > >> > 2) old scala producer vs new java consumer
> > >> >
> > >> > The new consumer was used such that I have 10 consumers listening
> > >> > on the topic, each subscribed to exactly 1 partition (consumer 1 -
> > >> > partition 1, consumer 2 - partition 2, and so on).
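> > >> >
> > >> > (For concreteness, a rough sketch of pinning one consumer instance
> > >> > to one partition; with the released new-consumer API this is
> > >> > assign(), while trunk builds of that era exposed it as
> > >> > subscribe(TopicPartition...). Broker address, group and topic names
> > >> > are placeholders:)
> > >> >
> > >> >   import java.util.Collections;
> > >> >   import java.util.Properties;
> > >> >   import org.apache.kafka.clients.consumer.ConsumerRecord;
> > >> >   import org.apache.kafka.clients.consumer.ConsumerRecords;
> > >> >   import org.apache.kafka.clients.consumer.KafkaConsumer;
> > >> >   import org.apache.kafka.common.TopicPartition;
> > >> >
> > >> >   public class OnePartitionConsumerSketch {
> > >> >       public static void main(String[] args) {
> > >> >           int partition = Integer.parseInt(args[0]); // 0..9, one instance per partition
> > >> >
> > >> >           Properties props = new Properties();
> > >> >           props.put("bootstrap.servers", "localhost:9092"); // placeholder
> > >> >           props.put("group.id", "my-group");                // placeholder
> > >> >           props.put("key.deserializer",
> > >> >                     "org.apache.kafka.common.serialization.StringDeserializer");
> > >> >           props.put("value.deserializer",
> > >> >                     "org.apache.kafka.common.serialization.StringDeserializer");
> > >> >
> > >> >           KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
> > >> >           // Manual assignment: this consumer takes no part in group
> > >> >           // rebalancing and only ever reads the one partition.
> > >> >           consumer.assign(Collections.singletonList(
> > >> >               new TopicPartition("my-topic", partition)));
> > >> >
> > >> >           while (true) {
> > >> >               ConsumerRecords<String, String> records = consumer.poll(1000);
> > >> >               for (ConsumerRecord<String, String> record : records)
> > >> >                   System.out.println(partition + " -> " + record.value());
> > >> >           }
> > >> >       }
> > >> >   }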
> > >> >
> > >> > 3) old scala producer with custom partitioner
> > >> >
> > >> > I even tried to use my own partitioner: I just generate a random
> > >> > number from 0 to 9, so I expected that messages would be sent
> > >> > randomly to the partition with that number.
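> > >> >
> > >> > (A rough sketch of such a partitioner for the old producer; the
> > >> > class name is a placeholder, and per the FAQ quoted earlier it is
> > >> > only consulted for messages that carry a key:)
> > >> >
> > >> >   import java.util.Random;
> > >> >   import kafka.producer.Partitioner;
> > >> >   import kafka.utils.VerifiableProperties;
> > >> >
> > >> >   public class RandomPartitioner implements Partitioner {
> > >> >       private final Random random = new Random();
> > >> >
> > >> >       // The old producer instantiates partitioners via this constructor.
> > >> >       public RandomPartitioner(VerifiableProperties props) { }
> > >> >
> > >> >       @Override
> > >> >       public int partition(Object key, int numPartitions) {
> > >> >           // Ignore the key entirely and pick a partition uniformly at random.
> > >> >           return random.nextInt(numPartitions);
> > >> >       }
> > >> >   }
> > >> >
> > >> > (Registered on the producer via partitioner.class.)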
> > >> >
> > >> > All I see is that only a couple of these 10 consumers are utilized.
> > >> > Even though I am sending 1M messages, all I get from the debugging
> > >> > output is some preselected set of consumers which appears to have
> > >> > been chosen randomly.
> > >> >
> > >> > Do you have ANY hint why not all consumers are utilized even though
> > >> > partitions are selected randomly?
> > >> >
> > >> > My initial suspicion was that rebalancing went badly. The thing is
> > >> > that I was creating the old consumers in a loop, quickly one after
> > >> > another, and I can imagine that the rebalancing algorithm got
> > >> > confused.
> > >> >
> > >> > So I abandoned that approach and thought: let's just subscribe
> > >> > these consumers one by one, each to its own partition, so that I
> > >> > have 1 consumer subscribed to exactly 1 partition and there is no
> > >> > rebalancing at all.
> > >> >
> > >> > Oh my, how wrong I was ... nothing changed.
> > >> >
> > >> > So I was thinking that if I have 10 consumers, each one subscribed
> > >> > to 1 partition, maybe the producer is just sending messages to some
> > >> > subset of the partitions and that's it. I was not sure how that
> > >> > could be possible, so to be super sure about spreading messages
> > >> > evenly across partitions, I used a custom partitioner class in the
> > >> > old producer so that I could be sure the partition each message is
> > >> > sent to is truly random.
> > >> >
> > >> > But that does not seem to work either.
> > >> >
> > >> > Please people, help me.
> > >> >
> > >> > --
> > >> > Stefan Miklosovic
> > >>
> > >
> > >
> > >
> > > --
> > >
> > > Jiefu Gong
> > > University of California, Berkeley | Class of 2017
> > > B.A Computer Science | College of Letters and Sciences
> > >
> > > jg...@berkeley.edu <elise...@berkeley.edu> | (925) 400-3427
> >
> >
> >
> > --
> > Stefan Miklosovic
> >
>



-- 
Thanks,
Ewen
