Hello Joe,

The reasons we make the producers produce to a fixed partition within each metadata-refresh interval are the following:

https://issues.apache.org/jira/browse/KAFKA-1017
https://issues.apache.org/jira/browse/KAFKA-959

So in a word, the randomness is still preserved, but within one metadata-refresh interval the assignment is fixed. I agree that the document should be updated accordingly.
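If you want the partition to be re-picked more often, you can lower that interval in the producer config. A rough sketch (the broker list and topic name are placeholders):

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class StickyPartitionExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092");             // placeholder broker
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // Re-pick a random partition every minute instead of the default 10 minutes.
        props.put("topic.metadata.refresh.interval.ms", "60000");

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));
        // With no key, the producer sends to one randomly chosen partition and
        // sticks to it until the next metadata refresh.
        producer.send(new KeyedMessage<String, String>("perfpayload1", "hello"));
        producer.close();
    }
}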
Guozhang

On Fri, Sep 13, 2013 at 1:48 PM, Joe Stein <crypt...@gmail.com> wrote:

> Isn't this a bug?
>
> I don't see why we would want users to have to code and generate random
> partition keys to randomly distribute the data to partitions; that is
> Kafka's job, isn't it?
>
> Or, if supplying a null value is not supported, tell the user (throw an
> exception) in KeyedMessage like we do for topic, and not treat null as a
> key to hash?
>
> My preference is to put those three lines back in, let the key be null,
> and give folks randomness, unless it's not a bug and there is a good
> reason for it.
>
> Is there something about https://issues.apache.org/jira/browse/KAFKA-691
> that requires the lines taken out? I haven't had a chance to look through
> it yet.
>
> My thought is that a new person coming in would expect to see the
> partitions filling up in a round-robin fashion as our docs say, unless we
> force them in the API to know they have to do this, or give them the
> ability for this to happen when passing nothing in.
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
>
>
> On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <d...@gradientx.com> wrote:
>
> > I ran into this problem as well, Prashant. The default partition key was
> > recently changed:
> >
> > https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666f032be
> >
> > It no longer assigns a random partition to data with a null partition
> > key. I had to change my code to generate random partition keys to get
> > the randomly distributed behavior the producer used to have.
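The workaround Drew describes amounts to something like the following (a sketch only, not his actual code, reusing the same producer setup as in the snippet near the top of this message; the point is just that the key varies per message so the default partitioner's hash spreads the sends):

Random rnd = new Random();   // java.util.Random
// An explicit random key per message restores the spread across partitions,
// because the default partitioner hashes the key to pick a partition.
String key = Integer.toString(rnd.nextInt());
producer.send(new KeyedMessage<String, String>("perfpayload1", key, "hello"));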
> > On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <amasin...@gmail.com> wrote:
> >
> > > Thanks Neha
> > >
> > > I will try applying this property and circle back.
> > >
> > > Also, I have been attempting to execute kafka-producer-perf-test.sh and
> > > I receive the following error:
> > >
> > > Error: Could not find or load main class kafka.perf.ProducerPerformance
> > >
> > > I am running against 0.8.0-beta1.
> > >
> > > It seems like perf is a separate project in the workspace.
> > >
> > > Does sbt package-assembly bundle the perf jar as well?
> > >
> > > Neither producer-perf-test nor consumer-test is working with this build.
> > >
> > >
> > > On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> > >
> > > > As Jun suggested, one reason could be that
> > > > topic.metadata.refresh.interval.ms is too high. Did you observe whether
> > > > the distribution improves after topic.metadata.refresh.interval.ms has
> > > > passed?
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <amasin...@gmail.com> wrote:
> > > >
> > > > > I am using the kafka 0.8 version ...
> > > > >
> > > > > On Thu, Sep 12, 2013 at 8:44 PM, Jun Rao <jun...@gmail.com> wrote:
> > > > >
> > > > > > Which revision of 0.8 are you using? In a recent change, a producer
> > > > > > will stick to a partition for topic.metadata.refresh.interval.ms
> > > > > > (defaults to 10 mins) before picking another partition at random.
> > > > > >
> > > > > > Thanks,
> > > > > > Jun
> > > > > >
> > > > > > On Thu, Sep 12, 2013 at 1:56 PM, prashant amar <amasin...@gmail.com> wrote:
> > > > > >
> > > > > > > I created a topic with 4 partitions, and for some reason the
> > > > > > > producer is pushing only to one partition.
> > > > > > >
> > > > > > > This is consistently happening across all topics that I created ...
> > > > > > >
> > > > > > > Is there a specific configuration that I need to apply to ensure
> > > > > > > that load is evenly distributed across all partitions?
> > > > > > >
> > > > > > > Group       Topic         Pid  Offset  logSize  Lag  Owner
> > > > > > > perfgroup1  perfpayload1  0    10965   11220    255  perfgroup1_XXXX-0
> > > > > > > perfgroup1  perfpayload1  1    0       0        0    perfgroup1_XXXX-1
> > > > > > > perfgroup1  perfpayload1  2    0       0        0    perfgroup1_XXXXX-2
> > > > > > > perfgroup1  perfpayload1  3    0       0        0    perfgroup1_XXXXX-3

--
-- Guozhang
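For reference, a per-partition offset/lag table like the one above comes from the consumer offset checker tool that ships with 0.8; assuming a local ZooKeeper (and going from memory on the exact flag names), the invocation looks roughly like:

bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zkconnect localhost:2181 --group perfgroup1 --topic perfpayload1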