Re: kafka benchmark tests

Ewen Cheslack-Postava Tue, 14 Jul 2015 20:29:14 -0700

@Jiefu, yes! The patch is functional; I think it's just waiting on a bit of
final review after the last round of changes. You can definitely use it for
your own benchmarking, and we'd love to see patches for any additional
tests we missed in the first pass!


-Ewen

On Tue, Jul 14, 2015 at 10:53 AM, JIEFU GONG <jg...@berkeley.edu> wrote:

> Yuheng,
> I would recommend looking here:
> http://kafka.apache.org/documentation.html#brokerconfigs and scrolling down
> to get a better understanding of the default settings and what they mean --
> it'll tell you what the different options for acks do.
>
> Ewen,
> Thank you immensely for your thoughts; they shed a lot of light on the
> issue. Though it is understandable that your specific results need to be
> verified, it seems that the KIP-25 patch is functional and I can use it for
> my own benchmarking purposes? Is that correct? Thanks again!
>
> On Tue, Jul 14, 2015 at 8:22 AM, Yuheng Du <yuheng.du.h...@gmail.com>
> wrote:
>
> > Also, I guess setting the target throughput to -1 means let it be as high
> > as possible?
> >
> > On Tue, Jul 14, 2015 at 10:36 AM, Yuheng Du <yuheng.du.h...@gmail.com>
> > wrote:
> >
> > > Thanks. Suppose I set acks=1 in the producer config options, as in:
> > >
> > >   bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
> > >     test7 50000000 100 -1 acks=1
> > >     bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
> > >     buffer.memory=67108864 batch.size=8196
> > >
> > > Does that mean that for each message generated at the producer, the producer
> > > will wait until the broker sends the ack back before sending another message?
> > >
> > > Thanks.
> > >
> > > Yuheng
> > >
> > > On Tue, Jul 14, 2015 at 10:06 AM, Manikumar Reddy <ku...@nmsworks.co.in>
> > > wrote:
> > >
> > >> Yes. It is a list of Kafka server host/port pairs to use for establishing
> > >> the initial connection to the Kafka cluster:
> > >>
> > >> https://kafka.apache.org/documentation.html#newproducerconfigs
> > >>
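
For illustration, the value of bootstrap.servers is a comma-separated list of
host:port entries. A minimal Python sketch of how such a string breaks down,
assuming that format; parse_bootstrap_servers is a hypothetical helper written
for this example, not part of Kafka or its clients:

```python
def parse_bootstrap_servers(value):
    """Split a bootstrap.servers string into (host, port) tuples.

    Hypothetical helper for illustration only; assumes every entry has
    the form host:port and raises ValueError otherwise.
    """
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        # Split on the last colon so the port is always the final field.
        host, _, port = entry.rpartition(":")
        pairs.append((host, int(port)))
    return pairs


# The single-broker value used in the test command from this thread.
print(parse_bootstrap_servers("esv4-hcl198.grid.linkedin.com:9092"))
```

The client only uses these entries for the initial connection, so the list
does not have to name every broker in the cluster.
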
> > >> On Tue, Jul 14, 2015 at 7:29 PM, Yuheng Du <yuheng.du.h...@gmail.com>
> > >> wrote:
> > >>
> > >> > Does anyone know what
> > >> > bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 means in the
> > >> > following test command:
> > >> >
> > >> >   bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
> > >> >     test7 50000000 100 -1 acks=1
> > >> >     bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
> > >> >     buffer.memory=67108864 batch.size=8196
> > >> >
> > >> > What is bootstrap.servers? Is it the Kafka server that I am running the
> > >> > test against?
> > >> >
> > >> > Thanks.
> > >> >
> > >> > Yuheng
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Tue, Jul 14, 2015 at 12:18 AM, Ewen Cheslack-Postava
> > >> > <e...@confluent.io> wrote:
> > >> >
> > >> > > I implemented (nearly) the same basic set of tests in the system test
> > >> > > framework we started at Confluent and that is going to move into Kafka --
> > >> > > see the WIP patch for KIP-25 here: https://github.com/apache/kafka/pull/70
> > >> > > In particular, that test is implemented in benchmark_test.py:
> > >> > >
> > >> > > https://github.com/apache/kafka/pull/70/files#diff-ca984778cf9943407645eb6784f19dc8
> > >> > >
> > >> > > Hopefully once that's merged people can reuse that benchmark (and add to
> > >> > > it!) so they can easily run the same benchmarks across different hardware.
> > >> > > Here are some results from an older version of that test on m3.2xlarge
> > >> > > instances on EC2 using local ephemeral storage (I think... it's been a
> > >> > > while since I ran these numbers and I didn't document methodology that
> > >> > > carefully):
> > >> > >
> > >> > > INFO:_.KafkaBenchmark:=================
> > >> > > INFO:_.KafkaBenchmark:BENCHMARK RESULTS
> > >> > > INFO:_.KafkaBenchmark:=================
> > >> > > INFO:_.KafkaBenchmark:Single producer, no replication: 684097.470208 rec/sec (65.240000 MB/s)
> > >> > > INFO:_.KafkaBenchmark:Single producer, async 3x replication: 667494.359673 rec/sec (63.660000 MB/s)
> > >> > > INFO:_.KafkaBenchmark:Single producer, sync 3x replication: 116485.764275 rec/sec (11.110000 MB/s)
> > >> > > INFO:_.KafkaBenchmark:Three producers, async 3x replication: 1696519.022182 rec/sec (161.790000 MB/s)
> > >> > > INFO:_.KafkaBenchmark:Message size:
> > >> > > INFO:_.KafkaBenchmark: 10: 1637825.195625 rec/sec (15.620000 MB/s)
> > >> > > INFO:_.KafkaBenchmark: 100: 605504.877911 rec/sec (57.750000 MB/s)
> > >> > > INFO:_.KafkaBenchmark: 1000: 90351.817570 rec/sec (86.170000 MB/s)
> > >> > > INFO:_.KafkaBenchmark: 10000: 8306.180862 rec/sec (79.210000 MB/s)
> > >> > > INFO:_.KafkaBenchmark: 100000: 978.403499 rec/sec (93.310000 MB/s)
> > >> > > INFO:_.KafkaBenchmark:Throughput over long run, data > memory:
> > >> > > INFO:_.KafkaBenchmark: Time block 0: 684725.151324 rec/sec (65.300000 MB/s)
> > >> > > INFO:_.KafkaBenchmark:Single consumer: 701031.140000 rec/sec (56.830500 MB/s)
> > >> > > INFO:_.KafkaBenchmark:Three consumers: 3304011.014900 rec/sec (267.830800 MB/s)
> > >> > > INFO:_.KafkaBenchmark:Producer + consumer:
> > >> > > INFO:_.KafkaBenchmark: Producer: 624984.375391 rec/sec (59.600000 MB/s)
> > >> > > INFO:_.KafkaBenchmark: Consumer: 624984.375391 rec/sec (59.600000 MB/s)
> > >> > > INFO:_.KafkaBenchmark:End-to-end latency: median 2.000000 ms, 99% 4.000000 ms, 99.9% 19.000000 ms
> > >> > >
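
As a reading aid for the producer lines in the quoted output: the MB/s column
appears to be records per second times record size, divided by 2^20. The sketch
below checks that arithmetic in Python. It assumes 100-byte records for the
"Single producer" line (matching the record-size argument of 100 in the
ProducerPerformance command quoted in this thread) and that "MB" means 2^20
bytes; mb_per_sec is a hypothetical helper name. The consumer lines seem to use
a different accounting and are not covered here.

```python
MIB = 1024 * 1024  # assumption: the tool reports MB as 2**20 bytes


def mb_per_sec(records_per_sec, record_size_bytes):
    """Convert a record rate into MB/s for a fixed record size."""
    return records_per_sec * record_size_bytes / MIB


# "Message size" section of the quoted output: record size -> rec/sec.
rates_by_size = {
    10: 1637825.195625,
    100: 605504.877911,
    1000: 90351.817570,
    10000: 8306.180862,
    100000: 978.403499,
}

for size, rate in rates_by_size.items():
    print(f"{size:>6} bytes: {mb_per_sec(rate, size):.2f} MB/s")

# "Single producer, no replication", assuming 100-byte records.
print(f"single producer: {mb_per_sec(684097.470208, 100):.2f} MB/s")
```

Read this way, small records give the highest record rate but the lowest byte
throughput, which is the usual reason to quote both columns.
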
> > >> > > Don't trust these numbers for anything; they were a quick one-off test.
> > >> > > I'm just pasting the output so you get some idea of what the results
> > >> > > might look like. Once we merge the KIP-25 patch, Confluent will be
> > >> > > running the tests regularly and results will be available publicly so
> > >> > > we'll be able to keep better tabs on performance, albeit for only a
> > >> > > specific class of hardware.
> > >> > >
> > >> > > For the batch.size question -- I'm not sure the results in the blog post
> > >> > > actually have different settings; it could be accidental divergence
> > >> > > between the script and the blog post. The post specifically notes that
> > >> > > tuning the batch size in the synchronous case might help, but that he
> > >> > > didn't do that. If you're trying to benchmark the *optimal* throughput,
> > >> > > tuning the batch size would make sense. Since synchronous replication
> > >> > > will have higher latency and there's a limit to how many requests can be
> > >> > > in flight at once, you'll want a larger batch size to compensate for the
> > >> > > additional latency. However, in practice the increase you see may be
> > >> > > negligible. Somebody who has spent more time tweaking producer
> > >> > > performance may have more insight.
> > >> > >
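
A rough back-of-the-envelope model of the point above, offered as a sketch
rather than a description of how the producer actually schedules requests: if
at most N requests can be in flight on a connection and each carries one batch,
throughput on that connection is bounded by roughly N * batch_size / round-trip
latency. The latencies below are made-up illustrative values, not measurements;
the in-flight limit of 5 is an assumption; and both function names are
hypothetical. Only batch.size=8196 comes from the command quoted in this thread.

```python
def throughput_bound(batch_size_bytes, in_flight_requests, round_trip_sec):
    """Upper bound on bytes/sec when each in-flight request carries one batch."""
    return batch_size_bytes * in_flight_requests / round_trip_sec


def batch_size_needed(target_bytes_per_sec, in_flight_requests, round_trip_sec):
    """Batch size required to reach a target rate under the same model."""
    return target_bytes_per_sec * round_trip_sec / in_flight_requests


BATCH_SIZE = 8196      # batch.size from the quoted test command
IN_FLIGHT = 5          # assumed limit on in-flight requests
ASYNC_LATENCY = 0.002  # illustrative round trip without waiting on replicas
SYNC_LATENCY = 0.010   # illustrative round trip with synchronous replication

async_bound = throughput_bound(BATCH_SIZE, IN_FLIGHT, ASYNC_LATENCY)
sync_bound = throughput_bound(BATCH_SIZE, IN_FLIGHT, SYNC_LATENCY)
print(f"async bound: {async_bound / 1e6:.2f} MB/s")
print(f"sync bound:  {sync_bound / 1e6:.2f} MB/s")

# Batch size that would restore the async bound at the higher latency.
print(f"batch size to compensate: "
      f"{batch_size_needed(async_bound, IN_FLIGHT, SYNC_LATENCY):.0f} bytes")
```

Under this model the required batch size scales linearly with latency. A real
produce request can carry batches for several partitions, so measured gains
from a larger batch.size may be much smaller than the model suggests.
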
> > >> > > -Ewen
> > >> > >
> > >> > > On Mon, Jul 13, 2015 at 10:08 AM, JIEFU GONG <jg...@berkeley.edu> wrote:
> > >> > >
> > >> > > > Hi all,
> > >> > > >
> > >> > > > I was wondering if any of you guys have done benchmarks on Kafka
> > >> > > > performance before, and if they or their details (# nodes in cluster,
> > >> > > > # records / size(s) of messages, etc.) could be shared.
> > >> > > >
> > >> > > > For comparison purposes, I am trying to benchmark Kafka against some
> > >> > > > similar services such as Kinesis or Scribe. Additionally, I was
> > >> > > > wondering if anyone could shed some light on Jay Kreps' benchmarks
> > >> > > > that he has openly published here:
> > >> > > >
> > >> > > > https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
> > >> > > >
> > >> > > > Specifically, I am unsure why, between his tests of 3x synchronous
> > >> > > > replication and 3x async replication, he changed the batch.size, as
> > >> > > > well as why he is seemingly publishing to incorrect topics:
> > >> > > >
> > >> > > > Configs:
> > >> > > > https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
> > >> > > >
> > >> > > > Any help is greatly appreciated!
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > >
> > >> > > > Jiefu Gong
> > >> > > > University of California, Berkeley | Class of 2017
> > >> > > > B.A Computer Science | College of Letters and Sciences
> > >> > > >
> > >> > > > jg...@berkeley.edu <elise...@berkeley.edu> | (925) 400-3427
> > >> > > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Thanks,
> > >> > > Ewen
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
>
>
> --
>
> Jiefu Gong
> University of California, Berkeley | Class of 2017
> B.A Computer Science | College of Letters and Sciences
>
> jg...@berkeley.edu <elise...@berkeley.edu> | (925) 400-3427
>



-- 
Thanks,
Ewen
