Thank you Erik.

In my test I am using fixed 200-byte messages and I run 500k messages per
producer on 92 physically isolated producers. Each test run takes about 20
minutes. As the broker cluster is being migrated to a new physical cluster,
I will run my test and get the latency results in the next couple of
weeks.

I will keep you posted.

Thanks.

On Wed, Sep 9, 2015 at 4:58 PM, Helleren, Erik <erik.helle...@cmegroup.com>
wrote:

> Yes, and that can really hurt average performance.  All the partitions
> were nearly identical up to the 99%’ile, and had very good performance at
> that level, hovering around a few milliseconds.  But when looking beyond
> the 99%’ile, there was a clear fork in the distribution where a set of 3
> partitions surged upwards.  This could be for a dozen different reasons:
> network blips, noisy networks, location in the network, resource
> contention on that broker, etc.  But it affected that one broker more than
> the others.  And the reasons for my cluster displaying this behavior could
> be very different from the reasons for any other cluster.
>
> It’s worth noting that this was more a latency test than a stress test.
> There was a single Kafka producer object, very small message sizes (100
> bytes), and it was only pushing through around 5MB/s worth of data.  And
> the client was configured to minimize the amount of data that would be on
> the internal queue/buffer waiting to be sent.  The messages being sent
> were composed of 10-byte ASCII ‘words’ selected randomly from a
> dictionary of 1000 words, which benefits compression while still
> resulting in likely unique messages.  And the test I ran was only 6
> minutes long, and I did not do the work required to see if there was a
> burst of slower messages which caused this behavior, or if it was a
> consistent issue with that node.
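>
> In case it helps as a reference point, a minimal sketch of that kind of
> producer setup is below. It is illustrative only: the broker address and
> topic name are placeholders, and the exact config values are assumptions
> rather than what I actually ran. It just shows a producer tuned toward low
> internal buffering plus the random-word message generation described above.
>
> import java.util.Properties;
> import java.util.Random;
> import org.apache.kafka.clients.producer.KafkaProducer;
> import org.apache.kafka.clients.producer.ProducerRecord;
>
> public class LatencyTestProducer {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put("bootstrap.servers", "broker1:9092"); // placeholder
>         props.put("acks", "1");                         // ack from the leader only
>         props.put("linger.ms", "0");                    // send as soon as possible
>         props.put("batch.size", "0");                   // disable batching to keep the buffer small
>         props.put("key.serializer",
>                 "org.apache.kafka.common.serialization.StringSerializer");
>         props.put("value.serializer",
>                 "org.apache.kafka.common.serialization.StringSerializer");
>         KafkaProducer<String, String> producer = new KafkaProducer<>(props);
>
>         // 100-byte messages: ten 10-byte "words" drawn from a 1000-word
>         // dictionary, so payloads compress well but stay mostly unique.
>         Random rnd = new Random();
>         String[] dictionary = new String[1000];
>         for (int i = 0; i < dictionary.length; i++) {
>             StringBuilder w = new StringBuilder();
>             for (int j = 0; j < 10; j++) {
>                 w.append((char) ('a' + rnd.nextInt(26)));
>             }
>             dictionary[i] = w.toString();
>         }
>         StringBuilder msg = new StringBuilder();
>         for (int i = 0; i < 10; i++) {
>             msg.append(dictionary[rnd.nextInt(dictionary.length)]);
>         }
>         producer.send(new ProducerRecord<>("latency-test", msg.toString()));
>         producer.close();
>     }
> }
>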
> -Erik
>
>
> On 9/9/15, 2:24 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
>
> >So are you suggesting that the long delays in the slowest 1% of messages
> >happen in the slower partitions that are further away? Thanks.
> >
> >On Wed, Sep 9, 2015 at 3:15 PM, Helleren, Erik
> ><erik.helle...@cmegroup.com>
> >wrote:
> >
> >> So, I did my own latency test on a cluster of 3 nodes, and there is a
> >> significant difference around the 99%’ile and higher for partitions when
> >> measuring the ack time when configured for a single ack.  The graph
> >> that I wish I could attach or post clearly shows that around 1/3 of the
> >> partitions significantly diverge from the other two-thirds.  So, at
> >> least in my case, one of my brokers is further away than the others.
> >> -Erik
> >>
> >> On 9/4/15, 1:06 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
> >>
> >> >No problem. Thanks for your advice. I think it would be fun to explore.
> >> >I only know how to program in Java though. Hope it will work.
> >> >
> >> >On Fri, Sep 4, 2015 at 2:03 PM, Helleren, Erik
> >> ><erik.helle...@cmegroup.com>
> >> >wrote:
> >> >
> >> >> I think the suggestion is to have partitions/brokers >= 1, so 32
> >> >> should be enough.
> >> >>
> >> >> As for latency tests, there isn’t a lot of code to do a latency test.
> >> >> If you just want to measure ack time, it’s around 100 lines.  I will
> >> >> try to push out some good latency testing code to GitHub, but my
> >> >> company is scared of open sourcing code… so it might be a while…
> >> >> -Erik
> >> >>
> >> >>
> >> >> On 9/4/15, 12:55 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
> >> >>
> >> >> >Thanks for your reply, Erik. I am running some more tests according
> >> >> >to your suggestions now and I will share my results here. Is it
> >> >> >necessary to use a fixed number of partitions (32 partitions maybe)
> >> >> >for my test?
> >> >> >
> >> >> >I am testing 2, 4, 8, 16 and 32 broker scenarios, all of them running
> >> >> >on individual physical nodes. So I think using at least 32 partitions
> >> >> >makes more sense? I have seen latencies increase as the number of
> >> >> >partitions goes up in my experiments.
> >> >> >
> >> >> >To get the latency of each event recorded, are you suggesting that I
> >> >> >write my own test program (in Java perhaps), or can I just modify the
> >> >> >standard test program provided by Kafka (
> >> >> >https://gist.github.com/jkreps/c7ddb4041ef62a900e6c )? I guess I need
> >> >> >to rebuild the source if I modify the standard Java test program
> >> >> >ProducerPerformance provided in Kafka, right? Right now this standard
> >> >> >program only has average latencies and percentile latencies but no
> >> >> >per-event latencies.
> >> >> >
> >> >> >Thanks.
> >> >> >
> >> >> >On Fri, Sep 4, 2015 at 1:42 PM, Helleren, Erik
> >> >> ><erik.helle...@cmegroup.com>
> >> >> >wrote:
> >> >> >
> >> >> >> That is an excellent question!  There are a bunch of ways to
> >>monitor
> >> >> >> jitter and see when that is happening.  Here are a few:
> >> >> >>
> >> >> >> - You could slice the histogram every few seconds, save it out with
> >> >> >> a timestamp, and then look at how the slices compare.  This would be
> >> >> >> mostly manual, or you can graph line charts of the percentiles over
> >> >> >> time in Excel where each percentile would be a series.  If you are
> >> >> >> using HdrHistogram, you should look at how to use the Recorder class
> >> >> >> to do this, coupled with a ScheduledExecutorService (see the first
> >> >> >> sketch after this list).
> >> >> >>
> >> >> >> - You can just save the starting timestamp and the latency of each
> >> >> >> event.  If you put it into a CSV, you can just load it up into Excel
> >> >> >> and graph it as an XY chart.  That way you can see every point during
> >> >> >> the run of your program and you can see trends.  You want to be
> >> >> >> careful about this one, especially about writing to a file in the
> >> >> >> callback that Kafka provides (see the second sketch after this list).
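> >> >> >>
> >> >> >> To make the first option concrete, here is a rough sketch using the
> >> >> >> HdrHistogram Recorder plus a ScheduledExecutorService.  It is an
> >> >> >> assumption-laden example, not the code I ran: the histogram bounds,
> >> >> >> reporting interval, and class names are made up for illustration.
> >> >> >>
> >> >> >> import java.util.concurrent.Executors;
> >> >> >> import java.util.concurrent.ScheduledExecutorService;
> >> >> >> import java.util.concurrent.TimeUnit;
> >> >> >> import org.HdrHistogram.Histogram;
> >> >> >> import org.HdrHistogram.Recorder;
> >> >> >>
> >> >> >> public class IntervalLatencyReporter {
> >> >> >>     // Track ack latencies in microseconds, up to 60s, 3 significant digits.
> >> >> >>     private final Recorder recorder = new Recorder(TimeUnit.SECONDS.toMicros(60), 3);
> >> >> >>     private final ScheduledExecutorService scheduler =
> >> >> >>             Executors.newSingleThreadScheduledExecutor();
> >> >> >>
> >> >> >>     // Call from the producer callback with each measured ack latency.
> >> >> >>     public void record(long latencyMicros) {
> >> >> >>         recorder.recordValue(latencyMicros);
> >> >> >>     }
> >> >> >>
> >> >> >>     // Every 5 seconds, swap out an interval histogram and print a slice.
> >> >> >>     public void start() {
> >> >> >>         scheduler.scheduleAtFixedRate(() -> {
> >> >> >>             Histogram slice = recorder.getIntervalHistogram();
> >> >> >>             System.out.printf("%d 50%%=%dus 99%%=%dus 99.9%%=%dus max=%dus%n",
> >> >> >>                     System.currentTimeMillis(),
> >> >> >>                     slice.getValueAtPercentile(50.0),
> >> >> >>                     slice.getValueAtPercentile(99.0),
> >> >> >>                     slice.getValueAtPercentile(99.9),
> >> >> >>                     slice.getMaxValue());
> >> >> >>         }, 5, 5, TimeUnit.SECONDS);
> >> >> >>     }
> >> >> >> }
> >> >> >>
> >> >> >> And for the second option, a sketch that keeps file I/O off the
> >> >> >> producer's callback thread by handing each observation to a queue
> >> >> >> that a separate thread drains to CSV (again, names and layout are
> >> >> >> placeholders, not a recommended implementation):
> >> >> >>
> >> >> >> import java.io.FileWriter;
> >> >> >> import java.io.IOException;
> >> >> >> import java.io.PrintWriter;
> >> >> >> import java.util.concurrent.BlockingQueue;
> >> >> >> import java.util.concurrent.LinkedBlockingQueue;
> >> >> >> import org.apache.kafka.clients.producer.Callback;
> >> >> >>
> >> >> >> public class CsvLatencyLogger {
> >> >> >>     private final BlockingQueue<long[]> queue = new LinkedBlockingQueue<>();
> >> >> >>
> >> >> >>     // Kafka callback that only timestamps and enqueues; no file I/O here.
> >> >> >>     public Callback callbackFor(final long sendTimeNanos) {
> >> >> >>         return (metadata, exception) -> {
> >> >> >>             long latencyMicros = (System.nanoTime() - sendTimeNanos) / 1000;
> >> >> >>             queue.offer(new long[] { System.currentTimeMillis(), latencyMicros });
> >> >> >>         };
> >> >> >>     }
> >> >> >>
> >> >> >>     // Run on a dedicated thread; drains observations into a CSV file.
> >> >> >>     public void writeLoop(String path) throws IOException, InterruptedException {
> >> >> >>         try (PrintWriter out = new PrintWriter(new FileWriter(path))) {
> >> >> >>             out.println("send_timestamp_ms,latency_us");
> >> >> >>             while (!Thread.currentThread().isInterrupted()) {
> >> >> >>                 long[] row = queue.take();
> >> >> >>                 out.println(row[0] + "," + row[1]);
> >> >> >>             }
> >> >> >>         }
> >> >> >>     }
> >> >> >> }
> >> >> >>
> >> >> >> Usage would be roughly producer.send(record, logger.callbackFor(System.nanoTime())),
> >> >> >> with writeLoop() started on its own thread before the test begins.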
> >> >> >>
> >> >> >> Also, I have noticed that most of the very slow observations are at
> >> >> >> startup.  But don’t trust me, trust the data and share your findings.
> >> >> >> Also, having a 99.9th percentile provides a pretty good standard for
> >> >> >> typical poor-case performance.  Average is borderline useless; the
> >> >> >> 50%’ile is a better typical case because that’s the number that says
> >> >> >> “half of events will be this slow or faster”, and for values that are
> >> >> >> high, like the 99.9%’ile, “0.1% of all events will be slower than
> >> >> >> this”.
> >> >> >> -Erik
> >> >> >>
> >> >> >> On 9/4/15, 12:05 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
> >> >> >>
> >> >> >> >Thank you Erik! That's helpful!
> >> >> >> >
> >> >> >> >But I also see jitter in the maximum latencies when running the
> >> >> >> >experiment.
> >> >> >> >
> >> >> >> >The average acknowledgement latency from producer to broker is
> >> >> >> >around 5 ms when using 92 producers and 4 brokers, and the 99.9th
> >> >> >> >percentile latency is 58 ms, but the maximum latency goes up to
> >> >> >> >1359 ms. How can I locate the source of this jitter?
> >> >> >> >
> >> >> >> >Thanks.
> >> >> >> >
> >> >> >> >On Fri, Sep 4, 2015 at 10:54 AM, Helleren, Erik
> >> >> >> ><erik.helle...@cmegroup.com>
> >> >> >> >wrote:
> >> >> >> >
> >> >> >> >> Well… not to be contrarian, but latency depends much more on the
> >> >> >> >> latency between the producer and the broker that is the leader for
> >> >> >> >> the partition you are publishing to.  At least when your brokers
> >> >> >> >> are not saturated with messages, and acks is set to 1.  If acks is
> >> >> >> >> set to ALL, latency on a non-saturated Kafka cluster will be:
> >> >> >> >> round-trip latency from the producer to the leader for the
> >> >> >> >> partition + max(round-trip latency to each replica of that
> >> >> >> >> partition).  If a cluster is saturated with messages, we have to
> >> >> >> >> assume that all partitions receive an equal distribution of
> >> >> >> >> messages to avoid linear algebra and queueing theory models.  I
> >> >> >> >> don’t like linear algebra :P
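> >> >> >> >>
> >> >> >> >> As a worked example with made-up numbers: if the producer’s round
> >> >> >> >> trip to the leader is 2 ms and the leader’s round trips to its two
> >> >> >> >> followers are 1 ms and 5 ms, an acks=ALL write sees roughly
> >> >> >> >> 2 + max(1, 5) = 7 ms, while an acks=1 write sees only about 2 ms.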
> >> >> >> >>
> >> >> >> >> Since you are probably putting all your latencies into a single
> >> >> >> >> histogram per producer, or worse, just an average, this pattern
> >> >> >> >> would have been obscured.  Obligatory lecture about measuring
> >> >> >> >> latency by Gil Tene (https://www.youtube.com/watch?v=9MKY4KypBzg).
> >> >> >> >> To verify this hypothesis, you should rewrite the benchmark to
> >> >> >> >> record the latency of each write to a partition, for each
> >> >> >> >> producer, into a histogram (HdrHistogram is pretty good for
> >> >> >> >> that).  This would give you producers*partitions histograms,
> >> >> >> >> which might be unwieldy for that many producers.  But wait, there
> >> >> >> >> is hope!
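> >> >> >> >>
> >> >> >> >> As a sketch of what that per-partition bookkeeping could look like
> >> >> >> >> on one producing client (again assuming the HdrHistogram library;
> >> >> >> >> the class name, bounds, and the wiring into your send loop are all
> >> >> >> >> made up for illustration):
> >> >> >> >>
> >> >> >> >> import java.util.Map;
> >> >> >> >> import java.util.concurrent.ConcurrentHashMap;
> >> >> >> >> import java.util.concurrent.TimeUnit;
> >> >> >> >> import org.HdrHistogram.ConcurrentHistogram;
> >> >> >> >> import org.HdrHistogram.Histogram;
> >> >> >> >> import org.apache.kafka.clients.producer.Callback;
> >> >> >> >>
> >> >> >> >> public class PerPartitionLatencies {
> >> >> >> >>     // One histogram per partition, keyed by the partition the ack came from.
> >> >> >> >>     private final Map<Integer, Histogram> byPartition = new ConcurrentHashMap<>();
> >> >> >> >>
> >> >> >> >>     public Callback callbackFor(final long sendTimeNanos) {
> >> >> >> >>         return (metadata, exception) -> {
> >> >> >> >>             if (exception != null || metadata == null) return; // count failures separately
> >> >> >> >>             long latencyMicros = (System.nanoTime() - sendTimeNanos) / 1000;
> >> >> >> >>             byPartition
> >> >> >> >>                 .computeIfAbsent(metadata.partition(),
> >> >> >> >>                     p -> new ConcurrentHistogram(TimeUnit.SECONDS.toMicros(60), 3))
> >> >> >> >>                 .recordValue(latencyMicros);
> >> >> >> >>         };
> >> >> >> >>     }
> >> >> >> >>
> >> >> >> >>     // Dump one percentile distribution per partition for plotting.
> >> >> >> >>     public void report() {
> >> >> >> >>         byPartition.forEach((partition, histogram) -> {
> >> >> >> >>             System.out.println("partition " + partition);
> >> >> >> >>             histogram.outputPercentileDistribution(System.out, 1000.0); // values in ms
> >> >> >> >>         });
> >> >> >> >>     }
> >> >> >> >> }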
> >> >> >> >>
> >> >> >> >> To verify that this hypothesis holds, you just have to see that
> >> >> >> >> there is a significant difference between different partitions on
> >> >> >> >> a SINGLE producing client.  So, pick one producing client at
> >> >> >> >> random and use the data from that.  The easy way to do that is to
> >> >> >> >> just plot all the partition latency histograms on top of each
> >> >> >> >> other in the same plot; that way you have a pretty plot to show
> >> >> >> >> people.  If you don’t want to set up plotting, you can just
> >> >> >> >> compare the medians (50th percentile) of the partitions’
> >> >> >> >> histograms.  If there is a lot of variance, your latency anomaly
> >> >> >> >> is explained by brokers 4-7 being slower than brokers 0-3!  If
> >> >> >> >> there isn’t a lot of variance at 50%, look at higher percentiles.
> >> >> >> >> And if the higher percentiles for all the partitions look the
> >> >> >> >> same, this hypothesis is disproved.
> >> >> >> >>
> >> >> >> >> If you want to make a general statement about the latency of
> >> >> >> >> writing to Kafka, you can merge all the histograms into a single
> >> >> >> >> histogram and plot that.
> >> >> >> >>
> >> >> >> >> To Yuheng’s credit, more brokers always results in more
> >> >> >> >> throughput.  But throughput and latency are two different
> >> >> >> >> creatures.  It’s worth noting that Kafka is designed to be high
> >> >> >> >> throughput first and low latency second.  And it does a really
> >> >> >> >> good job at both.
> >> >> >> >>
> >> >> >> >> Disclaimer: I might not like linear algebra, but I do like
> >> >> >> >> statistics.  Let me know if there are topics above that need more
> >> >> >> >> explanation and aren’t covered by Gil’s lecture.
> >> >> >> >> -Erik
> >> >> >> >>
> >> >> >> >> On 9/4/15, 9:03 AM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
> >> >> >> >>
> >> >> >> >> >When I use 32 partitions, the 4-broker latency becomes larger
> >> >> >> >> >than the 8-broker latency.
> >> >> >> >> >
> >> >> >> >> >So is it always true that using more brokers gives lower latency
> >> >> >> >> >when the number of partitions is at least the number of brokers?
> >> >> >> >> >
> >> >> >> >> >Thanks.
> >> >> >> >> >
> >> >> >> >> >On Thu, Sep 3, 2015 at 10:45 PM, Yuheng Du
> >> >> >><yuheng.du.h...@gmail.com>
> >> >> >> >> >wrote:
> >> >> >> >> >
> >> >> >> >> >> I am running a producer latency test. When using 92 producers
> >> >> >> >> >> on 92 physical nodes publishing to 4 brokers, the latency is
> >> >> >> >> >> slightly lower than when using 8 brokers. I am using 8
> >> >> >> >> >> partitions for the topic.
> >> >> >> >> >>
> >> >> >> >> >> I have rerun the test and it gives me the same result: the
> >> >> >> >> >> 4-broker scenario still has lower latency than the 8-broker
> >> >> >> >> >> scenario.
> >> >> >> >> >>
> >> >> >> >> >> It is weird because I tested 1 broker, 2 brokers, 4 brokers, 8
> >> >> >> >> >> brokers, 16 brokers and 32 brokers. For the rest of the cases
> >> >> >> >> >> the latency decreases as the number of brokers increases.
> >> >> >> >> >>
> >> >> >> >> >> 4 brokers/8 brokers is the only pair that doesn't satisfy this
> >> >> >> >> >> rule. What could be the cause?
> >> >> >> >> >>
> >> >> >> >> >> I am using 200-byte messages, and the test has each producer
> >> >> >> >> >> publish 500k messages to a given topic. Every test run when I
> >> >> >> >> >> change the number of brokers, I use a new topic.
> >> >> >> >> >>
> >> >> >> >> >> Thanks for any advice.
> >> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >>
> >> >> >>
> >> >>
> >> >>
> >>
> >>
>
>
