The important thing to look at here is the IO wait on your
system. You're hitting disk throughput limits, and that's most likely what
you need to resolve. From what you've described, the only thing that is
going to get you more performance is more spindles (or faster spindles).
That means either more disks per broker or more brokers, but either way
you need to eliminate the disk IO bottleneck.
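
If you go the more-disks route, the broker-side change is just listing each
data volume in log.dirs in server.properties (the mount points below are
hypothetical, not from your setup):

    log.dirs=/mnt/disk1/kafka-logs,/mnt/disk2/kafka-logs,/mnt/disk3/kafka-logs

Each partition lives entirely in one of those directories, and new
partitions are placed in whichever directory currently holds the fewest, so
with 16 partitions on the topic the write load should spread across the
extra spindles.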

-Todd


On Tue, Feb 21, 2017 at 7:29 AM, Jon Yeargers <jon.yearg...@cedexis.com>
wrote:

> Running 3x 8-core instances on Google Compute.
>
> Topic has 16 partitions (replication factor 2) and is consumed by 16 docker
> containers on individual hosts.
>
> System seems to max out at around 40000 messages / minute. Each message is
> ~12K - compressed (snappy) JSON.
>
> Recently moved from 12 to the above 16 partitions with no change in
> throughput.
>
> Also tried increasing the consumption capacity on each container by 50%. No
> effect.
>
> Network is running at ~6Gb/sec (measured using iperf3). Broker load is
> ~1.5. IOWait % is 5-10 (via sar).
>
> What are my options for adding throughput?
>
> - more brokers?
> - avro/protobuf messaging?
> - more disks / broker? (1 / host presently)
> - jumbo frames?
>
> (transparent huge pages is disabled)
>
>
> Looking at this article (
> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines)
> it would appear that for our message size we are at the max. This would
> argue that we need to shrink the message size - so perhaps switching to
> avro is the next step?
>



-- 
Todd Palino
Staff Site Reliability Engineer
Data Infrastructure Streaming

linkedin.com/in/toddpalino
