I'm trying to maximize my throughput and seem to have hit a ceiling. Everything 
described below is running in AWS.

I have configured a Kafka cluster of 5 M1.Large machines, with 600 
provisioned IOPS storage for each EC2 instance. I have a single ZooKeeper 
server (we aren't in production yet, so I didn't take the time to set up a ZK 
cluster). Publishing to a single topic from 7 different clients, I seem to max 
out at around 20,000 events per second (eps) with a fixed 2K message size. Each 
broker defines 10 file segments, with a 25000 message / 5 second flush 
configuration in server.properties. I have stuck with 8 threads. My producers 
(Java) are configured with batch.num.messages at 50, and 
queue.buffering.max.messages at 100.
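
For reference, the producer settings above can be sketched as plain Java 
properties. This is only an illustration, not my exact client code: the class 
and method names are made up for this example, and producer.type=async is an 
assumption, since the two batching settings only take effect in async mode.

```java
import java.util.Properties;

public class ProducerConfigSketch {
    // Builds the producer settings described above as plain key/value pairs.
    static Properties producerProps() {
        Properties props = new Properties();
        // Assumption: async mode, where the batching settings below apply.
        props.put("producer.type", "async");
        // Values taken from the description above.
        props.put("batch.num.messages", "50");
        props.put("queue.buffering.max.messages", "100");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be compared with the real client config.
        producerProps().list(System.out);
    }
}
```

With these values the producer queue holds at most two batches' worth of 
unsent messages (100 / 50).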

When I went from 4 servers in the cluster to 5 servers, I only saw an increase 
of about 500 events per second in throughput. In sharp contrast, when I run a 
complete environment on my MacBook Pro, tuned as described above but with a 
single ZK and a single Kafka broker, I am seeing 61,000 events per second. I 
don't think I'm network constrained in the AWS environment (producer side) 
because when I add one more client, my MacBook Pro, I see a proportionate 
decrease in EC2 client throughput, and the net result is an identical 20,000 
eps. Stated differently, my EC2 instances give up throughput when my local 
MacBook Pro joins the array of producers such that the throughput is exactly 
the same.

Does anyone have any additional suggestions on what else I could tune to try 
to hit our goal of 50,000 eps with a 5 machine cluster? In their published 
whitepapers, LinkedIn describes a peak of 170,000 events per second across 
their cluster. My 20,000 seems so far away from their production figures.
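
To put these rates in bandwidth terms, here is a rough back-of-the-envelope 
sketch. It assumes "2K" means 2048 bytes, that load is spread evenly across 
the 5 brokers, and it ignores protocol overhead and any replication traffic. 
The class and method names are just for illustration.

```java
public class ThroughputEstimate {
    // Payload bytes per second for a given event rate and fixed message size.
    static long bytesPerSecond(long eventsPerSecond, long messageBytes) {
        return eventsPerSecond * messageBytes;
    }

    // Converts bytes per second to megabits per second (1 Mbit = 1,000,000 bits).
    static double megabitsPerSecond(long bytesPerSecond) {
        return bytesPerSecond * 8 / 1_000_000.0;
    }

    public static void main(String[] args) {
        long messageBytes = 2048; // assumption: "2K" = 2048 bytes
        int brokers = 5;          // assumption: load is spread evenly
        long[] rates = {20_000, 50_000, 61_000}; // observed, goal, MacBook Pro
        for (long eps : rates) {
            double total = megabitsPerSecond(bytesPerSecond(eps, messageBytes));
            System.out.printf("%d eps: %.2f Mbit/s total, %.2f Mbit/s per broker of %d%n",
                    eps, total, total / brokers, brokers);
        }
    }
}
```

The per-broker figure only applies to the EC2 rates. The MacBook Pro number 
came from a single broker on the same machine, so that traffic never crossed a 
real network.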

What is the relationship, in terms of performance, between ZK and Kafka? Do I 
need to have a more performant ZK cluster, the same, or does it really not 
matter in terms of maximizing throughput?

Thanks for any suggestions – I've been pulling knobs and turning levers on this 
for several days now.


Jason
