Re: Kafka performance when it comes to throughput

2022-01-08 Thread Israel Ekpo
Hi Marisa, I am going to be running it in a Kubernetes cluster on Azure Kubernetes Service using the setup scripts available in my GitHub repo https://github.com/izzyacademy/kafka-in-a-box https://youtu.be/TDw3tDAiBBM I will review the recommendations from the EventSizer.io tool as well to make

Re: Kafka performance when it comes to throughput

2022-01-08 Thread Marisa Queen
Hi Israel, Great job! It looks great and promising. I really like your YouTube channel and the way you present the material. A couple of things that you might want to consider for your benchmark experiments: 1) What machine are you going to use? Is it a fast machine with enough CPU cores? I woul

Re: Kafka performance when it comes to throughput

2022-01-07 Thread Israel Ekpo
Marisa, I have kicked off the video series on performance optimization for the Kafka setup. I will be working on the various configurations for latency, throughput, availability and durability. https://youtu.be/aPlbG349cXg The first ones will be on latency and throughput which is what you are i

Re: Kafka performance when it comes to throughput

2022-01-06 Thread Marisa Queen
Hi Alex, > Furthermore, setting up a localhost pub/sub demo on a single machine (your laptop?) is so far removed from a real-world scenario I can't imagine how any numbers derived from that would be useful. I can't imagine either. That's why I'm planning to run this on a lab Linux machine with 8

Re: Kafka performance when it comes to throughput

2022-01-06 Thread Marisa Queen
Wow, that's awesome! I wasn't expecting that. I truly appreciate your help and professionalism. > Let me find some time soon and I will do a video on that scenario optimized primarily for low latency and throughput. I will also compare how this performs when adjusted for durability and high availa

Re: Kafka performance when it comes to throughput

2022-01-06 Thread Alex Craig
Marisa, you might consider engaging someone at Confluent; maybe they can give you some case studies or whitepapers from similar use cases in the financial industry (and yes, Kafka is used in the financial industry). A client asking you to "prove that Kafka performs/scales" seems like an unusual

Re: Kafka performance when it comes to throughput

2022-01-06 Thread Israel Ekpo
Thanks for your response, Marisa. This has been a very interesting discussion and I appreciate it. It is a bit of a challenge in the sense that I wish I had a demo ready to go with a similar use case and expectations to easily explain what I have been trying to convey. I am always ready for a chall

Re: Kafka performance when it comes to throughput

2022-01-06 Thread Marisa Queen
Hi Israel, > You can achieve any performance benchmark you are willing to pay for. Thanks for your email. Allow me to respectfully disagree. I believe that some systems are better than others when it comes to performance. The idea that I can just take a slow system, multiply by 1 million, and the

Re: Kafka performance when it comes to throughput

2022-01-06 Thread Israel Ekpo
Marisa, I do not agree with your assessment. There are several factors that could influence your performance numbers even with localhost. Your project should be configured based on your own needs. Your throughput could go up or down depending on how you are configured based on what is important

Re: Kafka performance when it comes to throughput

2022-01-06 Thread Marisa Queen
Hi Joris, Thank you so much, friend! > I appreciate that setting up everything on localhost will be easier and lead to big numbers, but bear in mind that it's typically all the other real-life stuff (remote connections, replication, at-least once, ...) that causes massive slowdowns compared to lo

Re: Kafka performance when it comes to throughput

2022-01-06 Thread Joris Peeters
These tutorials - though quite a bit outdated - seem quite useful: http://cloudurable.com/blog/kafka-tutorial-kafka-producer/index.html (and the follow-ups). Ends up being close to how I write this in Java, and tutorial 13 talks about batching and acks etc, which you'll need in order to tune to max

Re: Kafka performance when it comes to throughput

2022-01-06 Thread Marisa Queen
Hi Joris, Thank you so much. I plan to write a Java Consumer and a Java Producer for my benchmark. Do you recommend an example that I can use as a reference to write my basic Java producer and simple Java consumer? I'll for sure share the throughput number I get with the community. Maybe even write

Re: Kafka performance when it comes to throughput

2022-01-06 Thread Joris Peeters
I'd just follow the instructions in https://kafka.apache.org/quickstart to set up Kafka and Zookeeper on a single node, by running the Java processes directly. Or you can run them in Docker. For the producer and consumer I'd personally use Python, as it's the easiest to get going. You may want to look at h

Re: Kafka performance when it comes to throughput

2022-01-06 Thread Marisa Queen
Hi Joris, I've spoken to him. His answers are below: On Thu, Jan 6, 2022 at 1:37 PM Joris Peeters wrote: > There's a few unknown parameters here that might influence the answer, > though. From the top of my head, at least > - How much replication of the data is needed (for high availability),

Re: Kafka performance when it comes to throughput

2022-01-06 Thread Marisa Queen
Hi Okada, Thanks for your reply. Finally I see some numbers! I love numbers :) I've shown your email to my boss (I hope he will hire me to do this project) and he said the following: "I would like to see this 833k/sec number for myself. Am I asking too much? :) Can you set up a very basic and si

Re: Kafka performance when it comes to throughput

2022-01-06 Thread Joris Peeters
There are a few unknown parameters here that might influence the answer, though. Off the top of my head, at least - How much replication of the data is needed (for high availability), and how many acks for the producer? (If fire-and-forget it can be faster, if need to replicate and ack from 3 broker

Re: Kafka performance when it comes to throughput

2022-01-06 Thread Marisa Queen
Hi Israel, Your email is great, but I'm afraid to forward it to my customer because it doesn't answer his question. I'm hoping that other members of this list will be able to give me a more NUMERIC answer; let's wait and see. Just to give you some follow-up on your answer, when you say: > 30 p

Re: Kafka performance when it comes to throughput

2022-01-06 Thread Haruki Okada
Hi, Marisa. Kafka is well designed to make full use of system resources, so I think calculating based on the machine's specs is a good start. Let's say we have servers with a 10Gbps full-duplex NIC. Also, let's say we set the topic's replication factor to 3 (so the cluster will have a minimum of 3 servers),

Re: Kafka performance when it comes to throughput

2022-01-06 Thread Israel Ekpo
Hi Marisa, I think there may be some confusion about the throughput for each partition, and I want to explain briefly using some analogies. Using transportation, for example: if we were to pick an airline or ridesharing organization to describe the volume of customers they can support per day we would

Kafka performance when it comes to throughput

2022-01-06 Thread Marisa Queen
Cheers from NYC! I'm trying to give a performance number to a potential client (from the financial market) who asked me the following question: *"If I have a Kafka system set up in the best way possible for performance, what is an approximate number that I can have in mind for the throughput of th