Ali, I don't know of any proper benchmarks out there, but I've done some work in this area when trying to determine what hardware to get for particular use cases. My answers are inline:
On Mon, Apr 10, 2017 at 7:05 PM, Ali Nazemian <alinazem...@gmail.com> wrote:

> Hi all,
>
> I was wondering if there is any benchmark or any recommendation for having
> physical HW vs virtual for the Kafka Brokers. I am trying to calculate the
> HW requirements for a Kafka Cluster with a hard SLA. My questions are as
> follows.
>
> - What is the effect of OS disk caching for a Kafka-Broker? How much
> on-heap and off-heap memory would be required per node?

If by "OS disk caching" you mean the page cache, then its effect is huge. Kafka relies on it to serve as much data as possible directly from memory. The actual on-heap vs off-heap RAM requirements will depend entirely on your scenario. I've run Kafka brokers with heaps as small as 8GB and as large as 32GB each, using CMS GC with some custom tuning. The boxes usually had much more physical RAM than that (64GB and 128GB respectively), for the page cache. I think the Confluent recommendation these days is to use G1 GC, but I have no experience with it.

>
> - Since Kafka's read-write workload is pretty sequential, which of the
> following spinning disks would be recommended? SATA 7.2k, SAS 10k, SAS 15k?

Again, that will depend on your use case. Are your consumers mostly consuming up-to-the-second messages, or are they always connecting and consuming from arbitrary offsets, or even from the beginning of topics?

If your consumers are always consuming the latest messages, actual disk IO will be extremely low, since almost everything will be served out of the page cache, and most of the disk activity will be periodic flushes of writes every ~5 seconds or so. On the other hand, if most of your consumption is of old data, the brokers will need to read large amounts of data sequentially from disk whenever consumers request it. That's when you'd benefit from something like SAS 10k or SAS 15k. You'd need to run some custom benchmarks on a workload that mimics your use case to figure out what would work.
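To make the "latest vs old data" distinction a bit more concrete, here is a rough sketch, assuming the commonly cited back-of-the-envelope rule (e.g. in Confluent's production deployment docs) of sizing memory to buffer about 30 seconds of writes, i.e. memory ~= write throughput * 30. The function names and the example numbers (50 MiB/s written per broker, ~96 GiB left for page cache on a 128GB box with a 32GB heap) are hypothetical, and the estimate ignores other files competing for the cache, so treat it as an optimistic guess rather than a measurement:

```python
"""Rough page cache sizing for a Kafka broker (a sketch, not a benchmark).

Assumes the back-of-the-envelope rule "memory ~= write throughput * 30 s".
All names and example numbers are hypothetical.
"""

MIB = 1024 ** 2
GIB = 1024 ** 3


def page_cache_needed(write_bytes_per_sec: float, buffer_seconds: float = 30.0) -> float:
    """Bytes of page cache to keep the last `buffer_seconds` of writes in memory.

    `write_bytes_per_sec` should be the total rate written to this broker's disks,
    including follower (replica) traffic, not just client produce traffic.
    """
    return write_bytes_per_sec * buffer_seconds


def cached_window_seconds(page_cache_bytes: float, write_bytes_per_sec: float) -> float:
    """Approximate age (in seconds) of the oldest data still in the page cache.

    Optimistic: ignores other files and processes competing for the cache.
    """
    return page_cache_bytes / write_bytes_per_sec


def likely_served_from_cache(consumer_lag_seconds: float,
                             page_cache_bytes: float,
                             write_bytes_per_sec: float) -> bool:
    """True if a consumer this far behind is probably reading from memory, not disk."""
    return consumer_lag_seconds <= cached_window_seconds(page_cache_bytes, write_bytes_per_sec)


if __name__ == "__main__":
    write_rate = 50 * MIB  # hypothetical: 50 MiB/s written per broker
    cache = 96 * GIB       # hypothetical: roughly what's left on a 128 GiB box with a 32 GiB heap

    print(f"~30 s of writes needs {page_cache_needed(write_rate) / GIB:.2f} GiB of page cache")
    print(f"cache covers roughly the last {cached_window_seconds(cache, write_rate) / 60:.1f} minutes")
    for lag in (10, 600, 7200):  # consumer lag in seconds
        where = "page cache" if likely_served_from_cache(lag, cache, write_rate) else "disk"
        print(f"consumer {lag:>5} s behind -> mostly {where}")
```

A consumer replaying a topic from the beginning, with days of retention, falls far outside that cached window, which is the case where the choice between SATA 7.2k and SAS 10k/15k starts to matter.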
>
> - Since Kafka is not CPU-intensive, how bad would it be for a Kafka broker
> to coexist with a CPU-intensive workload like Storm?

I wouldn't recommend that at all. I've always run Kafka on bare metal with nothing else running on the box. Otherwise, you won't be able to identify bottlenecks when you run into them, and the two systems could be impacting each other all the time. Hardware is cheap; it's not worth spending time chasing issues caused by "noisy neighbors" on the boxes.

>
> Regards,
> Ali

Marcos