Ali,

I don't know of proper benchmarks out there, but I've done some work in
this area when trying to determine what hardware to get for particular use
cases.  My answers are inline:

On Mon, Apr 10, 2017 at 7:05 PM, Ali Nazemian <alinazem...@gmail.com> wrote:

> Hi all,
>
> I was wondering if there is any benchmark or any recommendation for having
> physical HW vs virtual for the Kafka Brokers. I am trying to calculate the
> HW requirements for a Kafka Cluster with a hard SLA. My questions are as
> follows.
>
> - What is the effect of OS disk caching for a Kafka-Broker? How much
> on-heap and off-heap memory would be required per node?
>

If by "OS disk caching" you mean the page cache, then its effect is huge.
Kafka relies on it to serve as much data as possible directly from memory.
The actual on-heap vs. off-heap RAM requirements depend entirely on your
scenario.  I've run Kafka brokers with heaps as small as 8GB and as large
as 32GB, using CMS GC with some customized tuning.  The boxes usually have
much more physical RAM than that (64GB and 128GB respectively), with the
remainder left to the page cache.

I think the Confluent recommendation these days is to use G1 GC, but I've
got no experience using that.
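For reference, broker heap and GC settings are usually passed through the
KAFKA_HEAP_OPTS and KAFKA_JVM_PERFORMANCE_OPTS environment variables that
kafka-server-start.sh reads.  A minimal sketch; the sizes and pause target
below are illustrative placeholders, not recommendations:

```shell
# Keep the heap modest and leave the rest of physical RAM to the page cache.
export KAFKA_HEAP_OPTS="-Xms8g -Xmx8g"

# G1 GC with a short pause-time goal (tune for your own workload).
export KAFKA_JVM_PERFORMANCE_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=20"
```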


>
> - Since Kafka read-write workload is pretty sequential which of the
> following spinning disks would be recommended? SATA 7.2k, SAS 10k, SAS 15k?
>

Again, that will depend on your use case.  Are your consumers mostly
consuming up-to-the-second messages, or are they always connecting and
consuming from arbitrary offsets, or even from the beginning of topics?  If
your consumers are always consuming the latest messages, then actual disk
IO will be extremely low, since almost everything will be served out of the
page cache, and most of the activity will be fsyncs every ~5 seconds or
so.  On the other hand, if most of your consumption will be of old data,
then the brokers will need to read large amounts of data from disk
sequentially whenever consumers request it.  That's when you'd benefit
from something like SAS 10k or SAS 15k.  You'd need to run some custom
benchmarks to figure out what would work for a workload that mimics your
use case.
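
One way to sketch such a benchmark is with the perf tools that ship in
Kafka's bin/ directory: write a large volume of data first, then consume
it from the beginning so the reads can't be served from the page cache.
Broker address, topic name, and record counts below are placeholders:

```shell
BROKERS="broker1:9092"   # placeholder broker address
TOPIC="disk-bench"       # placeholder topic

# Produce enough data to exceed the broker's RAM, so later reads hit disk.
kafka-producer-perf-test.sh --topic "$TOPIC" \
  --num-records 50000000 --record-size 1024 --throughput -1 \
  --producer-props bootstrap.servers="$BROKERS" acks=1

# Consume from the beginning (a cold read); compare the reported MB/sec
# against the sequential-read specs of the disks you're evaluating.
kafka-consumer-perf-test.sh --broker-list "$BROKERS" \
  --topic "$TOPIC" --messages 50000000
```

Watch iostat on the broker while the consumer runs to see how much of the
traffic is actually hitting the spindles.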


>
> - Since Kafka is not CPU-intensive, how bad would be to coexist
> Kafka-Broker and a CPU-intensive workload like STORM?
>

I wouldn't recommend that at all.  I've always run Kafka on bare metal with
nothing else running on the box.  Otherwise, you won't be able to identify
bottlenecks when you run into them, and the two systems could be impacting
each other all the time.  Hardware is cheap; it's not worth spending time
chasing issues caused by "noisy neighbors" on the boxes.


>
> Regards,
> Ali
>

Marcos
