Hi - the 3 consumer scenarios actually model: (1) real-time consumers, reading from cache rather than local disk; (2) delayed consumers, reading from local disk; and (3) remote consumers, reading from remote tiered storage. So for your example, delayed consumers (2) is what you want :-) - just set the others to 0.
Paul

From: Prateek Kohli <[email protected]>
Date: Tuesday, 2 December 2025 at 3:20 pm
To: Brebner, Paul <[email protected]>, [email protected]
Subject: RE: Validating Kafka Disk Throughput Formulas (Write & Read)

Hi @Brebner, Paul,

Thanks for your reply. I checked this calculator:
https://github.com/instaclustr/code-samples/blob/main/Kafka/TieredStorage/kafka_calculator_graphs.html

But I don't think it considers scenarios with replication lag. For example, if my replicas are significantly behind the leader, then when they fetch data from the leader, the leader will need to read from disk to serve those followers. Shouldn't we consider disk bandwidth for this scenario as well?

From: Brebner, Paul <[email protected]>
Sent: 02 December 2025 07:23
To: [email protected]
Cc: Prateek Kohli <[email protected]>
Subject: Re: Validating Kafka Disk Throughput Formulas (Write & Read)

Hi Prateek,

You may find this blog I wrote on Kafka sizing useful:
https://www.instaclustr.com/blog/how-to-size-apache-kafka-clusters-for-tiered-storage-part-1/

With the associated calculator here:
https://github.com/instaclustr/code-samples/tree/main/Kafka/TieredStorage

This one in particular:
https://github.com/instaclustr/code-samples/blob/main/Kafka/TieredStorage/kafka_calculator_graphs.html

Just download and use locally with a browser. For your example you want "delayed consumers" only.

Regards,
Paul Brebner
NetApp Instaclustr

From: Prateek Kohli via users <[email protected]>
Date: Monday, 1 December 2025 at 8:49 pm
To: users <[email protected]>
Cc: Prateek Kohli <[email protected]>
Subject: Validating Kafka Disk Throughput Formulas (Write & Read)

Hi everyone,

I'm working on capacity planning for Kafka and wanted to validate two formulas I'm using to estimate cluster-level disk throughput in a worst-case scenario (when all reads come from disk due to large consumer lag and replication lag).

1. Disk Write Throughput

   Write_Throughput = Ingest_MBps × Replication_Factor (3)

   Explanation: Every MB of data written to Kafka is stored on all replicas (leader + followers), so total disk writes across the cluster scale linearly with the replication factor.

2. Disk Read Throughput (worst case, cache hit = 0%)

   Read_Throughput = Ingest_MBps × (Replication_Factor − 1 + Number_of_Consumer_Groups)

   Explanation: Leaders must read data from disk to:
   * serve followers (RF − 1 times), and
   * serve each consumer group (each group reads the full stream).

   If page-cache misses are assumed (e.g., heavy lag), all of these reads hit disk, so the terms add up.

Are these calculations accurate for estimating cluster disk throughput under worst-case conditions? Any corrections or recommendations would be appreciated.

Regards,
Prateek Kohli
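For what it's worth, the two formulas in the question can be expressed as a quick sanity-check script. This is an illustrative Python sketch only (the function names and example numbers are mine, not from the Instaclustr calculator), assuming all values are in MB/s:

```python
def write_throughput(ingest_mbps: float, replication_factor: int = 3) -> float:
    """Cluster-wide disk write throughput: every MB ingested is
    written to disk once per replica (leader + followers)."""
    return ingest_mbps * replication_factor


def read_throughput(ingest_mbps: float, replication_factor: int = 3,
                    consumer_groups: int = 1) -> float:
    """Worst-case disk read throughput (0% page-cache hits):
    leaders read from disk to serve followers (RF - 1 fetches)
    plus each consumer group reading the full stream."""
    return ingest_mbps * (replication_factor - 1 + consumer_groups)


# Hypothetical example: 100 MB/s ingest, RF = 3, 2 lagging consumer groups
print(write_throughput(100))       # 300.0 MB/s of cluster disk writes
print(read_throughput(100, 3, 2))  # 400.0 MB/s of cluster disk reads
```

The worst-case assumption (every read misses the page cache) is what makes the follower and consumer terms simply add; with real-time consumers served from cache, the read term shrinks accordingly.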
