Hi Prateek,

You may find this blog I wrote on Kafka sizing useful:
https://www.instaclustr.com/blog/how-to-size-apache-kafka-clusters-for-tiered-storage-part-1/

With the associated calculator here: https://github.com/instaclustr/code-samples/tree/main/Kafka/TieredStorage

This one in particular: https://github.com/instaclustr/code-samples/blob/main/Kafka/TieredStorage/kafka_calculator_graphs.html

Just download it and use it locally with a browser. For your example you want "delayed consumers" only.

Regards,
Paul Brebner
NetApp Instaclustr

From: Prateek Kohli via users <[email protected]>
Date: Monday, 1 December 2025 at 8:49 pm
To: users <[email protected]>
Cc: Prateek Kohli <[email protected]>
Subject: Validating Kafka Disk Throughput Formulas (Write & Read)

Hi everyone,

I'm working on capacity planning for Kafka and wanted to validate two formulas I'm using to estimate cluster-level disk throughput in a worst-case scenario (when all reads come from disk due to large consumer lag and replication lag).

1. Disk Write Throughput

Write_Throughput = Ingest_MBps × Replication_Factor (here 3)

Explanation: Every MB of data written to Kafka is stored on all replicas (leader + followers), so total disk writes across the cluster scale linearly with the replication factor.

2. Disk Read Throughput (worst case, cache hit rate = 0%)

Read_Throughput = Ingest_MBps × (Replication_Factor − 1 + Number_of_Consumer_Groups)

Explanation: Leaders must read data from disk to:
* serve followers (RF − 1 times), and
* serve each consumer group (each group reads the full stream).

If pagecache misses are assumed (e.g., heavy lag), all of these reads hit disk, so the terms add up.

Are these calculations accurate for estimating cluster disk throughput under worst-case conditions? Any corrections or recommendations would be appreciated.

Regards,
Prateek Kohli
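
For reference, a minimal Python sketch of the two formulas above. The function names and the example input values (ingest rate, number of consumer groups) are illustrative assumptions, not from the thread; only the replication factor of 3 comes from the example.

    # Back-of-the-envelope Kafka disk throughput estimate, following the
    # two formulas quoted above. Input values below are assumptions.

    def disk_write_throughput(ingest_mbps: float, replication_factor: int) -> float:
        """Total cluster disk writes: every ingested MB is persisted on
        every replica (leader + followers)."""
        return ingest_mbps * replication_factor

    def disk_read_throughput(ingest_mbps: float, replication_factor: int,
                             consumer_groups: int) -> float:
        """Worst-case cluster disk reads (0% pagecache hit rate): leaders
        read once per follower (RF - 1) and once per consumer group,
        since each group consumes the full stream."""
        return ingest_mbps * (replication_factor - 1 + consumer_groups)

    if __name__ == "__main__":
        ingest = 100.0  # MB/s of producer traffic (assumed)
        rf = 3          # replication factor from the example
        groups = 2      # number of consumer groups (assumed)

        print(f"Write: {disk_write_throughput(ingest, rf):.0f} MB/s")        # 300 MB/s
        print(f"Read:  {disk_read_throughput(ingest, rf, groups):.0f} MB/s") # 400 MB/s

With these assumed inputs, writes are 100 × 3 = 300 MB/s across the cluster and worst-case reads are 100 × (3 − 1 + 2) = 400 MB/s, matching the two formulas.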
