Hi Chris,

Your estimation looks correct to me.

I do not know how big M might be. Maybe the following link can help you with the estimation:

https://github.com/facebook/rocksdb/wiki/Rocksdb-BlockBasedTable-Format

There are also some additional files that RocksDB keeps in its directory. I guess the best way to estimate the space is experimentally.

Also take into account that you will have one state store per partition.

If you want to save disk space, you should try Leveled compaction (https://github.com/facebook/rocksdb/wiki/Leveled-Compaction) instead, since it has a space amplification of about 10% rather than the 100% you can see with Universal compaction. That is, you could replace the 2 in your formula with 1.1.
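To make the formula concrete, here is a small back-of-the-envelope sketch. The function and all example values (key/value sizes, record count, the multiplier M, partition count) are hypothetical placeholders, not measured numbers:

```python
def max_disk_space(key_size, value_size, records, multiplier,
                   amplification, partitions=1):
    """Rough worst-case on-disk size of the RocksDB state stores.

    amplification: 2.0 for Universal compaction (a full compaction can
    temporarily double the data), ~1.1 for Leveled compaction (~10%
    space amplification).
    """
    return (key_size + value_size) * records * multiplier * amplification * partitions

# Hypothetical example: 100-byte keys, 900-byte values, 1M records,
# M = 1.0 (no overhead/compression), 4 partitions.
universal = max_disk_space(100, 900, 1_000_000, 1.0, 2.0, partitions=4)  # ~8 GB
leveled   = max_disk_space(100, 900, 1_000_000, 1.0, 1.1, partitions=4)  # ~4.4 GB
```

As you can see, with several partitions the choice of compaction style changes the worst-case estimate considerably. Measuring M experimentally is still the best way to pin the estimate down.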

Since Apache Kafka 2.7, you can also monitor the sizes of your RocksDB state stores with the metric total-sst-files-size (https://kafka.apache.org/documentation/#kafka_streams_rocksdb_monitoring).

Best,
Bruno

On 18.02.21 17:43, Chris Toomey wrote:
We're using RocksDB as a persistent Kafka state store for compacted topics
and need to be able to estimate the maximum disk space required.

We're using the default config settings provided by Kafka, which include
Universal compaction, no compression, and a 4k block size.

Given these settings and a topic w/ key size K, value size V, and number of
records R, I'd assume a rough disk space estimation would be of the form

max. disk space = (K+V)*R*M*2

where M is an unknown DB size -> disk size multiplier and *2 is to allow
for full compaction as per here
<https://github.com/facebook/rocksdb/wiki/Universal-Compaction>.

Does this look right, and can anyone provide a ballpark range for the
multiplier M and/or some guidelines for how to estimate it?

much thanks,
Chris
