Thanks Bruno.
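
Plugging the 1.1 factor in with some ballpark numbers, say K = 100 bytes,
V = 1 KB, R = 10 million records, and M = 1 until I can measure it, gives
(100 + 1024) * 10^7 * 1 * 1.1 ~ 12.4 GB per partition, which is workable.

To make sure I have the compaction change right, I'm thinking of a
RocksDBConfigSetter along these lines (just a sketch; the class name is
mine, and the only thing it changes from the Streams defaults is the
compaction style):

    import java.util.Map;
    import org.apache.kafka.streams.state.RocksDBConfigSetter;
    import org.rocksdb.CompactionStyle;
    import org.rocksdb.Options;

    public class LeveledCompactionConfigSetter implements RocksDBConfigSetter {

        @Override
        public void setConfig(final String storeName, final Options options,
                              final Map<String, Object> configs) {
            // Streams defaults to universal compaction; switch to leveled
            // to get ~10% space amplification instead of ~100%.
            options.setCompactionStyle(CompactionStyle.LEVEL);
        }

        @Override
        public void close(final String storeName, final Options options) {
            // No native objects allocated above, so nothing to release here.
        }
    }

registered via the usual streams properties (`props` here being our
existing config Properties):

    props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG,
              LeveledCompactionConfigSetter.class);

Does that look about right?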

On Thu, Feb 18, 2021 at 9:19 AM Bruno Cadonna <br...@confluent.io> wrote:

> Hi Chris,
>
> Your estimation looks correct to me.
>
> I do not know how big M might be. Maybe the following link can help you
> with the estimation:
>
> https://github.com/facebook/rocksdb/wiki/Rocksdb-BlockBasedTable-Format
>
> There are also some additional files that RocksDB keeps in its
> directory. I guess the best way to estimate the space is experimentally.
>
> Also take into account that you will have one state store per partition.
>
> If you want to save disk space, you should try to use Leveled compaction
> (https://github.com/facebook/rocksdb/wiki/Leveled-Compaction) instead,
> since it has a space amplification of 10% instead of the 100% you get
> with Universal compaction. That is, you can replace the 2 in your
> formula with 1.1.
>
> Since AK 2.7, you can also monitor the sizes of your RocksDB state
> stores with the metric total-sst-files-size
> (https://kafka.apache.org/documentation/#kafka_streams_rocksdb_monitoring).
>
> Best,
> Bruno
>
> On 18.02.21 17:43, Chris Toomey wrote:
> > We're using RocksDB as a persistent Kafka state store for compacted
> > topics and need to be able to estimate the maximum disk space required.
> >
> > We're using the default config settings provided by Kafka, which include
> > Universal compaction, no compression, and 4k block size.
> >
> > Given these settings and a topic w/ key size K, value size V, and number
> > of records R, I'd assume a rough disk space estimate would be of the form
> >
> > max. disk space = (K+V)*R*M*2
> >
> > where M is an unknown DB size -> disk size multiplier and *2 is to allow
> > for full compaction as per here
> > <https://github.com/facebook/rocksdb/wiki/Universal-Compaction>.
> >
> > Does this look right, and can anyone provide a ballpark range for the
> > multiplier M and/or some guidelines for how to estimate it?
> >
> > much thanks,
> > Chris
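
P.S. For the total-sst-files-size metric, I'm planning to pull it from the
embedded client once we're on 2.7, along these lines (again a sketch; the
class and method names are mine, and it just sums the per-store gauges
exposed by KafkaStreams#metrics()):

    import java.util.Map;
    import org.apache.kafka.common.Metric;
    import org.apache.kafka.common.MetricName;
    import org.apache.kafka.streams.KafkaStreams;

    public final class StoreSizeMetrics {

        // Sum total-sst-files-size across all RocksDB stores owned by
        // this Streams instance. Assumes `streams` is already started.
        public static long totalSstFilesSizeBytes(final KafkaStreams streams) {
            long total = 0L;
            for (final Map.Entry<MetricName, ? extends Metric> e :
                    streams.metrics().entrySet()) {
                if ("total-sst-files-size".equals(e.getKey().name())) {
                    total += ((Number) e.getValue().metricValue()).longValue();
                }
            }
            return total;
        }
    }

That should let us compare the measured size against the (K+V)*R*M*1.1
estimate and back out a real-world value for M.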