Hi experts, I have a design question: we want to store a large amount of data (> 30 million rows containing blobs) that will be deleted on a monthly basis depending on the type of data (not in the same order as the data entered the system). Some data would survive for only two months, other data for 3-5 years.
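As an illustration only, the month in which a record becomes due for deletion could be derived as in the following minimal Python sketch. It assumes the retention period is known in whole months when the record is written; the function name `deletion_month` and the `YYYY_MM` format are hypothetical choices, not part of any existing system.

```python
from datetime import date


def deletion_month(ingested: date, retention_months: int) -> str:
    """Return the 'YYYY_MM' bucket in which a record becomes due for deletion.

    Hypothetical helper: assumes retention is expressed in whole months.
    """
    # Convert the ingest date to a running month count so that adding the
    # retention period is plain integer arithmetic, including year rollover.
    total = ingested.year * 12 + (ingested.month - 1) + retention_months
    year, month_index = divmod(total, 12)
    return f"{year:04d}_{month_index + 1:02d}"


# A record stored in January 2018 with two months of retention
print(deletion_month(date(2018, 1, 15), 2))    # 2018_03
# A record stored in November 2018 with five years of retention
print(deletion_month(date(2018, 11, 3), 60))   # 2023_11
```

The resulting value groups records by when they expire rather than by when they arrived, which is the grouping the monthly deletion relies on.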
The choice now is between two designs. The first is a single table with a TTL per partition and one partition per deletion month (the month in which the data should be deleted), which would allow a single delete command followed by a compaction. The alternative is multiple tables, one per deletion month, where the deletion process would just drop the table.

The logic to retrieve the data is per record: we know both the retention period and the id (UUID) of the addressed record, so multiple tables can be handled. Since there would be one table per deletion month, I do not expect more than 1000-2000 tables, depending on the retention period of the data.

The benefit of creating multiple tables would be that there are no tombstones, but more tables take more memory on the nodes. The single-table approach would make the compaction process take longer and produce more I/O activity, because the compaction would have to rewrite multiple tables (SSTables) internally.

Any thoughts on this? We want to use 9 nodes with Cassandra 3.11 on Linux; the total expected data volume is ~15-20 TB.

Thank you very much,
Marcus Haarmann