The existing 15-node cluster had about 450-500 GB/node, most of it in one TWCS table. Data is written with a 7-day TTL. The cluster couldn't be expanded earlier due to a bit of political foot-dragging, and a new load of roughly 2x-3x the previous volume started up around the time we began expanding.
Most nodes have about 500 sstables each, with one outlier at 16,000 files (.Data.db files, to be clear). That node's Data.db count grew steadily from 500 to 16,000 over a week. Compaction probably fell behind and was exacerbated by the growing load, although the sstable-count growth appears to have started before the heaviest load increases.

Figuring the cluster was under duress, we attempted to expand. The first node we added still ended up with 150,000 files (25,000 of them Data.db) and about 500 GB, and three other nodes have started to gain files as well. Our last attempted expand filled a 2 TB disk; the node had over 100,000 Data.db files and 600,000 files overall, and bootstrap hadn't finished. We killed that node.

Wide rows do not appear to be a problem. We are vertically scaling the affected nodes to bigger hardware, unthrottling compaction, and doubling compactors on the nodes whose sstable counts are inflating; that appears to be helping. But the overstreaming is still a mystery.

Table compaction settings:

) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'compaction_window_unit': 'HOURS', 'compaction_window_size': '4', 'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 0
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
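The "unthrottling compaction and doubling compactors" step can be sketched as follows; a hedged example, since the exact values (and whether a restart is needed) depend on the Cassandra version, and the numbers here are placeholders, not the cluster's actual settings:

```shell
# Remove the compaction throughput cap (0 = unthrottled); takes effect live.
nodetool setcompactionthroughput 0

# On Cassandra 2.x (where the com.jeffjirsa TWCS jar is typically used),
# the compactor count is a cassandra.yaml setting and requires a restart:
#   concurrent_compactors: 4    # e.g. double a previous value of 2
# Newer Cassandra versions can change it without a restart:
#   nodetool setconcurrentcompactors 4

# Watch pending compactions drain to confirm it is helping:
nodetool compactionstats
```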
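For context on why 16,000 Data.db files looks anomalous for this table: a rough back-of-the-envelope (my arithmetic, not from the post) says that with a 7-day TTL and 4-hour compaction windows, steady state should be on the order of one fully compacted sstable per window, plus a handful of uncompacted flushes in the active window.

```python
# Back-of-the-envelope steady-state sstable count under TWCS.
# Assumptions (for illustration only): one sstable per fully
# compacted window; uncompacted flushes in the current window ignored.
ttl_hours = 7 * 24       # 7-day TTL on all writes
window_hours = 4         # compaction_window_size = 4, unit = HOURS
windows = ttl_hours // window_hours
print(windows)           # -> 42 windows' worth of sstables for this table
```

Anything orders of magnitude above that per-table baseline suggests compaction is not keeping up, or that streaming delivered far more sstables than compaction can absorb.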