The existing 15-node cluster had about 450-500 GB/node, most of it in one TWCS
table. Data is written with a 7-day TTL. The cluster couldn't be expanded
earlier due to a bit of political foot-dragging, and new load of about 2x-3x
started up around the time we finally began expanding.
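
On the TTL: default_time_to_live is 0 in the table settings below, so the
7-day TTL is attached to each write by the application, roughly along these
lines (keyspace/table/column names made up for illustration):

    INSERT INTO metrics.raw_events (source_id, event_time, value)
    VALUES ('sensor-42', '2016-01-15 00:00:00+0000', 12.7)
    USING TTL 604800;  -- 604800 seconds = 7 days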

There were about 500 sstables per node, with one outlier at 16,000 files
(Data.db files specifically, to be clear).

The 160,000 Data.db sstable files grew steadily from 500 over the course of a
week. Compaction probably fell behind, exacerbated by the growing load, but
the sstable count growth appears to have started before the heaviest load
increases.

We attempted to expand, figuring the cluster was under duress. The first
addition still ended up with 150,000 files (25,000 of them Data.db), and about
500 GB.

Three other nodes have started to gain files as well.

Our last attempted expansion filled a 2 TB disk; we ended up with over 100,000
Data.db sstable files and 600,000 files overall, and the bootstrap still
hadn't finished. We killed that node.

Wide rows do not appear to be a problem.

We are vertically scaling our nodes to bigger hardware, and on the nodes whose
sstable counts are starting to inflate we are unthrottling compaction and
doubling the number of compactors; that appears to be helping.
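
To spell out what those knobs mean in practice: "unthrottling compaction" is
nodetool setcompactionthroughput 0, and "doubling compactors" is raising
concurrent_compactors in cassandra.yaml (newer versions can also change it at
runtime with nodetool setconcurrentcompactors).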

But the overstreaming is still a mystery.

Table compaction settings:

) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'compaction_window_unit': 'HOURS', 'compaction_window_size': '4',
        'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 0
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
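
If my arithmetic is right, 4-hour windows against a 7-day TTL works out to
7 * 24 / 4 = 42 windows, so a node that is keeping up should settle at roughly
40-50 fully compacted sstables for this table (and with gc_grace_seconds = 0,
a window's sstable should be dropped outright once everything in it has
expired), which is part of why the five- and six-figure file counts look so
pathological.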
