Hi:
We need some help with Cassandra repair and compaction for a table that uses TWCS.
We are running Cassandra 4.0-rc1. The database is called test_db, and its biggest
table, "minute_rate", stores time-series data. It has the following configuration:
CREATE TABLE test_db.minute_rate (
market smallint,
sin bigint,
field smallint,
slot timestamp,
close frozen,
high frozen,
low frozen,
open frozen,
PRIMARY KEY ((market, sin, field), slot)
) WITH CLUSTERING ORDER BY (slot ASC)
AND additional_write_policy = '99p'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND cdc = false
AND comment = ''
AND compaction = {'class':
'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
'compaction_window_size': '4', 'compaction_window_unit': 'HOURS',
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 604800
AND extensions = {}
AND gc_grace_seconds = 86400
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99p';
minute_rate is configured to use TWCS. We can see that it creates about 8 data
files per day. Each data file is relatively small, around 2GB. So far so
good.
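
To make the window arithmetic concrete, here is a minimal Python sketch of how a write time would map to a 4-hour TWCS window. It assumes windows are aligned to the Unix epoch and sized by compaction_window_size and compaction_window_unit from the schema above; the function names are hypothetical and only for illustration.

```python
from datetime import datetime, timezone

# Taken from the table definition: compaction_window_size = 4, unit = HOURS.
WINDOW_HOURS = 4
WINDOW_SECONDS = WINDOW_HOURS * 3600


def window_start(epoch_seconds: int) -> int:
    """Round a write time down to the start of its window.

    Assumption: windows are aligned to the Unix epoch.
    """
    return epoch_seconds - (epoch_seconds % WINDOW_SECONDS)


def windows_per_day() -> int:
    """Number of time windows that one day is split into."""
    return 24 // WINDOW_HOURS


if __name__ == "__main__":
    # Maximum timestamp from the sstablemetadata sample, truncated to seconds.
    newest_write = 1624699757821614 // 1_000_000
    start = window_start(newest_write)
    print("windows per day:", windows_per_day())
    print("window start:", datetime.fromtimestamp(start, tz=timezone.utc).isoformat())
```

With these settings the sketch gives 6 windows per day, which is in the same range as the roughly 8 files per day we observe.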
However, when "nodetool compact test_db minute_rate" ran on the weekend, it
seemed to consolidate all the small data files into one big file. After 2 or 3
weeks of running nodetool compact, we ended up with bigger and bigger data
files, eating up all the disk space on the machine.
From what I understand about TWCS, Cassandra will simply drop a data file once
the records inside it are older than the default_time_to_live of 604800
(7 days). But somehow this is not what we are seeing. When I run
sstablemetadata on the oldest data file (sample below), I can see that the
tombstone drop times all got updated, with the result that the data file never
gets removed.
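
A rough Python sketch of that expectation, using the values from the sample below. It is a simplified model, not Cassandra's actual code: it assumes a whole file can be dropped only when its max local deletion time is older than now minus gc_grace_seconds, and it ignores overlap with other SSTables. The function names are hypothetical.

```python
TTL_SECONDS = 604800        # default_time_to_live from the schema
GC_GRACE_SECONDS = 86400    # gc_grace_seconds from the schema


def expected_expiry(max_write_time_us: int, ttl_seconds: int = TTL_SECONDS) -> int:
    """Epoch second at which the newest write would expire,
    assuming every write in the file carries the default TTL."""
    return max_write_time_us // 1_000_000 + ttl_seconds


def could_drop_whole_file(max_local_deletion_time: int, now: int,
                          gc_grace_seconds: int = GC_GRACE_SECONDS) -> bool:
    """Simplified rule (an assumption): everything in the file must have
    expired before now - gc_grace_seconds for the file to be droppable."""
    return max_local_deletion_time < now - gc_grace_seconds


if __name__ == "__main__":
    # Values copied from the sstablemetadata sample.
    max_timestamp_us = 1624699757821614
    reported_max_local_deletion = 2147483647

    expiry = expected_expiry(max_timestamp_us)
    later = expiry + GC_GRACE_SECONDS + 1  # just past TTL plus gc_grace

    print("expected expiry of newest write:", expiry)
    print("droppable if max deletion time == expiry:",
          could_drop_whole_file(expiry, later))
    print("droppable with the reported max deletion time:",
          could_drop_whole_file(reported_max_local_deletion, later))
```

Under this model, a file whose newest data expires 7 days after the maximum timestamp would qualify once gc_grace_seconds has also passed. With the max local deletion time of 2147483647 reported for the sample file, the check stays false.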
This leads me to think I am configuring things the wrong way. So I want to know:
when using TWCS, do we need to repair and compact?
I saw that cassandra-reaper (we use Reaper for repair) is configured to skip
TWCS tables.
Should I stop running "nodetool compact test_db minute_rate"? Without
"nodetool compact", will Cassandra clean up the tombstoned data files?
Thanks,
Eric
# sstablemetadata na-6681303-big-Data.db
SSTable: /var/lib/cassandra/data/test_db/minute_rate-d7955270f31d11ea88fabb8dcc37b800/na-6681303-big
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.01
Minimum timestamp: 1622528285761740 (06/01/2021 06:18:05)
Maximum timestamp: 1624699757821614 (06/26/2021 09:29:17)
SSTable min local deletion time: 1624767191 (06/27/2021 04:13:11)
SSTable max local deletion time: 2147483647 (no tombstones)
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
Compression ratio: 0.2753232939720765
TTL min: 0
TTL max: 604800 (7 days)
First token: -9223371506914883753 (3870:16432974:7)
Last token: 9223358359954725918 (505:23187788:6)
minClusteringValues: [2021-04-12T16:54:00.000Z]
maxClusteringValues: [2021-06-26T09:28:00.000Z]
Estimated droppable tombstones: 0.3683520383378445
SSTable Level: 0
Repaired at: 0
Pending repair: --
Replay positions covered: {}
totalColumnsSet: 17179881008
totalRows: 4413102532
Estimated tombstone drop times:
Drop Time| Count (%) Histogram
1624768080 (06/27/2021 04:28:00) | 2 ( 0)
1624771440 (06/27/2021 05:24:00) |98 ( 0)
1624777020 (06/27/2021 06:57:00) |85 ( 0)
1624781640 (06/27/2021 08:14:00) |74 ( 0)
1624786080 (06/27/2021 09:28:00) |74 ( 0)
1624790280 (06/27/2021 10:38:00) |66 ( 0)
1624794900 (06/27/2021 11:55:00) |87 ( 0)
1624800060 (06/27/2021 13:21:00) |83 ( 0)
1624804680 (06/27/2021 14:38:00) |108064 ( 0)
1624809180 (06/27/2021 15:53:00) |304148 ( 0)
1624812540 (06/27/2021 16:49:00) |133188 ( 0)
1624819440 (06/27/2021 18:44:00) |88 ( 0)
1624824060 (06/27/2021 20:01:00) |73 ( 0)
1624828080 (06/27/2021 21:08:00) |66 ( 0)
1624832520 (06/27/2021 22:22:00) | 1 ( 0)
1624835880 (06/27/2021 23:18:00) | 23578916 ( 0) o
1624839720 (06/28/2021 00:22:00) | 21783899 ( 0) o
1624843740 (06/28/2021 01:29:00) | 22758204 ( 0) o
1624848120 (06/28/2021 02:42:00) | 25237306 ( 0) o
1624853520 (06/28/2021 04:12:00) | 44003185 ( 0) O.
1624858080 (06/28/2021 05:28:00) | 145977595 ( 0) O
1624862460 (06/28/2021 06:41:00) | 331875915 ( 1) OOOo
1624866060 (06/28/2021 07:41:00) | 463284230 ( 2) .
1624869540 (06/28/2021 08:39:00) | 455732185 ( 2)
1624873140 (06/28/2021 09:39:00) |