Hello! I have recently been trying to familiarize myself with Cassandra, but I
don't quite understand when data is actually removed from disk after it has
been deleted. The use case I'm particularly interested in is expiring time
series data with DTCS. As an example, I created the following table:

CREATE TABLE metrics (
    metric_id text,
    time timestamp,
    value double,
    PRIMARY KEY (metric_id, time)
) WITH CLUSTERING ORDER BY (time DESC)
  AND default_time_to_live = 86400
  AND gc_grace_seconds = 3600
  AND compaction = {
    'class': 'DateTieredCompactionStrategy',
    'timestamp_resolution': 'MICROSECONDS',
    'base_time_seconds': '3600',
    'max_sstable_age_days': '365',
    'min_threshold': '4'
  };
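
For context, this is roughly how I have been writing and inspecting rows while experimenting. It is only a sketch: the metric name, timestamp, and value are made up for illustration. TTL() and WRITETIME() are the standard CQL functions that report the remaining time to live (in seconds) and the write timestamp of a column.

```cql
-- Hypothetical sample row; it picks up the table's default TTL of 86400 seconds.
INSERT INTO metrics (metric_id, time, value)
VALUES ('cpu.load', '2015-06-01 12:00:00+0000', 0.75);

-- Show the remaining TTL and the write timestamp of the value column.
SELECT time, value, TTL(value), WRITETIME(value)
FROM metrics
WHERE metric_id = 'cpu.load';
```

Once the remaining TTL runs out, the row no longer shows up in query results, but that alone tells me nothing about whether it is still on disk.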

I understand that Cassandra will create a tombstone for every row inserted into
this table 24 hours (86400 seconds) after it is inserted. These tombstones
will first be written to an in-memory Memtable and then flushed to disk as an
SSTable when the Memtable reaches a certain size. My question is: when will the
now-expired data be removed from disk? Is it the next time the SSTable that
contains the data gets compacted? If so, with DTCS and min_threshold set to
four, we would wait until at least three other SSTables are in the same time
window as the expired data, and then those SSTables would be compacted into an
SSTable without the expired data. Is it only during this compaction that the
data will be removed? It seems to me that this would require Cassandra to
maintain some metadata on which rows have been deleted, since the newer
tombstones would likely not be in the older SSTables that are being compacted.

Also, I'm aware that Cassandra can drop entire SSTables if they contain only
expired data, but I'm unsure what qualifies as expired data (is it just
SSTables whose maximum timestamp is past the default TTL for the table?) and
when such SSTables are dropped.

Alternatively, do the SSTables that contain the tombstones have to be
compacted with the SSTables that contain the expired data for the data to be
removed? It seems to me that this could result in Cassandra holding on to the
expired data long after it has expired, since it would be waiting for the new
tombstones to be compacted with the older expired data.

Finally, I'm also unsure when the tombstones themselves are removed. I know
Cassandra does not delete them until after gc_grace_seconds, but it can't
delete the tombstones until it's sure the expired data has been deleted, right?
Otherwise it would see the expired data as being valid. Consequently, it seems
to me that the question of when tombstones are deleted is closely tied to
the questions above.

Thanks in advance! If it helps, I've been experimenting with version 2.0.15.