Hey Jeff,

Do most of those behaviors apply to TWCS too?
-J

On Fri, Jun 17, 2016 at 1:25 PM, Jeff Jirsa <[email protected]> wrote:

> First, DTCS in 2.0.15 has some weird behaviors:
> https://issues.apache.org/jira/browse/CASSANDRA-9572
>
> That said, some other general notes:
>
> Data deleted by TTL isn't the same as issuing a delete – each expiring
> cell internally carries a ttl/timestamp at which it will be converted
> into a tombstone. No tombstone is added to the memtable or flushed to
> disk – Cassandra simply treats the expired cells as tombstones once
> they're past that timestamp.
>
> Cassandra's getFullyExpiredSSTables() will consider an sstable fully
> expired if (and only if) all cells within that sstable are expired
> (current time > max timestamp) AND the sstable's timestamps don't
> overlap with those of other sstables that aren't fully expired. Björn
> talks about this in https://issues.apache.org/jira/browse/CASSANDRA-8243
> – the intent is that explicit deletes (which do create tombstones) won't
> be GC'd from an otherwise fully expired sstable while they're covering
> data in a more recent sstable; without this check, we could accidentally
> bring dead data back to life. In an append-only time series workload
> this would be unusual, but not impossible.
>
> Unfortunately, read repairs (foreground/blocking, which happen if you
> write with CL < ALL and read with CL > ONE) will cause cells written
> with old timestamps to be written into the newly flushed sstables, which
> creates sstables with wide gaps between minTimestamp and maxTimestamp.
> For example, a read repair could pull data that is 23 hours old into a
> new sstable; that one sstable now spans 23 hours and isn't fully expired
> until its oldest data is 47 hours old. There's an open ticket
> (https://issues.apache.org/jira/browse/CASSANDRA-10496) meant to make
> this behavior better in the future by splitting those old read-repaired
> cells out of the newly flushed sstables.
>
> I gave a talk on a lot of this behavior last year at Summit
> (http://www.slideshare.net/JeffJirsa1/cassandra-summit-2015-real-world-dtcs-for-operators)
> – if you're running time series in production on DTCS, it's worth a
> glance.
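To make the TTL-vs-delete distinction concrete, here's a minimal CQL
sketch; the sensor_data table, its columns, and the values are invented
for illustration and aren't from this thread:

    -- Hypothetical table, for illustration only:
    CREATE TABLE sensor_data (
        id text,
        ts timestamp,
        reading double,
        PRIMARY KEY (id, ts)
    );

    -- Expiring write: the cell carries its own ttl/expiry timestamp.
    -- No tombstone is written to the memtable now; the cell is simply
    -- treated as a tombstone once that timestamp passes.
    INSERT INTO sensor_data (id, ts, reading)
    VALUES ('sensor-1', '2016-06-17 12:00:00', 42.0)
    USING TTL 86400;

    -- Explicit delete: this DOES write a tombstone to the memtable,
    -- which is later flushed into an sstable and must survive
    -- gc_grace_seconds before it can be purged.
    DELETE FROM sensor_data
    WHERE id = 'sensor-1' AND ts = '2016-06-17 12:00:00';

Only the DELETE produces a tombstone object; the TTL'd cell expires in
place, which is why no tombstone is added to the memtable or flushed to
disk for it.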
> From: jerome <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Friday, June 17, 2016 at 11:52 AM
> To: "[email protected]" <[email protected]>
> Subject: Understanding when Cassandra drops expired time series data
>
> Hello! Recently I have been trying to familiarize myself with Cassandra
> but don't quite understand when data is removed from disk after it has
> been deleted. The use case I'm particularly interested in is expiring
> time series data with DTCS. As an example, I created the following table:
>
> CREATE TABLE metrics (
>     metric_id text,
>     time timestamp,
>     value double,
>     PRIMARY KEY (metric_id, time)
> ) WITH CLUSTERING ORDER BY (time DESC) AND
>     default_time_to_live = 86400 AND
>     gc_grace_seconds = 3600 AND
>     compaction = {
>         'class': 'DateTieredCompactionStrategy',
>         'timestamp_resolution': 'MICROSECONDS',
>         'base_time_seconds': '3600',
>         'max_sstable_age_days': '365',
>         'min_threshold': '4'
>     };
>
> I understand that Cassandra will create a tombstone for all rows
> inserted into this table 24 hours (86400 seconds) after they are
> inserted. These tombstones will first be written to an in-memory
> Memtable and then flushed to disk as an SSTable when the Memtable
> reaches a certain size. My question is: when will the data that is now
> expired be removed from disk? Is it the next time the SSTable which
> contains the data gets compacted? So, with DTCS and min_threshold set to
> four, would we wait until at least three other SSTables are in the same
> time window as the expired data, at which point those SSTables are
> compacted into an SSTable without the expired data? Is it only during
> this compaction that the data will be removed? It seems to me that this
> would require Cassandra to maintain some metadata on which rows have
> been deleted, since the newer tombstones would likely not be in the
> older SSTables being compacted. Also, I'm aware that Cassandra can drop
> entire SSTables if they contain only expired data, but I'm unsure of
> what qualifies as expired data (is it just SSTables whose maximum
> timestamp is past the default TTL for the table?) and when such SSTables
> are dropped.
>
> Alternatively, do the SSTables which contain the tombstones have to be
> compacted with the SSTables which contain the expired data for the data
> to be removed? It seems to me that this could result in Cassandra
> holding the expired data long after it has expired, since it would be
> waiting for the new tombstones to be compacted with the older expired
> data.
>
> Finally, I was also unsure when the tombstones themselves are removed. I
> know Cassandra does not delete them until after gc_grace_seconds, but it
> can't delete the tombstones until it's sure the expired data has been
> deleted, right? Otherwise it would see the expired data as valid.
> Consequently, it seems to me that the question of when tombstones are
> deleted is intimately tied to the questions above.
>
> Thanks in advance! If it helps, I've been experimenting with version
> 2.0.15 myself.
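For anyone who wants to watch the expiry mechanism against the metrics
table defined above, CQL's TTL() and WRITETIME() functions expose the
per-cell state; the sample row below is invented:

    -- Insert a row; the table's default_time_to_live (86400) applies
    -- automatically, so no explicit USING TTL is needed.
    INSERT INTO metrics (metric_id, time, value)
    VALUES ('cpu.load', '2016-06-17 12:00:00', 0.75);

    -- TTL(value) reports the seconds remaining until the cell is
    -- treated as a tombstone; it starts near 86400 and counts down.
    -- WRITETIME(value) is the cell's write timestamp in microseconds,
    -- matching the table's timestamp_resolution.
    SELECT metric_id, time, value, TTL(value), WRITETIME(value)
    FROM metrics
    WHERE metric_id = 'cpu.load';

Once the TTL runs out, the cell merely reads as deleted; as Jeff
explains above, the bytes stay on disk until a compaction rewrites that
sstable, or until getFullyExpiredSSTables() can drop the sstable whole.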
