Hello Alain, thanks again for answering.
> Yes, I believe during the next compaction following the expiration date,
> the entry is 'transformed' into a tombstone and lives in the SSTable that
> is the result of the compaction, on the level/bucket this SSTable is put
> into.

Great. However, I'm still trying to figure out a way to test this or to see
it in the code. If you have any idea, I could give it a try.

I didn't understand what you mean with:

> generally, it's good if you can rotate the partitions over time, not to
> reuse old partitions, for example

About garbagecollect: it is a good idea, but it is not available in version
3.0.13.

Also, I've asked this on Stack Overflow
(https://stackoverflow.com/q/52370661/3517383) too, so, if you want, you can
answer there as well and I will mark it as correct.

Cheers.
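P.S. For anyone who wants to chase this in the code: below is a
self-contained toy sketch of the conversion as I currently understand it,
paraphrased from my reading of the 3.0 branch (I believe the real logic
lives around `org.apache.cassandra.db.rows.BufferCell#purge`, driven by the
compaction iterators). The names and signatures here are simplified and
unverified, so please treat it as a sketch of the idea, not the actual
implementation.

```java
// Toy model of a cell as compaction sees it: a TTLed cell carries its own
// expiration (ttl + localDeletionTime) from the moment it is written, so no
// separate tombstone record is ever inserted at L0.
final class CellSketch {

    static final int NO_TTL = 0;
    static final int NO_DELETION_TIME = Integer.MAX_VALUE;

    final long timestamp;         // write time
    final int ttl;                // seconds; NO_TTL for regular cells
    final int localDeletionTime;  // for a TTLed cell: write time (sec) + ttl

    CellSketch(long timestamp, int ttl, int localDeletionTime) {
        this.timestamp = timestamp;
        this.ttl = ttl;
        this.localDeletionTime = localDeletionTime;
    }

    boolean isExpiring()  { return ttl != NO_TTL; }
    boolean isTombstone() { return localDeletionTime != NO_DELETION_TIME && ttl == NO_TTL; }

    boolean isLive(int nowInSec) {
        return localDeletionTime == NO_DELETION_TIME
            || (isExpiring() && nowInSec < localDeletionTime);
    }

    /**
     * What (I think) happens when a compaction rewrites this cell.
     * gcBefore = nowInSec - gc_grace_seconds; returns null if the cell can
     * simply be dropped. 'shadowsOlderData' stands in for the overlapping
     * SSTable checks the real purger does.
     */
    CellSketch purge(int nowInSec, int gcBefore, boolean shadowsOlderData) {
        if (isLive(nowInSec))
            return this;  // not expired yet: rewritten unchanged

        // Expired more than gc_grace ago and nothing left to shadow:
        // dropped outright, no tombstone needed at all.
        if (localDeletionTime < gcBefore && !shadowsOlderData)
            return null;

        // Expired but not purgeable yet: materialized as a regular tombstone
        // (value thrown away), dated back to the original write time. This is
        // the "in-place" conversion, at whatever level the compaction runs.
        if (isExpiring())
            return new CellSketch(timestamp, NO_TTL, localDeletionTime - ttl);

        return this;  // a plain tombstone still within gc_grace: kept
    }

    public static void main(String[] args) {
        int writeSec = 1_000_000;              // toy clock, in seconds
        int ttl = 63_072_000;                  // ~2 years
        int gcGrace = 864_000;                 // default 10 days
        CellSketch cell = new CellSketch(writeSec, ttl, writeSec + ttl);

        // Compacted just after expiry: becomes a tombstone, not dropped yet.
        int now = writeSec + ttl + 60;
        System.out.println(cell.purge(now, now - gcGrace, false).isTombstone()); // true

        // Compacted more than gc_grace after expiry: dropped entirely.
        now = writeSec + ttl + gcGrace + 60;
        System.out.println(cell.purge(now, now - gcGrace, false) == null);       // true
    }
}
```

If this reading is right, it also answers the main question in my original
post below: no tombstone is written at L0 when data expires. The expiring
cell already carries its TTL and expiration time, and the conversion (or
the outright drop, once both the TTL and gc_grace have passed) happens in
whatever SSTable and level the data sits in when a compaction next touches
it.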
On Thu, Sep 27, 2018 at 14:11, Alain RODRIGUEZ (<arodr...@gmail.com>) wrote:

> Hello Gabriel,
>
>> Another clue to explore would be to use the TTL as a default value if
>> that's a good fit. TTLs set at the table level with 'default_time_to_live'
>> should not generate any tombstone at all in C* 3.0+. Not tested on my end,
>> but I read about this.
>
> As explained on a parallel thread, this is wrong ^, mea culpa. I believe
> the rest of my comment still stands (hopefully :)).
>
>> I'm not sure what it means with "*in-place*" since SSTables are
>> immutable. [...]
>
>> My guess is that it is referring to tombstones being created in the same
>> level (but in different SSTables) as the TTLed data, during a compaction
>> triggered
>
> Yes, I believe during the next compaction following the expiration date,
> the entry is 'transformed' into a tombstone and lives in the SSTable that
> is the result of the compaction, on the level/bucket this SSTable is put
> into. That's why I said 'in-place', which is indeed a bit weird for
> immutable data.
>
> As a side idea for your problem: on 'modern' versions of Cassandra (I
> don't remember the version, that's what 'modern' means ;-)), you can run
> 'nodetool garbagecollect' regularly (not necessarily frequently) during
> off-peak periods. That might use the cluster resources when you don't
> need them to reclaim some disk space. Also, making sure that a
> two-year-old record is not being updated regularly, by design, would
> definitely help. In the extreme case of writing data once (never updated)
> and with a TTL, for example, I see no reason for two-year-old data not to
> be evicted correctly. As long as the disk can grow, it should be fine.
>
> I would not be too scared about it, as there is 'always' a way to remove
> tombstones. Yet it's good to think about the design beforehand; generally,
> it's good if you can rotate the partitions over time, not to reuse old
> partitions, for example.
>
> C*heers,
> -----------------------
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Tue, Sep 25, 2018 at 17:38, Gabriel Giussi <gabrielgiu...@gmail.com>
> wrote:
>
>> I'm using LCS and a relatively large TTL of 2 years for all inserted
>> rows, and I'm concerned about the moment at which C* would drop the
>> corresponding tombstones (neither explicit deletes nor updates are being
>> performed).
>>
>> From [Missing Manual for Leveled Compaction Strategy](https://www.youtube.com/watch?v=-5sNVvL8RwI),
>> [Tombstone Compactions in Cassandra](https://www.youtube.com/watch?v=pher-9jqqC4)
>> and [Deletes Without Tombstones or TTLs](https://www.youtube.com/watch?v=BhGkSnBZgJA)
>> I understand that:
>>
>> - All levels except L0 contain non-overlapping SSTables, but a partition
>> key may be present in one SSTable in each level (aka distributed in all
>> levels).
>> - For a compaction to be able to drop a tombstone, it must be sure that
>> it is compacting all SSTables that contain the data, to prevent zombie
>> data (this is done by checking bloom filters). It also considers
>> gc_grace_seconds.
>>
>> So, for my particular use case (2-year TTL and write-heavy load) I can
>> conclude that TTLed data will be in the highest levels, and I'm wondering
>> when those SSTables with TTLed data will be compacted with the SSTables
>> that contain the corresponding tombstones.
>> The main question is: **Where are tombstones (from TTLs) being created?
>> Are they created at Level 0, so that it will take a long time until they
>> end up in the highest levels (and hence disk space will take a long time
>> to be freed)?**
>>
>> In a comment on [About deletes and tombstones](http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html),
>> Alain says that:
>>
>> > Yet using TTLs helps, it reduces the chances of having data being
>> > fragmented between SSTables that will not be compacted together any
>> > time soon. Using any compaction strategy, if the delete comes
>> > relatively late in the row history, as usually happens, the
>> > 'upsert'/'insert' of the tombstone will go to a new SSTable. It might
>> > take time for this tombstone to get to the right compaction "bucket"
>> > (with the rest of the row) and for Cassandra to be able to finally free
>> > space.
>> > **My understanding is that with TTLs the tombstone is created
>> > in-place**, thus it is often, and for many reasons, easier and safer to
>> > get rid of a TTL than of a delete.
>> > Another clue to explore would be to use the TTL as a default value if
>> > that's a good fit. TTLs set at the table level with
>> > 'default_time_to_live' should not generate any tombstone at all in
>> > C* 3.0+. Not tested on my end, but I read about this.
>>
>> I'm not sure what it means with "*in-place*" since SSTables are
>> immutable.
>> (I also have some doubts about what it says about using
>> `default_time_to_live`, which I've asked in [How default_time_to_live
>> would delete rows without tombstones in Cassandra?](https://stackoverflow.com/q/52282517/3517383).)
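Interjecting in my own quoted question here: one related pointer I've come
across while digging is the whole-SSTable drop path, which I believe is
`CompactionController.getFullyExpiredSSTables` in the 3.0 sources. If I read
it correctly, an SSTable in which every cell expired more than
gc_grace_seconds ago can be deleted whole, without being compacted at all,
provided it cannot shadow older data in overlapping SSTables. Perhaps this
is the kernel of truth behind the `default_time_to_live` claim. A toy sketch
with simplified names, quite possibly off in the details:

```java
import java.util.ArrayList;
import java.util.List;

// Toy version of the fully-expired-SSTable rule as I understand it. The
// fields mirror the per-SSTable stats metadata conceptually; they are not
// the real API.
final class FullyExpiredSketch {

    static final class SSTableStats {
        final long minTimestamp;         // oldest write it contains
        final long maxTimestamp;         // newest write it contains
        final int maxLocalDeletionTime;  // latest expiration among its cells

        SSTableStats(long minTimestamp, long maxTimestamp, int maxLocalDeletionTime) {
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
            this.maxLocalDeletionTime = maxLocalDeletionTime;
        }
    }

    /** gcBefore = nowInSec - gc_grace_seconds. */
    static List<SSTableStats> fullyExpired(List<SSTableStats> compacting,
                                           List<SSTableStats> overlapping,
                                           int gcBefore) {
        // Oldest write still present outside the compaction: if a candidate
        // holds anything newer than this, dropping the candidate could
        // resurrect that older, shadowed data.
        long oldestOverlapping = Long.MAX_VALUE;
        for (SSTableStats s : overlapping)
            oldestOverlapping = Math.min(oldestOverlapping, s.minTimestamp);

        List<SSTableStats> droppable = new ArrayList<>();
        for (SSTableStats s : compacting) {
            boolean everythingExpired = s.maxLocalDeletionTime < gcBefore;
            boolean shadowsNothing = s.maxTimestamp < oldestOverlapping;
            if (everythingExpired && shadowsNothing)
                droppable.add(s); // the whole file can be deleted as-is
        }
        return droppable;
    }

    public static void main(String[] args) {
        int gcBefore = 2_000_000; // nowInSec - gc_grace_seconds, toy value
        List<SSTableStats> compacting = new ArrayList<>();
        compacting.add(new SSTableStats(100, 500, 1_500_000));   // all expired
        List<SSTableStats> overlapping = new ArrayList<>();
        overlapping.add(new SSTableStats(1_000, 2_000, Integer.MAX_VALUE));
        // The candidate's newest write (500) is older than anything in the
        // overlapping sstable (1_000), and everything in it expired before
        // gcBefore, so the whole file is droppable.
        System.out.println(fullyExpired(compacting, overlapping, gcBefore).size()); // 1
    }
}
```

For a write-once table where every row carries the same large TTL, this
would be the happy path: a fully expired SSTable in the highest level can
disappear without waiting for yet another top-level compaction.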
"If we go 25 rounds without compacting in the highest level, we start >> bringing in sstables from that level into lower level compactions" [The >> Missing Manual for Leveled Compaction Strategy]( >> https://image.slidesharecdn.com/csummit16lcstalk-161004232416/95/the-missing-manual-for-leveled-compaction-strategy-wei-deng-ryan-svihla-datastax-cassandra-summit-2016-12-638.jpg?cb=1475693117 >> ) >> 3. "When there are no other compactions to do, we trigger a >> single-sstable compaction if there is more than X% droppable tombstones in >> the sstable." [CASSANDRA-7019]( >> https://issues.apache.org/jira/browse/CASSANDRA-7019) >> Since tombstones are created during compaction, I think it may be using >> SSTable metadata to estimate droppable tombstones. >> >> **So, compactions (2) and (3) should be creating/dropping tombstones in >> highest levels hence using LCS with a large TTL should not be an issue per >> se.** >> With creating/dropping I mean that the same kind of compactions will be >> creating tombstones for expired data and/or dropping tombstones if the gc >> period has already passed. >> >> A link to source code that clarifies this situation will be great, thanks. >> >