Re: How do TTLs generate tombstones
Thanks. We have turned off read repair, and we read with consistency = ONE. That leaves repairs and old timestamps (generated by the client) as possible causes of the overlap. We are writing from Spark, and we didn't have NTP set up on the cluster - I think that was causing some of the issues, but we have fixed it and the problem remains. It is hard for me to believe that C* repair has a bug, so before creating a JIRA, I would appreciate it if you could take a look at the attached sstables (produced using sstablemetadata) from two different time points over the last two weeks (we ran compaction in between). In both cases, there are sstables generated around 8 pm that span very long time periods (sometimes over a day). We run repair daily at 8 pm.

Cheers,
Eugene

On Wed, Oct 11, 2017 at 12:53 PM, Jeff Jirsa wrote:
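For what it's worth, the kind of overlap Eugene describes can be spotted mechanically from sstablemetadata output. A minimal sketch follows; the "Minimum timestamp"/"Maximum timestamp" field names and microsecond units are assumptions that vary by Cassandra version, so check them against your own output:

```python
import re

# Assumed field names/units from sstablemetadata output -- verify locally.
TS_RE = re.compile(r"(Minimum|Maximum) timestamp:\s*(\d+)")

WINDOW_SECONDS = 24 * 3600  # assume 1-day TWCS windows


def timestamp_span(metadata_text):
    """Return (min_ts, max_ts) in seconds from sstablemetadata output."""
    ts = {}
    for kind, value in TS_RE.findall(metadata_text):
        ts[kind] = int(value) / 1_000_000  # microseconds -> seconds
    return ts["Minimum"], ts["Maximum"]


def spans_multiple_windows(metadata_text, window=WINDOW_SECONDS):
    """Flag an sstable whose data covers more than one time window."""
    lo, hi = timestamp_span(metadata_text)
    return (hi - lo) > window


sample = """
Minimum timestamp: 1507680000000000
Maximum timestamp: 1507855000000000
"""
print(spans_multiple_windows(sample))  # spans ~48h of data -> True
```

Running this over each sstable's metadata would surface the post-8-pm tables that span more than a day.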
Re: How do TTLs generate tombstones
Anti-entropy repairs ("nodetool repair") and bootstrap/decom/removenode should stream sections of (and/or possibly entire) sstables from one replica to another. Assuming the original sstable was entirely contained in a single time window, the resulting sstable fragment streamed to the neighbor node will similarly be entirely contained within a single time window, and will be joined with the sstables in that window. If you find this isn't the case, open a JIRA; that's a bug (it was explicitly a design goal of TWCS, as it was one of my biggest gripes with early versions of DTCS).

Read repairs, however, will pollute the memtable and cause overlaps. There are two types of read repairs:
- Blocking read repair due to consistency level (read at quorum, and if one of the replicas is missing data, the coordinator will issue mutations to the missing replica, which will go into the memtable and flush into the newest time window). This cannot be disabled (period), and is probably the reason most people have overlaps (because people tend to read their writes pretty quickly after writing in time series use cases, often before hints or normal repair can be successful, especially in environments where nodes are bounced often).
- Background read repair (tunable with the read_repair_chance and dclocal_read_repair_chance table options), which is like blocking read repair, but happens probabilistically (i.e., there's a 1% chance on any read that the coordinator will scan the partition and copy any missing data to the replicas missing that data; again, this goes to the memtable, and will flush into the newest time window).

There's a pretty good argument to be made against manual repairs if (and only if) you only use TTLs, never explicitly delete data, and can tolerate the business risk of losing two machines at a time (that is: in the very, very rare case that you somehow lose 2 machines before you can rebuild, you'll lose some subset of data that never made it to the sole remaining replica; is your business going to lose millions of dollars, or will you just have a gap in an analytics dashboard somewhere that nobody's going to worry about?).

- Jeff

On Wed, Oct 11, 2017 at 9:24 AM, Sumanth Pasupuleti < spasupul...@netflix.com.invalid> wrote:
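Jeff's point about read repair polluting the memtable can be illustrated with a toy model (this is not Cassandra code; the one-hour window and the timestamps are invented): flushed data carries its original write timestamps, so a single old cell copied in by read repair stretches the resulting sstable across many windows:

```python
# Toy model of TWCS window membership, not Cassandra internals.
WINDOW = 3600  # 1-hour windows for the sketch


def sstable_window_span(write_timestamps, window=WINDOW):
    """How many TWCS time windows the flushed sstable's data covers."""
    lo, hi = min(write_timestamps), max(write_timestamps)
    return hi // window - lo // window + 1


now = 1_000_000_000
fresh_writes = [now - 10, now - 5, now]  # normal, recent time-series inserts
old_repaired = now - 6 * 3600            # 6h-old cell copied in by read repair

print(sstable_window_span(fresh_writes))                   # 1: single window
print(sstable_window_span(fresh_writes + [old_repaired]))  # 7: overlap
```

One stray mutation is enough; the sstable now overlaps six older windows and blocks their clean expiration.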
Re: How do TTLs generate tombstones
Hi Eugene,

Common contributors to overlapping SSTables are:
1. Hints
2. Repairs
3. New writes with old timestamps (should be rare but technically possible)

I would not run repairs with TWCS - as you indicated, it is going to result in overlapping SSTables, which impacts disk space and read latency, since reads now have to encompass multiple SSTables.

As for https://issues.apache.org/jira/browse/CASSANDRA-13418, I would not worry about data resurrection as long as all the writes carry a TTL with them.

We faced similar overlapping issues with TWCS (it was due to dclocal_read_repair_chance) - we developed an SSTable tool that would give the topN or bottomN keys in an SSTable based on writetime/deletion time - we used this to identify the specific keys responsible for overlap between SSTables.

Thanks,
Sumanth

On Mon, Oct 9, 2017 at 6:36 PM, eugene miretsky wrote:
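Sumanth's tool isn't public, but the idea is easy to sketch (the key names and writetimes below are invented): given per-partition max writetimes pulled from an SSTable, the N newest and N oldest keys usually point straight at the writes causing the overlap:

```python
import heapq

# Sketch of the topN/bottomN-by-writetime idea; input data is hypothetical.


def extreme_keys(writetimes, n=2):
    """writetimes: dict of partition key -> max cell writetime (epoch micros).

    Returns the n newest and n oldest keys, the likely overlap culprits.
    """
    newest = heapq.nlargest(n, writetimes, key=writetimes.get)
    oldest = heapq.nsmallest(n, writetimes, key=writetimes.get)
    return newest, oldest


sample = {"k1": 100, "k2": 900, "k3": 500, "k4": 50}
newest, oldest = extreme_keys(sample)
print(newest)  # ['k2', 'k3']
print(oldest)  # ['k4', 'k1']
```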
Re: How do TTLs generate tombstones
Thanks Alain!

We are using TWCS compaction, and I read your blog multiple times - it was very useful, thanks!

We are seeing a lot of overlapping SSTables, leading to a lot of problems: (a) a large number of tombstones read in queries, (b) high CPU usage, (c) fairly long Young Gen GC collections (300ms).

We have read_repair_chance = 0, unchecked_tombstone_compaction = true, and gc_grace_seconds = 3h, but we read and write with consistency = 1.

I'm suspecting the overlap is coming from either hinted handoff or a repair job we run nightly.

1) Is running repair with TWCS recommended? It seems like it will always create a never-ending overlap (the repair SSTable will have data from all 24 hours), an effect that seems to get amplified with anti-compaction.
2) TWCS seems to introduce a tradeoff between eventual consistency and write/read availability. If all repairs are turned off, then the choice is either (a) use a strong consistency level, and pay the price of lower availability and slower reads or writes, or (b) use a lower consistency level, and risk inconsistent data (data is never repaired).

I will try your last link, but reappearing data sounds a bit scary :)

Any advice on how to debug this further would be greatly appreciated.

Cheers,
Eugene

On Fri, Oct 6, 2017 at 11:02 AM, Alain RODRIGUEZ wrote:
Re: How do TTLs generate tombstones
Hi Eugene,

> If we never use updates (time series data), is it safe to set gc_grace_seconds=0?

As Kurt pointed out, you never want 'gc_grace_seconds' to be lower than 'max_hint_window_in_ms', as the min of these 2 values is used for the hints storage window size in Apache Cassandra.

Yet time series data with fixed TTLs allows a very efficient use of Cassandra, especially when using Time Window Compaction Strategy (TWCS). Fun fact: Jeff brought it to Apache Cassandra :-). I would definitely give it a try.

Here is a post from my colleague Alex that I believe could be useful in your case: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html

Using TWCS and lowering 'gc_grace_seconds' to the value of 'max_hint_window_in_ms' should be really effective. Make sure to use a strong consistency level (generally RF = 3, CL.Read = CL.Write = LOCAL_QUORUM) to prevent inconsistencies, I would say (depending on your interest in consistency).

This way you could expire entire SSTables, without compaction. If overlaps in SSTables become a problem, you could even consider giving a try to a more aggressive SSTable expiration: https://issues.apache.org/jira/browse/CASSANDRA-13418.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2017-10-05 23:44 GMT+01:00 kurt greaves:
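The interaction Alain and Kurt describe can be made concrete with a tiny sketch (the function name is ours; the 3h default for max_hint_window_in_ms matches the stock cassandra.yaml, but verify yours): hints are only kept for the smaller of the two settings, so gc_grace_seconds = 0 silently disables hint storage:

```python
# Sketch: effective hint storage window = min(gc_grace_seconds, hint window).
def effective_hint_window_s(gc_grace_seconds, max_hint_window_in_ms=3 * 3600 * 1000):
    return min(gc_grace_seconds, max_hint_window_in_ms // 1000)


# gc_grace_seconds = 0 disables hint storage entirely (kurt's point):
print(effective_hint_window_s(0))         # 0
# lowering gc_grace only down to the hint window keeps hints intact:
print(effective_hint_window_s(3 * 3600))  # 10800
```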
Re: How do TTLs generate tombstones
No, it's never safe to set it to 0, as you'll disable hinted handoff for the table. If you are never doing updates and manual deletes, and you always insert with a TTL, you can get away with setting it to the hinted handoff period.

On 6 Oct. 2017 1:28 am, "eugene miretsky" wrote:
Re: How do TTLs generate tombstones
Thanks Jeff,

Makes sense. If we never use updates (time series data), is it safe to set gc_grace_seconds=0?

On Wed, Oct 4, 2017 at 5:59 PM, Jeff Jirsa wrote:
Re: How do TTLs generate tombstones
The TTL'd cell is treated as a tombstone. gc_grace_seconds applies to TTL'd cells because, even though the data is TTL'd, it may have been written on top of another live cell that wasn't TTL'd.

Imagine a test table, a simple key->value (k, v):

INSERT INTO table(k,v) VALUES (1,1);
-- Kill 1 of the 3 nodes
UPDATE table USING TTL 60 SET v=1 WHERE k=1;

60 seconds later, the live nodes will see that data as deleted, but when that dead node comes back to life, it needs to learn of the deletion.

On Wed, Oct 4, 2017 at 2:05 PM, eugene miretsky wrote:
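Jeff's answer can be sketched as a toy model (the class and field names are illustrative, not Cassandra's internals): an expired TTL'd cell behaves like a tombstone, and only becomes purgeable once gc_grace_seconds have passed since it expired:

```python
# Toy model of an expirable cell; names are illustrative only.
class Cell:
    def __init__(self, write_time, ttl):
        self.write_time = write_time
        self.ttl = ttl
        # analogous to the cell's local deletion time
        self.deletion_time = write_time + ttl

    def is_live(self, now):
        return now < self.deletion_time

    def purgeable(self, now, gc_grace_seconds):
        # The expired cell must survive as a tombstone for gc_grace_seconds,
        # in case it shadows an older non-TTL'd write on a dead replica.
        return (not self.is_live(now)) and now >= self.deletion_time + gc_grace_seconds


cell = Cell(write_time=1000, ttl=60)
print(cell.is_live(1030))           # True: within the 60s TTL
print(cell.is_live(1100))           # False: expired, now acts as a tombstone
print(cell.purgeable(1100, 10800))  # False: must wait out gc_grace first
print(cell.purgeable(1000 + 60 + 10800, 10800))  # True: safe to purge
```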
How do TTLs generate tombstones
Hello,

The following link says that TTLs generate tombstones: https://docs.datastax.com/en/cql/3.3/cql/cql_using/useExpire.html.

What exactly is the process that converts the TTL into a tombstone?

1. Is an actual new tombstone cell created when the TTL expires?
2. Or is the TTLed cell treated as a tombstone?

Also, does gc_grace_seconds have an effect on TTLed cells? gc_grace_seconds is meant to protect from deleted data re-appearing if the tombstone is compacted away before all nodes have reached a consistent state. However, since the TTL is stored in the cell (in liveness_info), there is no way for the cell to re-appear (the TTL will still be there).

Cheers,
Eugene