Re: Documentation about TTL and tombstones
> It's actually correct to do it how it is today.
> Insertion date does not matter, what matters is the time after tombstones are
> supposed to be deleted.
> If the delete got to all nodes, sure, no problem, but if any of the nodes
> didn't get the delete, and you would get rid of the tombstones before running
> a repair, you might have nodes that still have that data.
> Then following a repair, that data will be copied to other replicas, and that
> data you thought you deleted, will be brought back to life.

Sure, for regular data that does not have a TTL, this makes sense. But I claim that data with a TTL is deleted when it is inserted. It’s just that this delete only becomes effective at some future date.

In order to understand whether data might reappear, we have to consider six cases. Let us first consider the three cases where the INSERT / UPDATE did not overwrite any existing data that would have lived longer than the new data:

1. Let us assume that the data is successfully written to all nodes and no repair is run. After the TTL expires, the data turns into a tombstone, but because the data was present on all nodes, the tombstone is present on all nodes, so there is no risk of data reappearing.

2. Let us assume that this data is not written to all nodes but a repair is run within the TTL. After that, we effectively have the first situation, so there is no risk of data reappearing.

3. Let us assume that this data is not written to all nodes and no repair is run within the TTL. After the TTL has passed, the data expires on the nodes where it has been written. Now, we have tombstones on these nodes. If we get rid of the tombstones, there is no risk of the data reappearing, because there are no nodes that have the data, so even if we run a repair in the future, there is no risk that the data magically reappears.

Now, let us consider the cases where data that either had no TTL or had a TTL that expired after the TTL of the newly inserted data was overwritten.
Again, there are three possible scenarios:

4. Let us assume that the data is successfully written to all nodes and no repair is run. After the TTL expires, the data turns into a tombstone, but because the data was present on all nodes, the tombstone is present on all nodes, so there is no risk of data reappearing.

5. Let us assume that this data is not written to all nodes but a repair is run within the TTL. After that, we effectively have scenario 4, so there is no risk of data reappearing.

6. Let us assume that this data is not written to all nodes and no repair is run within the TTL. After the TTL has passed, the data expires on the nodes where it has been written. Now, we have tombstones on these nodes. If we get rid of the tombstones, there is the risk of the data reappearing, because the older data that was overwritten by the INSERT / UPDATE might still exist on some nodes, and as the data with the TTL never made it to these nodes, there is no tombstone on these nodes and thus the older data can reappear.

So, we only have to worry about the last scenario. In this scenario, we have to ensure that either the inserted data with the TTL is repaired (which brings us back to scenario 5), or that the tombstones are repaired before they are discarded.

This is why I claim that for data with a TTL, gc_grace_seconds should effectively start when the data is inserted, not when it is converted into a tombstone: It does not matter whether the data with the TTL is repaired or the tombstone is repaired. As long as either of these things happens between the data with the TTL being inserted and the tombstone being reclaimed, there is no risk of deleted or overwritten data reappearing.
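Scenario 6 can be played through with a toy model. The following is a minimal Python sketch, not Cassandra code: the names `Cell`, `write`, `repair`, `purge`, `read` and `scenario_6` are hypothetical, each replica is modeled as a dict that only keeps the newest cell per key, and the timestamps (TTL of 5, grace period of 10, purge at time 30) are made up for illustration. It assumes that a repair simply copies the newest cell to every replica, whether that cell is still live or has already expired.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

GC_GRACE = 10   # assumed grace period, arbitrary time units
PURGE_AT = 30   # assumed time at which compaction purges old tombstones


@dataclass(frozen=True)
class Cell:
    value: str
    write_ts: int
    ttl: Optional[int] = None  # None means the data never expires

    def expired(self, now: int) -> bool:
        return self.ttl is not None and now >= self.write_ts + self.ttl


Replica = Dict[str, Cell]  # toy replica: newest known cell per key


def write(replica: Replica, key: str, cell: Cell) -> None:
    """Last write wins, decided by the write timestamp."""
    current = replica.get(key)
    if current is None or cell.write_ts > current.write_ts:
        replica[key] = cell


def repair(replicas: List[Replica]) -> None:
    """Copy the newest cell for every key to all replicas (live or expired)."""
    for key in {k for r in replicas for k in r}:
        newest = max((r[key] for r in replicas if key in r),
                     key=lambda c: c.write_ts)
        for r in replicas:
            r[key] = newest


def purge(replica: Replica, now: int) -> None:
    """Drop expired cells once the grace period after expiry has passed."""
    for key, cell in list(replica.items()):
        if cell.expired(now) and now >= cell.write_ts + cell.ttl + GC_GRACE:
            del replica[key]


def read(replica: Replica, key: str, now: int) -> Optional[str]:
    cell = replica.get(key)
    return None if cell is None or cell.expired(now) else cell.value


def scenario_6(repair_at: Optional[int]) -> Optional[str]:
    """Return what a read sees after purge plus a late repair."""
    a: Replica = {}
    b: Replica = {}
    for r in (a, b):
        write(r, "k", Cell("old", write_ts=0))      # old data, no TTL
    write(a, "k", Cell("new", write_ts=10, ttl=5))  # replica b misses this write
    if repair_at is not None and repair_at < PURGE_AT:
        repair([a, b])                               # repair before the purge
    for r in (a, b):
        purge(r, now=PURGE_AT)
    repair([a, b])                                   # late repair after the purge
    return read(a, "k", now=PURGE_AT + 2)


print(scenario_6(repair_at=12))    # repair while the TTL data is still live
print(scenario_6(repair_at=18))    # repair after expiry, before the purge
print(scenario_6(repair_at=None))  # no repair before the purge
```

In this model the first two calls return `None` and only the last one returns `"old"`. Under the stated assumptions, that matches the argument: it makes no difference whether the live data or the expired data is repaired, as long as one of the two happens before the purge.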
Re: Documentation about TTL and tombstones
It's actually correct to do it how it is today.
Insertion date does not matter, what matters is the time after tombstones are supposed to be deleted.
If the delete got to all nodes, sure, no problem, but if any of the nodes didn't get the delete, and you would get rid of the tombstones before running a repair, you might have nodes that still have that data.
Then following a repair, that data will be copied to other replicas, and that data you thought you deleted, will be brought back to life.

On Sat, Mar 16, 2024 at 5:39 PM Sebastian Marsching wrote:

> > That's not how gc_grace_seconds works.
> > gc_grace_seconds controls how much time must pass *after* data has become
> > a tombstone before it can actually be deleted, in order to give you enough
> > time to run repairs.
> >
> > Say you have data that is about to expire on March 16 8am, and
> > gc_grace_seconds is 10 days.
> > After Mar 16 8am that data will be a tombstone, and only after March 26
> > 8am, a compaction *might* remove it, if all other conditions are met.
>
> You are right. I do not understand why it is implemented this way, but you
> are 100 % correct that it works this way.
>
> I thought that gc_grace_seconds is all about being able to repair the
> table before tombstones are removed, so that deleted data cannot reappear.
> But when the data has a TTL, it should not matter whether the original data
> or the tombstone is synchronized as part of the repair process. After all,
> the original data should turn into a tombstone, so if it was present on all
> nodes, there is no risk of deleted data reappearing. Therefore, I think it
> would make more sense to start gc_grace_seconds when the data is inserted /
> updated. I don’t know why it was not implemented this way.
Re: Documentation about TTL and tombstones
> That's not how gc_grace_seconds works.
> gc_grace_seconds controls how much time must pass *after* data has become a
> tombstone before it can actually be deleted, in order to give you enough time
> to run repairs.
>
> Say you have data that is about to expire on March 16 8am, and
> gc_grace_seconds is 10 days.
> After Mar 16 8am that data will be a tombstone, and only after March 26 8am,
> a compaction *might* remove it, if all other conditions are met.

You are right. I do not understand why it is implemented this way, but you are 100 % correct that it works this way.

I thought that gc_grace_seconds is all about being able to repair the table before tombstones are removed, so that deleted data cannot reappear. But when the data has a TTL, it should not matter whether the original data or the tombstone is synchronized as part of the repair process. After all, the original data should turn into a tombstone, so if it was present on all nodes, there is no risk of deleted data reappearing. Therefore, I think it would make more sense to start gc_grace_seconds when the data is inserted / updated. I don’t know why it was not implemented this way.
Re: Documentation about TTL and tombstones
That's not how gc_grace_seconds works.
gc_grace_seconds controls how much time must pass *after* data has become a tombstone before it can actually be deleted, in order to give you enough time to run repairs.

Say you have data that is about to expire on March 16 8am, and gc_grace_seconds is 10 days.
After Mar 16 8am that data will be a tombstone, and only after March 26 8am, a compaction *might* remove it, if all other conditions are met.

gil

On Fri, Mar 15, 2024 at 12:58 AM Sebastian Marsching < sebast...@marsching.com> wrote:

> > by reading the documentation about TTL
> > https://cassandra.apache.org/doc/4.1/cassandra/operating/compaction/index.html#ttl
> > It mentions that it creates a tombstone when data expires; how is that
> > possible without writing a tombstone to the table? I thought TTL
> > doesn't create tombstones, since the TTL is stored together with the
> > write timestamp at the row level.
>
> If you read carefully, you will notice that no tombstone is created and
> instead the data is *converted* into a tombstone. So, after the TTL has
> expired, the inserted data effectively acts as a tombstone. This is needed,
> because the now expired data might hide older data that has not expired
> yet. If the newer data was simply dropped after the TTL expired, older data
> might reappear.
>
> If I understand it correctly, you can avoid data with a TTL being
> converted into a tombstone by choosing a TTL that is greater than
> gc_grace_seconds. Technically, the data is still going to be converted into
> a tombstone when the TTL expires, but this tombstone will immediately be
> eligible for garbage collection.
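The March 16 / March 26 example from this message is plain date arithmetic. Here is a small Python sketch of it; the function name `earliest_purge_time` is hypothetical, and the year 2024 is assumed from the dates of this thread. The result is only the earliest possible moment: as noted in the message, a compaction *might* remove the tombstone after that point, if all other conditions are met.

```python
from datetime import datetime, timedelta


def earliest_purge_time(expires_at: datetime, gc_grace_seconds: int) -> datetime:
    """Earliest moment at which a compaction may drop the expired data.

    The grace period starts when the TTL expires (when the data becomes a
    tombstone), not when the data was inserted.
    """
    return expires_at + timedelta(seconds=gc_grace_seconds)


expires_at = datetime(2024, 3, 16, 8, 0)   # data expires on March 16, 8am
gc_grace_seconds = 10 * 24 * 60 * 60        # 10 days expressed in seconds

print(earliest_purge_time(expires_at, gc_grace_seconds))  # 2024-03-26 08:00:00
```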
Re: Documentation about TTL and tombstones
> by reading the documentation about TTL
> https://cassandra.apache.org/doc/4.1/cassandra/operating/compaction/index.html#ttl
> It mentions that it creates a tombstone when data expires; how is that
> possible without writing a tombstone to the table? I thought TTL
> doesn't create tombstones, since the TTL is stored together with the
> write timestamp at the row level.

If you read carefully, you will notice that no tombstone is created and instead the data is *converted* into a tombstone. So, after the TTL has expired, the inserted data effectively acts as a tombstone. This is needed, because the now expired data might hide older data that has not expired yet. If the newer data was simply dropped after the TTL expired, older data might reappear.

If I understand it correctly, you can avoid data with a TTL being converted into a tombstone by choosing a TTL that is greater than gc_grace_seconds. Technically, the data is still going to be converted into a tombstone when the TTL expires, but this tombstone will immediately be eligible for garbage collection.
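Why the expired data has to keep acting as a tombstone can be illustrated with a toy read path. This is a minimal Python sketch under simplifying assumptions, not how Cassandra is implemented: `Cell` and `read` are hypothetical names, time is an arbitrary integer, and the newest write timestamp is assumed to win.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    value: str
    write_ts: int              # write timestamp, arbitrary time units
    ttl: Optional[int] = None  # None means no TTL

    def expired(self, now: int) -> bool:
        return self.ttl is not None and now >= self.write_ts + self.ttl


def read(cells: List[Cell], now: int, drop_expired_first: bool) -> Optional[str]:
    """Resolve one column from several stored versions.

    drop_expired_first=False: expired data still shadows older versions,
    i.e. it acts as a tombstone.
    drop_expired_first=True: expired data is discarded before resolving.
    """
    if drop_expired_first:
        cells = [c for c in cells if not c.expired(now)]
    if not cells:
        return None
    newest = max(cells, key=lambda c: c.write_ts)
    return None if newest.expired(now) else newest.value


old = Cell("old", write_ts=0)          # older data without a TTL
new = Cell("new", write_ts=10, ttl=5)  # newer data, expires at time 15

print(read([old, new], now=12, drop_expired_first=False))  # new
print(read([old, new], now=20, drop_expired_first=False))  # None
print(read([old, new], now=20, drop_expired_first=True))   # old
```

The last call shows the problem described above: if the expired version were simply dropped, the older value would become visible again.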
Documentation about TTL and tombstones
Hello community,

by reading the documentation about TTL
https://cassandra.apache.org/doc/4.1/cassandra/operating/compaction/index.html#ttl
It mentions that it creates a tombstone when data expires; how is that possible without writing a tombstone to the table? I thought TTL doesn't create tombstones, since the TTL is stored together with the write timestamp at the row level.

Greetings

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay