Re: Tombstone removal optimization and question
Thanks for the confirmation, Kurt.

On 6 Nov. 2018 11:59, "kurt greaves" wrote:

> Yes it does. Consider if it didn't and you kept writing to the same
> partition, you'd never be able to remove any tombstones for that partition.
Re: Tombstone removal optimization and question
Yes it does. Consider if it didn't and you kept writing to the same partition, you'd never be able to remove any tombstones for that partition.

On Tue., 6 Nov. 2018, 19:40 DuyHai Doan wrote:

> I wonder if the current compaction optimization does use/leverage this
> metadata for tombstone removal.
Tombstone removal optimization and question
Hello all,

I have tried to sum up all the rules related to tombstone removal.

Given a tombstone written at timestamp (t) for a partition key (P) in SSTable (S1), this tombstone will be removed only when all of the following hold:

1) the gc_grace_seconds period has passed
2) SSTable S1 is selected in a compaction round (not at all guaranteed because compaction is not deterministic)
3) the partition key (P) is not present in any other SSTable that is NOT picked by the current round of compaction

Rule 3) is quite complex to understand, so here is the detailed explanation:

If partition key (P) also exists in another SSTable (S2) that is NOT compacted together with SSTable (S1) and we remove the tombstone, some data in S2 that the tombstone was shadowing may be resurrected.

More precisely, at compaction time, Cassandra does not have ANY detail about the data of partition (P) that stays in S2, so it cannot remove the tombstone right away.

Now, for each SSTable, we have some metadata, namely minTimestamp and maxTimestamp.

I wonder if the current compaction optimization does use/leverage this metadata for tombstone removal. Indeed, if we know that the tombstone timestamp (t) < minTimestamp of S2, then everything in S2 was written after the tombstone and cannot be shadowed by it, so the tombstone can be safely removed.

Does anyone have the info?

Regards
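
To make the three rules and the minTimestamp question concrete, here is a minimal sketch of the purge decision in Python. It is only an illustration of the reasoning in this thread, not Cassandra's actual code: the names SSTable, Tombstone and can_purge are hypothetical, timestamps are plain integers, and a plain set of partition keys stands in for whatever lookup the real engine uses to decide whether an SSTable may contain a partition. Rule 2 is implicit, since the function is meant to be called while compacting the SSTable that holds the tombstone.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class SSTable:
    # Hypothetical model of the per-SSTable metadata discussed in the thread.
    name: str
    min_timestamp: int              # oldest write timestamp in this SSTable
    max_timestamp: int              # newest write timestamp in this SSTable
    partition_keys: FrozenSet[str]  # stand-in for a "may contain key" lookup


@dataclass(frozen=True)
class Tombstone:
    partition_key: str
    timestamp: int            # write timestamp (t) of the deletion
    local_deletion_time: int  # wall-clock seconds when the delete was issued


def can_purge(tombstone: Tombstone,
              compacting: Iterable[SSTable],
              all_sstables: Iterable[SSTable],
              gc_grace_seconds: int,
              now: int) -> bool:
    """Return True if the tombstone may be dropped by this compaction round."""
    # Rule 1: gc_grace_seconds must have elapsed since the deletion.
    if now - gc_grace_seconds <= tombstone.local_deletion_time:
        return False

    compacting = list(compacting)
    for sstable in all_sstables:
        if sstable in compacting:
            continue  # its data is merged with the tombstone in this round
        if tombstone.partition_key not in sstable.partition_keys:
            continue  # Rule 3: partition (P) is absent from this SSTable
        # Timestamp check: the SSTable holds (P) but is not being compacted.
        # If it may contain data as old as the tombstone or older, that data
        # could be shadowed, so the tombstone has to stay.
        if sstable.min_timestamp <= tombstone.timestamp:
            return False
    return True


s1 = SSTable("S1", 100, 200, frozenset({"P"}))
s2_older = SSTable("S2", 50, 120, frozenset({"P"}))
s2_newer = SSTable("S2", 300, 400, frozenset({"P"}))
t = Tombstone("P", timestamp=150, local_deletion_time=1000)

print(can_purge(t, [s1], [s1, s2_older], gc_grace_seconds=10, now=2000))  # False
print(can_purge(t, [s1], [s1, s2_newer], gc_grace_seconds=10, now=2000))  # True
```

The second call shows the case raised in the question: S2 contains partition (P) and is not part of the compaction, but everything in it is newer than the tombstone, so the tombstone can go. Without the timestamp check, a partition that keeps receiving writes would always be present in some non-compacted SSTable and its tombstones could never be purged.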