Copilot commented on code in PR #4519:
URL: https://github.com/apache/cassandra/pull/4519#discussion_r2611910464
##########
doc/modules/cassandra/pages/managing/operating/compaction/tombstones.adoc:
##########
@@ -124,6 +125,67 @@ To avoid keeping tombstones forever, we set
`gc_grace_seconds` for every table i
If an SSTable contains only tombstones and it is guaranteed that SSTable is
not shadowing data in any other SSTable, then the compaction can drop
that SSTable.
-If you see SSTables with only tombstones (note that TTL'd data is considered
tombstones once the time-to-live has expired), but it is not being dropped by
compaction, it is likely that other SSTables contain older data.
+If you observe SSTables that contain only tombstones or expired TTL data, and
compaction is not removing them, it likely indicates that older versions of the
data still exist in other SSTables.
There is a tool called `sstableexpiredblockers` that will list which SSTables
are droppable and which are blocking them from being dropped.
With `TimeWindowCompactionStrategy` it is possible to remove the guarantee
(not check for shadowing data) by enabling
`unsafe_aggressive_sstable_expiration`.
+
+
+== Examples
+
+Below is the sstabledump output showing a live row with expired flag as
"false":
+[source,json]
+----
+{
+ "partition" : {
+ "key" : [ "4ac48f1d-c736-4ca5-9b03-2566150f6af5" ],
+ "position" : 0
+ },
+ "rows" : [
+ {
+ "type" : "row",
+ "position" : 36,
+ "clustering" : [ "2025-11-05T22:43:20.833Z" ],
+ "liveness_info" : { "tsamp" : "2025-11-05T22:43:20.8326Z", "ttl" :
86400, "expires_at" : "2025-11-06T22:43:20Z", "expired" : false },
+ "cells" : [
+ { "name" : "activity_details", "value" : "details_for_row_3099" },
+ { "name" : "activity_type", "value" : "type_7" }
+ ]
+ }
+ ]
+}
+----
+
+Below is the sstabledump output showing expired flag as "true" when TTL has
expired (and is considered as tombstone when read):
+[source,json]
+----
+{
+ "partition" : {
+ "key" : [ "4ac48f1d-c736-4ca5-9b03-2566150f6af5" ],
+ "position" : 0
+ },
+ "rows" : [
+ {
+ "type" : "row",
+ "position" : 30,
+ "clustering" : [ "2025-11-05T22:43:20.833Z" ],
+ "liveness_info" : { "tsamp" : "2025-11-05T22:43:20.832650Z", "ttl" :
86400, "expires_at" : "2025-11-06T22:43:20Z", "expired" : true },
Review Comment:
The "tsamp" field name in this example output appears to be a typo. The
correct field name is "tstamp" (timestamp).
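
A check like the following could catch this kind of typo before the examples
are merged. It is only a sketch: the set of expected `liveness_info` keys is an
assumption taken from the examples in this PR, and the helper name
`unexpected_liveness_keys` is hypothetical, not part of any Cassandra tool.

```python
import json

# Assumed liveness_info field names, taken from the examples under review.
EXPECTED_KEYS = {"tstamp", "ttl", "expires_at", "expired"}


def unexpected_liveness_keys(dump_text):
    """Return the sorted liveness_info keys that are not in EXPECTED_KEYS."""
    doc = json.loads(dump_text)
    # Accept either a single partition object or a list of them.
    partitions = doc if isinstance(doc, list) else [doc]
    found = set()
    for partition in partitions:
        for row in partition.get("rows", []):
            found.update(row.get("liveness_info", {}).keys())
    return sorted(found - EXPECTED_KEYS)


# Trimmed copy of the first example from the diff, typo included.
SAMPLE = """
{
  "partition": {"key": ["4ac48f1d-c736-4ca5-9b03-2566150f6af5"], "position": 0},
  "rows": [
    {
      "type": "row",
      "position": 36,
      "liveness_info": {"tsamp": "2025-11-05T22:43:20.8326Z", "ttl": 86400,
                        "expires_at": "2025-11-06T22:43:20Z", "expired": false}
    }
  ]
}
"""

print(unexpected_liveness_keys(SAMPLE))  # prints ['tsamp']
```

Once the field is renamed to "tstamp", the same call returns an empty list.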
##########
doc/modules/cassandra/pages/managing/operating/compaction/tombstones.adoc:
##########
@@ -124,6 +125,67 @@ To avoid keeping tombstones forever, we set
`gc_grace_seconds` for every table i
If an SSTable contains only tombstones and it is guaranteed that SSTable is
not shadowing data in any other SSTable, then the compaction can drop
that SSTable.
-If you see SSTables with only tombstones (note that TTL'd data is considered
tombstones once the time-to-live has expired), but it is not being dropped by
compaction, it is likely that other SSTables contain older data.
+If you observe SSTables that contain only tombstones or expired TTL data, and
compaction is not removing them, it likely indicates that older versions of the
data still exist in other SSTables.
There is a tool called `sstableexpiredblockers` that will list which SSTables
are droppable and which are blocking them from being dropped.
With `TimeWindowCompactionStrategy` it is possible to remove the guarantee
(not check for shadowing data) by enabling
`unsafe_aggressive_sstable_expiration`.
+
+
+== Examples
+
+Below is the sstabledump output showing a live row with expired flag as
"false":
+[source,json]
+----
+{
+ "partition" : {
+ "key" : [ "4ac48f1d-c736-4ca5-9b03-2566150f6af5" ],
+ "position" : 0
+ },
+ "rows" : [
+ {
+ "type" : "row",
+ "position" : 36,
+ "clustering" : [ "2025-11-05T22:43:20.833Z" ],
+ "liveness_info" : { "tsamp" : "2025-11-05T22:43:20.8326Z", "ttl" :
86400, "expires_at" : "2025-11-06T22:43:20Z", "expired" : false },
Review Comment:
The "tsamp" field name in this example output appears to be a typo. The
correct field name is "tstamp" (timestamp).
##########
doc/modules/cassandra/pages/managing/operating/compaction/tombstones.adoc:
##########
@@ -4,17 +4,18 @@
== What are tombstones?
-{cassandra}'s processes for deleting data are designed to improve performance,
and to work with {cassandra}'s built-in properties for data distribution and
fault-tolerance.
+{cassandra}'s processes for deleting data are designed to be efficient, and to
work with {cassandra}'s native features for data distribution and
fault-tolerance.
{cassandra} treats a deletion as an insertion, and inserts a time-stamped
deletion marker called a tombstone.
The tombstones go through {cassandra}'s write path, and are written to
SSTables on one or more nodes.
The key feature difference of a tombstone is that it has a built-in expiration
date/time.
-At the end of its expiration period, the grace period, the tombstone is
deleted as part of {cassandra}'s normal compaction process.
+At the end of its expiration period, called the grace period, the tombstone is
deleted as part of {cassandra}'s normal compaction process.
[NOTE]
====
-You can also mark a {cassandra} row or column with a time-to-live (TTL) value.
-After this amount of time has ended, {cassandra} marks the object with a
tombstone, and handles it like other tombstoned objects.
+In {cassandra}, you can assign a time-to-live (TTL) to a row or column. Once
the TTL expires, the data is eligible for removal.
+During compaction, if the `gc_grace_seconds` period is still active,
{cassandra} marks the data as expired, handling it like any other deleted item.
+After `gc_grace_seconds` has elapsed, the data is eligible for permanent
removal.
Review Comment:
The explanation of TTL behavior is incomplete and potentially confusing. The
note says that data is "marked as expired" during compaction while
gc_grace_seconds is still active. In practice, TTL-expired data is treated as
a tombstone as soon as the TTL expires (this is evaluated at read time), not
during compaction. gc_grace_seconds determines when that tombstone can be
permanently removed during compaction, not when the data is marked as expired.
Consider clarifying that TTL expiration creates an implicit tombstone
immediately, and that gc_grace_seconds controls when that tombstone can be
purged.
```suggestion
In {cassandra}, you can assign a time-to-live (TTL) to a row or column. Once
the TTL expires, the data is immediately considered deleted: it is treated as
an implicit tombstone at read time, and queries will no longer return the
expired data.
However, the expired data (now a tombstone) is not physically removed from
storage until after the `gc_grace_seconds` period has elapsed and a compaction
occurs. The `gc_grace_seconds` setting determines how long tombstones
(including those created by TTL expiry) are retained to ensure all replicas
have a chance to learn about the deletion.
After `gc_grace_seconds` has elapsed, the tombstone is eligible for
permanent removal during compaction.
```
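
For reference, the relationship between the `tstamp`, `ttl`, `expires_at`, and
`expired` values in the example dumps can be sketched as below. This is an
illustration only: the truncation to whole seconds is inferred from the example
values, the helper names are hypothetical, and the sketch deliberately does not
model `gc_grace_seconds` or when compaction may purge the data.

```python
from datetime import datetime, timedelta, timezone


def expires_at(tstamp, ttl_seconds):
    """Write time truncated to whole seconds, plus the TTL (assumed rule)."""
    return tstamp.replace(microsecond=0) + timedelta(seconds=ttl_seconds)


def is_expired(tstamp, ttl_seconds, now):
    """True once 'now' has reached the expiry time (read-time check)."""
    return now >= expires_at(tstamp, ttl_seconds)


# Values copied from the second example in the diff.
written = datetime(2025, 11, 5, 22, 43, 20, 832650, tzinfo=timezone.utc)
ttl = 86400

print(expires_at(written, ttl).isoformat())  # 2025-11-06T22:43:20+00:00

before = datetime(2025, 11, 6, 12, 0, 0, tzinfo=timezone.utc)
after = datetime(2025, 11, 7, 0, 0, 0, tzinfo=timezone.utc)
print(is_expired(written, ttl, before), is_expired(written, ttl, after))  # False True
```

The computed expiry matches the `expires_at` value shown in both example dumps.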
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]