Re: [PR] CASSANDRA-20829 Secondary index implementations do not integrate with IndexGCTransaction when compaction contains fully expired SSTables [cassandra]

via GitHub Thu, 14 Aug 2025 03:16:48 -0700


smiklosovic commented on code in PR #4300:
URL: https://github.com/apache/cassandra/pull/4300#discussion_r2276131662



##########
src/java/org/apache/cassandra/db/compaction/CompactionTask.java:
##########
@@ -102,6 +110,43 @@ public boolean 
reduceScopeForLimitedSpace(Set<SSTableReader> nonExpiredSSTables,
         return false;
     }
 
+    private void 
maybeNotifyIndexersAboutRowsInFullyExpiredSSTables(Set<SSTableReader> 
fullyExpiredSSTables)
+    {
+        if (fullyExpiredSSTables.isEmpty())
+            return;
+
+        List<Index> indexes = cfs.indexManager.listIndexes()
+                                              .stream()
+                                              
.filter(Index::notifyIndexerAboutRowsInFullyExpiredSSTables)
+                                              .collect(Collectors.toList());
+
+        if (indexes.isEmpty())
+            return;
+
+        for (SSTableReader expiredSSTable : fullyExpiredSSTables)
+        {
+            expiredSSTable.getScanner()
+                          .forEachRemaining(partition ->
+                                            {
+                                                
partition.forEachRemaining(unfiltered -> {

Review Comment:
   To have one transaction per partition, we would need to first iterate over 
all rows and put them to list, then get list's size, then use this size while 
creating `IndexGCTransaction` for `versions` and then iterate over all such 
rows and call `transaction.onRowMerge` on them.
   
   But I am not sure about holding rows in a list while we iterating over them. 
This can use a lot of memory, basically we would hold whole partition until we 
can run transaction on that.
   
   `IndexGCTransaction` needs `versions` (currenty set to `1`) as we have "one 
row to merge".
   
   Then, it will initialize `rows` in IndexGCTransaction's `start`. The size of 
the rows array is derived from `versions`. Then it will go over rows in 
`commit` etc.
   
   However, to have individual rows in `rows` not null, it has to go over  
`onRowMerge` where it is "merging" rows. Then it will run it through 
`Rows.diff` and if builders are not null, it will populate `rows`
   
   Calling it like this did not help
   
   ````
   transaction.onRowMerge((Row) unfiltered);
   ````
   
   but this did the trick
   
   ````
   transaction.onRowMerge((Row) unfiltered, (Row) unfiltered);
   ````
   
   Basically we are "merging" row into itself and the diff is the row? Does 
this make sense? The test passes though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org

Re: [PR] CASSANDRA-20829 Secondary index implementations do not integrate with IndexGCTransaction when compaction contains fully expired SSTables [cassandra]

Reply via email to