blambov commented on code in PR #4300:
URL: https://github.com/apache/cassandra/pull/4300#discussion_r2275893538


##########
src/java/org/apache/cassandra/db/compaction/CompactionTask.java:
##########
@@ -171,7 +177,45 @@ public boolean apply(SSTableReader sstable)
             long inputSizeBytes;
             long timeSpentWritingKeys;
 
+            List<Index> indexesReceptiveToFullyExpiredSSTables = cfs.indexManager.listIndexes()
+                                                                                  .stream()
+                                                                                  .filter(Index::includeFullyExpiredTablesInCompaction)
+                                                                                  .collect(Collectors.toList());
+
+            if (!indexesReceptiveToFullyExpiredSSTables.isEmpty() && !fullyExpiredSSTables.isEmpty())
+            {
+                try (WriteContext ctx = cfs.keyspace.getWriteHandler().createContextForIndexing())
+                {
+                    for (SSTableReader expiredSSTable : fullyExpiredSSTables)
+                    {
+                        expiredSSTable.getScanner()
+                                      .forEachRemaining(partition ->
+                                                        partition.forEachRemaining(unfiltered -> {
+                                                            if (unfiltered instanceof Row)
+                                                            {
+                                                                for (Index index : indexesReceptiveToFullyExpiredSSTables)

Review Comment:
   Can we use `indexManager.newCompactionTransaction`, creating one for every partition?
   
   This will also use indexes that don't set `includeFullyExpiredTablesInCompaction`, which shouldn't be an issue. If we do that, we can replace the index collection with just a check whether any of the indexes needs it.
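
A minimal sketch of the per-partition transaction approach this comment describes, assuming the existing `CompactionTransaction` API (`start()`, `onRowMerge()`, `commit()`); the `versions`/`nowInSec` arguments and the use of an empty merged row to register the removal are assumptions to verify, not the final implementation:

```java
// Sketch only: one CompactionTransaction per partition of each fully expired SSTable.
for (SSTableReader expiredSSTable : fullyExpiredSSTables)
{
    try (ISSTableScanner scanner = expiredSSTable.getScanner())
    {
        while (scanner.hasNext())
        {
            try (UnfilteredRowIterator partition = scanner.next())
            {
                CompactionTransaction txn =
                    cfs.indexManager.newCompactionTransaction(partition.partitionKey(),
                                                              partition.columns(),
                                                              1,                           // single input version
                                                              FBUtilities.nowInSeconds());
                txn.start();
                while (partition.hasNext())
                {
                    Unfiltered unfiltered = partition.next();
                    if (unfiltered.isRow())
                    {
                        Row row = (Row) unfiltered;
                        // An empty merged row stands in for "everything in this row goes away";
                        // whether this is the right way to signal a full removal needs checking.
                        txn.onRowMerge(BTreeRow.emptyRow(row.clustering()), row);
                    }
                }
                txn.commit();
                // Note: the static row and partition-level deletion are not handled in this sketch.
            }
        }
    }
}
```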



##########
src/java/org/apache/cassandra/db/compaction/CompactionTask.java:
##########
@@ -171,7 +177,45 @@ public boolean apply(SSTableReader sstable)
             long inputSizeBytes;
             long timeSpentWritingKeys;
 
+            List<Index> indexesReceptiveToFullyExpiredSSTables = cfs.indexManager.listIndexes()

Review Comment:
   Can we move the expired sstables processing to a separate method?
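
For illustration, the extraction might look roughly like this; the method name and parameter type are hypothetical, and the body is the inline logic from the hunk above:

```java
// Hypothetical helper extracted from the compaction body; name and signature are illustrative.
private void notifyIndexesOfFullyExpiredSSTables(Collection<SSTableReader> fullyExpiredSSTables)
{
    if (fullyExpiredSSTables.isEmpty())
        return;

    List<Index> receptiveIndexes = cfs.indexManager.listIndexes()
                                                   .stream()
                                                   .filter(Index::includeFullyExpiredTablesInCompaction)
                                                   .collect(Collectors.toList());
    if (receptiveIndexes.isEmpty())
        return;

    try (WriteContext ctx = cfs.keyspace.getWriteHandler().createContextForIndexing())
    {
        // ... iterate the fully expired SSTables and notify the indexes, as in the inline version
    }
}
```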



##########
src/java/org/apache/cassandra/db/compaction/CompactionTask.java:
##########
@@ -171,7 +177,45 @@ public boolean apply(SSTableReader sstable)
             long inputSizeBytes;
             long timeSpentWritingKeys;
 
+            List<Index> indexesReceptiveToFullyExpiredSSTables = cfs.indexManager.listIndexes()
+                                                                                  .stream()
+                                                                                  .filter(Index::includeFullyExpiredTablesInCompaction)
+                                                                                  .collect(Collectors.toList());
+
+            if (!indexesReceptiveToFullyExpiredSSTables.isEmpty() && !fullyExpiredSSTables.isEmpty())

Review Comment:
   
   Nit: I'd check `fullyExpiredSSTables` first to avoid going through the indexes if it is empty.
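
That reordering would just swap the operands, for example:

```java
// Reordered guard: the cheap emptiness check short-circuits before the index list is consulted.
if (!fullyExpiredSSTables.isEmpty() && !indexesReceptiveToFullyExpiredSSTables.isEmpty())
{
    // ...
}

// Or, combined with the earlier suggestion to drop the collected list entirely:
if (!fullyExpiredSSTables.isEmpty()
    && cfs.indexManager.listIndexes().stream().anyMatch(Index::includeFullyExpiredTablesInCompaction))
{
    // ...
}
```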



##########
src/java/org/apache/cassandra/index/Index.java:
##########
@@ -422,6 +422,27 @@ default SSTableFlushObserver getFlushObserver(Descriptor descriptor, OperationTy
      */
     public void validate(PartitionUpdate update) throws InvalidRequestException;
 
+
+    /**
+     * When a secondary index is created on a column of a table using e.g. TWCS, and that table contains
+     * SSTables which are evaluated as fully expired upon compaction, those SSTables are by default filtered
+     * out, as they can be dropped in their entirety. However, once dropped like that, the index implementation
+     * is not notified about the removal via IndexGCTransaction, as a compaction of non-fully-expired SSTables
+     * would do. This in turn means that a custom index will never know that some data have been removed, and
+     * the data the custom index implementation is responsible for will grow without bound.
+     *
+     * Override this method and return true in an index implementation only if you want to be notified about
+     * dropped fully expired data. This means that {@link Indexer#removeRow(Row)} will eventually be called for
+     * rows contained in fully expired SSTables. Return false if you do not want to be notified in this way and
+     * your index manages fully expired data in some other way.
+     *
+     * @return true when fully expired SSTables should be included in the compaction process, false otherwise.
+     */
+    public default boolean includeFullyExpiredTablesInCompaction()
+    {
+        return false;

Review Comment:
   This should be true by default (as the safer option) and overridden by SAI 
and SASI.
   
   And perhaps call it `listRowsInExpiredSSTables`, 
`needsRowsInExpiredSSTables` or something similar?
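
A sketch of the suggested shape; the rename below is one of the options proposed above, and where exactly SAI and SASI would override it is an assumption:

```java
// In Index.java: default to the safer behaviour of receiving rows from fully expired SSTables.
default boolean needsRowsInExpiredSSTables()
{
    return true;
}

// Per the comment above, implementations such as StorageAttachedIndex (SAI) and SASIIndex (SASI)
// would opt out explicitly:
@Override
public boolean needsRowsInExpiredSSTables()
{
    return false;
}
```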



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
