Re: [PR] CASSANDRA-19497 ResultRetriever should batch clusterings/rows during SAI post-filtering reads [cassandra]

via GitHub Thu, 07 Nov 2024 13:53:56 -0800


maedhroz commented on code in PR #3649:
URL: https://github.com/apache/cassandra/pull/3649#discussion_r1833407074



##########
src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java:
##########
@@ -400,36 +430,43 @@ private UnfilteredRowIterator applyIndexFilter(PrimaryKey key, UnfilteredRowIter
                 {
                     queryContext.rowsFiltered++;
 
-                    if (tree.isSatisfiedBy(partition.partitionKey(), (Row) unfiltered, staticRow))
+                    if (tree.isSatisfiedBy(partitionKey, (Row) unfiltered, staticRow))
                     {
-                        matchingRows.add(unfiltered);
+                        matches.add(unfiltered);
                         hasMatch = true;
+
+                        if (topK)
+                        {
+                            PrimaryKey shadowed = keyFactory.hasClusteringColumns()
+                                                  ? keyFactory.create(partitionKey, ((Row) unfiltered).clustering())
+                                                  : keyFactory.create(partitionKey);
+                            keysToShadow.remove(shadowed);
+                        }
                     }
                 }
             }
 
-            if (!hasMatch)
+            if (topK)
             {
-                queryContext.rowsFiltered++;
+                // If any rows match the filter, there should be no need to shadow the static primary key:
+                if (hasMatch && keyFactory.hasClusteringColumns())
+                    keysToShadow.remove(keyFactory.create(partitionKey, Clustering.STATIC_CLUSTERING));
 
-                if (tree.isSatisfiedBy(key.partitionKey(), staticRow, staticRow))
-                    hasMatch = true;
+                // Record primary keys shadowed by expired TTLs, row tombstones, or range tombstones:
+                if (!keysToShadow.isEmpty())
+                    queryContext.vectorContext().recordShadowedPrimaryKeys(keysToShadow);

Review Comment:
   @jasonstack It looks like `VectorUpdateDeleteTest` does exercise every
codepath in this method. Indeed, it broke a couple of times during my refactor.
What I've done here is loosely inspired by your suggestion, with some
protections to avoid allocations in the non-topK case.
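   
   To illustrate the idea (a simplified sketch, not the code in this PR): primary keys are modeled as plain strings, and `ShadowedKeySketch`, `filter()`, and the use of `Collections.emptySet()` as the non-topK "protection" are all hypothetical stand-ins. Every requested key starts out as a shadowing candidate, and a key is removed only when a live row for it passes the filter. Whatever remains is what would be reported as shadowed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical, simplified model of the shadowed-key bookkeeping; not Cassandra code.
final class ShadowedKeySketch
{
    /**
     * Adds rows that pass the filter to {@code matches} and returns the requested
     * keys that never came back as a matching row. Keys and rows are plain strings.
     */
    static Set<String> filter(List<String> requestedKeys,
                              List<String> fetchedRows,
                              Predicate<String> rowFilter,
                              boolean topK,
                              List<String> matches)
    {
        // Assumption: only top-K queries need the mutable set, so everything else
        // shares the immutable empty set and allocates nothing here.
        Set<String> keysToShadow = topK ? new HashSet<>(requestedKeys) : Collections.emptySet();

        for (String row : fetchedRows)
        {
            if (rowFilter.test(row))
            {
                matches.add(row);

                // A live, matching row means its key must not be shadowed.
                if (topK)
                    keysToShadow.remove(row);
            }
        }

        return keysToShadow;
    }

    public static void main(String[] args)
    {
        List<String> requested = Arrays.asList("k1", "k2", "k3");
        List<String> fetched = Arrays.asList("k1", "k3"); // "k2" was deleted or has expired
        List<String> matches = new ArrayList<>();

        Set<String> shadowed = filter(requested, fetched, row -> true, true, matches);
        System.out.println(matches + " " + shadowed); // prints "[k1, k3] [k2]"
    }
}
```

   In the non-topK case the returned set is always empty, so the final "record shadowed keys" step has nothing to do.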
   
   I've also removed what really does look like an unnecessary second round of
`isSatisfiedBy()` calls using just the static row. Before CASSANDRA-19034, this
check actually happened before the `while` loop that filters the rows, but it
was never strictly necessary. I suppose I'll see if CI disagrees, but I think
this makes sense.
   
   WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

