Re: [PR] CASSANDRA-20086: Use score ordered iterators for ANN search [cassandra]

via GitHub Mon, 12 Jan 2026 13:45:47 -0800


michaeljmarshall commented on code in PR #4353:
URL: https://github.com/apache/cassandra/pull/4353#discussion_r2684014172



##########
src/java/org/apache/cassandra/index/sai/disk/format/IndexDescriptor.java:
##########
@@ -126,7 +127,10 @@ public PrimaryKeyMap.Factory 
newPrimaryKeyMapFactory(SSTableReader sstable)
 
     public SSTableIndex newSSTableIndex(SSTableContext sstableContext, 
StorageAttachedIndex index)
     {
-        return version.onDiskFormat().newSSTableIndex(sstableContext, index);
+        if (isIndexEmpty(index.termType(), index.identifier()))
+            return index.termType().isVector() ? new 
EmptyIndex(sstableContext, index) : null;

Review Comment:
   I pushed a commit to remove the branching. I added a detailed commit message 
explaining the choice. tl;dr: removing the branching means we have slightly 
higher memory usage, but not by much.



##########
src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java:
##########
@@ -120,24 +132,45 @@ public PartitionIterator 
filterReplicaFilteringProtection(PartitionIterator full
     public UnfilteredPartitionIterator search(ReadExecutionController 
executionController) throws RequestTimeoutException
     {
         if (!command.isTopK())
+        {
             return new ResultRetriever(executionController, false);
+        }
         else
         {
-            Supplier<ResultRetriever> resultSupplier = () -> new 
ResultRetriever(executionController, true);
-
-            // VSTODO performance: if there is shadowed primary keys, we have 
to at least query twice.
-            //  First time to find out there are shadow keys, second time to 
find out there are no more shadow keys.
-            while (true)
+            // Need a consistent view of the memtables/sstables and their 
associated index, so we get the view now
+            // and propagate it as needed.
+            QueryViewBuilder.QueryView queryView = buildAnnQueryView();
+            try
+            {
+                queryController.maybeTriggerGuardrails(queryView);
+                ScoreOrderedResultRetriever result = new 
ScoreOrderedResultRetriever(queryController, executionController, queryContext, 
queryView, command.limits().count());
+                return (UnfilteredPartitionIterator) new 
VectorTopKProcessor(command).takeTopKThenSortByPrimaryKey(result);
+            }
+            finally
             {
-                long lastShadowedKeysCount = 
queryContext.vectorContext().getShadowedPrimaryKeys().size();
-                ResultRetriever result = resultSupplier.get();
-                UnfilteredPartitionIterator topK = 
(UnfilteredPartitionIterator) new VectorTopKProcessor(command).filter(result);
+                
queryView.referencedIndexes.forEach(SSTableIndex::releaseQuietly);

Review Comment:
   This is implemented now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] CASSANDRA-20086: Use score ordered iterators for ANN search [cassandra]

Reply via email to