Re: [PR] CASSANDRA-18715 Add support for a vector search index in SAI [cassandra]

via GitHub Mon, 23 Oct 2023 07:32:50 -0700


mike-tr-adamson commented on code in PR #2673:
URL: https://github.com/apache/cassandra/pull/2673#discussion_r1368779258



##########
src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java:
##########
@@ -274,10 +292,54 @@ else if (relation.isLIKE())
                 throw invalidRequest("Non PRIMARY KEY columns found in where 
clause: %s ",
                                      Joiner.on(", 
").join(nonPrimaryKeyColumns));
             }
+
+            var annRestriction = 
Streams.stream(nonPrimaryKeyRestrictions).filter(SingleRestriction::isANN).findFirst();
+            if (annRestriction.isPresent())
+            {
+                // If there is an ANN restriction then it must be for a 
vector<float, n> column, and it must have an index
+                var annColumn = annRestriction.get().getFirstColumn();
+
+                if (!annColumn.type.isVector() || 
!(((VectorType<?>)annColumn.type).elementType instanceof FloatType))
+                    throw 
invalidRequest(StatementRestrictions.ANN_ONLY_SUPPORTED_ON_VECTOR_MESSAGE);
+                if (indexRegistry == null || 
indexRegistry.listIndexes().stream().noneMatch(i -> i.dependsOn(annColumn)))
+                    throw 
invalidRequest(StatementRestrictions.ANN_REQUIRES_INDEX_MESSAGE);
+                // We do not allow ANN queries using partition key 
restrictions that need filtering
+                if (partitionKeyRestrictions.needFiltering(table))
+                    throw 
invalidRequest(StatementRestrictions.ANN_REQUIRES_INDEXED_FILTERING_MESSAGE);
+                // We do not allow ANN query filtering using non-indexed 
columns
+                var nonAnnColumns = 
Streams.stream(nonPrimaryKeyRestrictions).filter(r -> !r.isANN()).map(r -> 
r.getFirstColumn()).collect(Collectors.toList());
+                var clusteringColumns = 
clusteringColumnsRestrictions.getColumnDefinitions();
+                if (!nonAnnColumns.isEmpty() || !clusteringColumns.isEmpty())
+                {
+                    var nonIndexedColumns = 
Stream.concat(nonAnnColumns.stream(), clusteringColumns.stream())
+                                                  .filter(c -> indexRegistry 
== null || indexRegistry.listIndexes()
+                                                                               
                      .stream()
+                                                                               
                      .noneMatch(i -> i.dependsOn(c)))
+                                                  
.collect(Collectors.toList());
+                    if (!nonIndexedColumns.isEmpty())
+                        throw 
invalidRequest(StatementRestrictions.ANN_REQUIRES_INDEXED_FILTERING_MESSAGE);

Review Comment:
   Although it doesn't need `ALLOW FILTERING` and it doesn't get filtered by 
`applyIndexFilter`, it does get filtered earlier by 
`QueryController.doesNotSelect`. This would have the same issues as any other 
non-indexed filtering.
   
   I feel that we should leave this restriction in place for the initial 
release and add it to the list of restrictions to fix later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] CASSANDRA-18715 Add support for a vector search index in SAI [cassandra]

Reply via email to