Re: [PR] CASSANDRA-18715 Add support for a vector search index in SAI [cassandra]

via GitHub Tue, 17 Oct 2023 11:02:50 -0700


adelapena commented on code in PR #2673:
URL: https://github.com/apache/cassandra/pull/2673#discussion_r1362397268



##########
src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java:
##########
@@ -274,10 +292,54 @@ else if (relation.isLIKE())
                 throw invalidRequest("Non PRIMARY KEY columns found in where clause: %s ",
                                      Joiner.on(", ").join(nonPrimaryKeyColumns));
             }
+
+            var annRestriction = Streams.stream(nonPrimaryKeyRestrictions).filter(SingleRestriction::isANN).findFirst();
+            if (annRestriction.isPresent())
+            {
+                // If there is an ANN restriction then it must be for a vector<float, n> column, and it must have an index
+                var annColumn = annRestriction.get().getFirstColumn();
+
+                if (!annColumn.type.isVector() || !(((VectorType<?>)annColumn.type).elementType instanceof FloatType))
+                    throw invalidRequest(StatementRestrictions.ANN_ONLY_SUPPORTED_ON_VECTOR_MESSAGE);
+                if (indexRegistry == null || indexRegistry.listIndexes().stream().noneMatch(i -> i.dependsOn(annColumn)))
+                    throw invalidRequest(StatementRestrictions.ANN_REQUIRES_INDEX_MESSAGE);
+                // We do not allow ANN queries using partition key restrictions that need filtering
+                if (partitionKeyRestrictions.needFiltering(table))
+                    throw invalidRequest(StatementRestrictions.ANN_REQUIRES_INDEXED_FILTERING_MESSAGE);
+                // We do not allow ANN query filtering using non-indexed columns
+                var nonAnnColumns = Streams.stream(nonPrimaryKeyRestrictions).filter(r -> !r.isANN()).map(r -> r.getFirstColumn()).collect(Collectors.toList());

Review Comment:
   ```suggestion
                   var nonAnnColumns = Streams.stream(nonPrimaryKeyRestrictions).filter(r -> !r.isANN()).map(Restriction::getFirstColumn).collect(Collectors.toList());
   ```



##########
src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java:
##########
@@ -274,10 +292,54 @@ else if (relation.isLIKE())
                 throw invalidRequest("Non PRIMARY KEY columns found in where clause: %s ",
                                      Joiner.on(", ").join(nonPrimaryKeyColumns));
             }
+
+            var annRestriction = Streams.stream(nonPrimaryKeyRestrictions).filter(SingleRestriction::isANN).findFirst();
+            if (annRestriction.isPresent())
+            {
+                // If there is an ANN restriction then it must be for a vector<float, n> column, and it must have an index
+                var annColumn = annRestriction.get().getFirstColumn();
+
+                if (!annColumn.type.isVector() || !(((VectorType<?>)annColumn.type).elementType instanceof FloatType))
+                    throw invalidRequest(StatementRestrictions.ANN_ONLY_SUPPORTED_ON_VECTOR_MESSAGE);
+                if (indexRegistry == null || indexRegistry.listIndexes().stream().noneMatch(i -> i.dependsOn(annColumn)))
+                    throw invalidRequest(StatementRestrictions.ANN_REQUIRES_INDEX_MESSAGE);
+                // We do not allow ANN queries using partition key restrictions that need filtering
+                if (partitionKeyRestrictions.needFiltering(table))
+                    throw invalidRequest(StatementRestrictions.ANN_REQUIRES_INDEXED_FILTERING_MESSAGE);
+                // We do not allow ANN query filtering using non-indexed columns
+                var nonAnnColumns = Streams.stream(nonPrimaryKeyRestrictions).filter(r -> !r.isANN()).map(r -> r.getFirstColumn()).collect(Collectors.toList());
+                var clusteringColumns = clusteringColumnsRestrictions.getColumnDefinitions();
+                if (!nonAnnColumns.isEmpty() || !clusteringColumns.isEmpty())
+                {
+                    var nonIndexedColumns = Stream.concat(nonAnnColumns.stream(), clusteringColumns.stream())
+                                                  .filter(c -> indexRegistry == null || indexRegistry.listIndexes()
+                                                                                                     .stream()
+                                                                                                     .noneMatch(i -> i.dependsOn(c)))
+                                                  .collect(Collectors.toList());
+                    if (!nonIndexedColumns.isEmpty())
+                        throw invalidRequest(StatementRestrictions.ANN_REQUIRES_INDEXED_FILTERING_MESSAGE);

Review Comment:
   If we have the following schema:
   ```
   CREATE TABLE %s.t (k int, c int, v vector<float, 1>, PRIMARY KEY(k, c))
   CREATE CUSTOM INDEX ON %s.t(v) USING 'StorageAttachedIndex'
   ```
   we can add a non-indexed restriction for the partition key, for example:
   ```
   SELECT * FROM %s.t WHERE k=0 ORDER BY v ANN OF [9] LIMIT 10
   ```
   However, the new check rejects a restriction on a clustering prefix, even though that restriction doesn't require `ALLOW FILTERING`:
   ```
   SELECT * FROM %s.t WHERE k=0 AND c>2 ORDER BY v ANN OF [9] LIMIT 10
   ```
   Is this correct? Does the additional clustering restriction require post-filtering?
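
For illustration, the behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the code in the PR: column names are plain strings standing in for `ColumnMetadata`, and the hypothetical `indexedColumns` set stands in for `indexRegistry.listIndexes()` combined with `dependsOn(c)`. The class name `AnnCheckSketch` and the method `rejects` are assumptions made for this example. The partition key `k` never enters either list, since the diff handles it separately through `partitionKeyRestrictions.needFiltering(table)`.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class AnnCheckSketch
{
    // Hypothetical model of the non-indexed-column check in the diff.
    // Returns true when the query would be rejected.
    static boolean rejects(Set<String> indexedColumns, List<String> nonAnnColumns, List<String> clusteringColumns)
    {
        List<String> nonIndexedColumns = Stream.concat(nonAnnColumns.stream(), clusteringColumns.stream())
                                               .filter(c -> !indexedColumns.contains(c))
                                               .collect(Collectors.toList());
        return !nonIndexedColumns.isEmpty();
    }

    public static void main(String[] args)
    {
        // Only the vector column v has an index, as in the schema above.
        Set<String> indexed = Set.of("v");

        // WHERE k=0 ORDER BY v ANN OF [9]: no regular or clustering restrictions.
        System.out.println(rejects(indexed, List.of(), List.of()));

        // WHERE k=0 AND c>2 ORDER BY v ANN OF [9]: c is restricted but not indexed.
        System.out.println(rejects(indexed, List.of(), List.of("c")));
    }
}
```

Under these assumptions the first query passes and the second is rejected solely because `c` has no index, whether or not the restriction would otherwise need `ALLOW FILTERING`.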



##########
src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java:
##########
@@ -274,10 +292,54 @@ else if (relation.isLIKE())
                 throw invalidRequest("Non PRIMARY KEY columns found in where clause: %s ",
                                      Joiner.on(", ").join(nonPrimaryKeyColumns));
             }
+
+            var annRestriction = Streams.stream(nonPrimaryKeyRestrictions).filter(SingleRestriction::isANN).findFirst();
+            if (annRestriction.isPresent())
+            {
+                // If there is an ANN restriction then it must be for a vector<float, n> column, and it must have an index
+                var annColumn = annRestriction.get().getFirstColumn();
+
+                if (!annColumn.type.isVector() || !(((VectorType<?>)annColumn.type).elementType instanceof FloatType))
+                    throw invalidRequest(StatementRestrictions.ANN_ONLY_SUPPORTED_ON_VECTOR_MESSAGE);
+                if (indexRegistry == null || indexRegistry.listIndexes().stream().noneMatch(i -> i.dependsOn(annColumn)))
+                    throw invalidRequest(StatementRestrictions.ANN_REQUIRES_INDEX_MESSAGE);
+                // We do not allow ANN queries using partition key restrictions that need filtering
+                if (partitionKeyRestrictions.needFiltering(table))
+                    throw invalidRequest(StatementRestrictions.ANN_REQUIRES_INDEXED_FILTERING_MESSAGE);
+                // We do not allow ANN query filtering using non-indexed columns
+                var nonAnnColumns = Streams.stream(nonPrimaryKeyRestrictions).filter(r -> !r.isANN()).map(r -> r.getFirstColumn()).collect(Collectors.toList());
+                var clusteringColumns = clusteringColumnsRestrictions.getColumnDefinitions();
+                if (!nonAnnColumns.isEmpty() || !clusteringColumns.isEmpty())
+                {
+                    var nonIndexedColumns = Stream.concat(nonAnnColumns.stream(), clusteringColumns.stream())
+                                                  .filter(c -> indexRegistry == null || indexRegistry.listIndexes()

Review Comment:
   Nit: at this point `indexRegistry` cannot be null, since a null registry already throws `ANN_REQUIRES_INDEX_MESSAGE` a few lines above.


