Re: [PR] CASSANDRA-18715 Add support for a vector search index in SAI [cassandra]

via GitHub Tue, 17 Oct 2023 11:08:23 -0700


adelapena commented on code in PR #2673:
URL: https://github.com/apache/cassandra/pull/2673#discussion_r1362557283



##########
src/java/org/apache/cassandra/cql3/statements/SelectStatement.java:
##########
@@ -1080,19 +1139,26 @@ public void processPartition(RowIterator partition, 
QueryOptions options, Result
 
     private boolean needsPostQueryOrdering()
     {
-        // We need post-query ordering only for queries with IN on the 
partition key and an ORDER BY.
-        return restrictions.keyIsInRelation() && 
!parameters.orderings.isEmpty();
+        // We need post-query ordering only for queries with IN on the 
partition key and an ORDER BY or index restriction reordering
+        return restrictions.keyIsInRelation() && 
!parameters.orderings.isEmpty() || needIndexOrdering();
+    }
+
+    private boolean needIndexOrdering()
+    {
+        return orderingComparator != null && 
orderingComparator.indexOrdering();
     }
 
     /**
      * Orders results when multiple keys are selected (using IN)
      */
-    private void orderResults(ResultSet cqlRows)
+    private void orderResults(ResultSet cqlRows, QueryOptions options)
     {
         if (cqlRows.size() == 0 || !needsPostQueryOrdering())
             return;
 
-        Collections.sort(cqlRows.rows, orderingComparator);
+        Comparator<List<ByteBuffer>> comparator = 
orderingComparator.prepareFor(table, getRowFilter(options), options);
+        if (comparator != null)
+            Collections.sort(cqlRows.rows, comparator);

Review Comment:
   Actually, the order isn't necessarily first by the indexed column, then by 
primary key. The rows gotten from the ANN index don't have that order, and the 
primary key order is applied after applying the query limit, I think. So we can 
have surprising cases like this one:
   ```java
   @Test
   public void sortLimitTest()
   {
       cluster.schemaChange(withKeyspace("CREATE TABLE %s.t (k int, c int, v 
vector<float, 1>, PRIMARY KEY(k, c))"));
       cluster.schemaChange(withKeyspace("CREATE CUSTOM INDEX ON %s.t(v) USING 
'StorageAttachedIndex' WITH OPTIONS = {'similarity_function' : 'euclidean'}"));
   
       ICoordinator coordinator = cluster.coordinator(1);
       coordinator.execute(withKeyspace("INSERT INTO %s.t (k, c, v) VALUES (0, 
1, [1])"), ConsistencyLevel.ALL);
       coordinator.execute(withKeyspace("INSERT INTO %s.t (k, c, v) VALUES (0, 
2, [1])"), ConsistencyLevel.ALL);
       coordinator.execute(withKeyspace("INSERT INTO %s.t (k, c, v) VALUES (0, 
3, [1])"), ConsistencyLevel.ALL);
   
       Object[][] rows = coordinator.execute(withKeyspace("SELECT * FROM %s.t 
ORDER BY v ANN OF [2] LIMIT 10"), ConsistencyLevel.ONE);
       assertRows(rows,
                  row(0, 1, CQLTester.vector(1f)),
                  row(0, 2, CQLTester.vector(1f)),
                  row(0, 3, CQLTester.vector(1f)));
   
       rows = coordinator.execute(withKeyspace("SELECT * FROM %s.t ORDER BY v 
ANN OF [2] LIMIT 1"), ConsistencyLevel.ONE);
       assertRows(rows,
                  row(0, 1, CQLTester.vector(1f))); // returns [0, 3, 1] 
instead of [0, 1, 1]!!!
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] CASSANDRA-18715 Add support for a vector search index in SAI [cassandra]

Reply via email to