adelapena commented on code in PR #2673:
URL: https://github.com/apache/cassandra/pull/2673#discussion_r1327183445
##########
src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java:
##########
@@ -276,8 +290,25 @@ else if (relation.isLIKE())
}
if (hasQueriableIndex)
usesSecondaryIndexing = true;
- else if (!allowFiltering && requiresAllowFilteringIfNotSpecified())
- throw invalidRequest(allowFilteringMessage(state));
+ else
+ {
+ if (nonPrimaryKeyRestrictions.hasAnn())
+ {
+ var vectorColumn =
nonPrimaryKeyRestrictions.getColumnDefs().stream().filter(c ->
c.type.isVector()).findFirst();
Review Comment:
Can't we have multiple vector columns, so the first vector column is not the
indexed one? Do we have any tests covering this?
##########
src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java:
##########
@@ -276,8 +290,25 @@ else if (relation.isLIKE())
}
if (hasQueriableIndex)
usesSecondaryIndexing = true;
- else if (!allowFiltering && requiresAllowFilteringIfNotSpecified())
- throw invalidRequest(allowFilteringMessage(state));
+ else
Review Comment:
Can we add braces to the `if` block above, so it matches the braced `else`
block?
##########
src/java/org/apache/cassandra/db/ReadQuery.java:
##########
@@ -256,4 +256,12 @@ default void maybeValidateIndex()
default void trackWarnings()
{
}
+
+ /**
+ * @return true given read query is a top-k request
Review Comment:
```suggestion
* @return {@code true} if this is a top-k query
```
##########
src/java/org/apache/cassandra/db/ColumnFamilyStore.java:
##########
@@ -1456,6 +1457,12 @@ public void apply(PartitionUpdate update,
CassandraWriteContext context, boolean
}
catch (RuntimeException e)
{
+ if (e instanceof InvalidRequestException)
+ {
+ throw new InvalidRequestException(e.getMessage()
Review Comment:
The message construction is identical for `InvalidRequestException` and
`RuntimeException`. We could use:
```java
catch (RuntimeException e)
{
String msg = e.getMessage() + " for ks: " + getKeyspaceName() + ",
table: " + name;
if (e instanceof InvalidRequestException)
throw new InvalidRequestException(msg, e);
throw new RuntimeException(msg, e);
}
```
##########
src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java:
##########
@@ -476,6 +507,38 @@ public boolean usesSecondaryIndexing()
return this.usesSecondaryIndexing;
}
+ /**
+ * This is a hack to push ordering down to indexes.
+ * Indexes are selected based on RowFilter only, so we need to turn
orderings into restrictions
+ * so they end up in the row filter.
+ *
+ * @param orderings orderings from the select statement
+ * @return the {@link RestrictionSet} with the added orderings
+ */
+ private RestrictionSet addOrderingRestrictions(List<Ordering> orderings,
RestrictionSet restrictionSet)
+ {
+ List<Ordering> annOrderings = orderings.stream().filter(o ->
o.expression.hasNonClusteredOrdering()).collect(Collectors.toList());
+
+ if (annOrderings.size() > 1)
+ throw new InvalidRequestException("Cannot specify more than one
ANN ordering");
+ else if (annOrderings.size() == 1)
+ {
+ if (orderings.size() > 1)
+ throw new InvalidRequestException("ANN ordering does not
support secondary ordering");
Review Comment:
I would say something like `ANN ordering does not support any other
ordering`, since ANN could be considered the secondary ordering.
##########
src/java/org/apache/cassandra/cql3/statements/SelectStatement.java:
##########
@@ -1573,6 +1656,39 @@ public int compare(List<ByteBuffer> a, List<ByteBuffer>
b)
}
}
+ private static class IndexColumnComparator<T> extends
ColumnComparator<List<ByteBuffer>>
Review Comment:
The generic (`<T>`) doesn't seem to be used.
##########
src/java/org/apache/cassandra/db/marshal/VectorType.java:
##########
@@ -365,6 +366,7 @@ public abstract class VectorSerializer extends
TypeSerializer<List<T>>
public abstract <V> List<V> split(V buffer, ValueAccessor<V> accessor);
public abstract <V> V serializeRaw(List<V> elements, ValueAccessor<V>
accessor);
+ public abstract float[] deserializeFloatArray(ByteBuffer input);
Review Comment:
Does this method duplicate `composeAsFloat`?
##########
src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java:
##########
@@ -276,8 +290,25 @@ else if (relation.isLIKE())
}
if (hasQueriableIndex)
usesSecondaryIndexing = true;
- else if (!allowFiltering && requiresAllowFilteringIfNotSpecified())
- throw invalidRequest(allowFilteringMessage(state));
+ else
+ {
+ if (nonPrimaryKeyRestrictions.hasAnn())
+ {
+ var vectorColumn =
nonPrimaryKeyRestrictions.getColumnDefs().stream().filter(c ->
c.type.isVector()).findFirst();
+ if (vectorColumn.isPresent())
+ {
+ var vc = vectorColumn.get();
+ var hasIndex =
indexRegistry.listIndexes().stream().anyMatch(i -> i.dependsOn(vc));
Review Comment:
Can we add an assertion verifying that `indexRegistry` is not null to
prevent the warning?
##########
src/java/org/apache/cassandra/cql3/statements/SelectStatement.java:
##########
@@ -822,7 +825,7 @@ private DataLimits getDataLimits(int userLimit,
// If we do post ordering we need to get all the results sorted before
we can trim them.
if (aggregationSpec != AggregationSpecification.AGGREGATE_EVERYTHING)
{
- if (!needsPostQueryOrdering())
+ if (!needsToSkipUserLimit())
Review Comment:
I think it's the only usage of `needsToSkipUserLimit`, so it can probably be
inlined here.
##########
src/java/org/apache/cassandra/cql3/statements/SelectStatement.java:
##########
@@ -288,7 +291,7 @@ public ResultMessage.Rows execute(QueryState state,
QueryOptions options, long q
query.trackWarnings();
ResultMessage.Rows rows;
- if (aggregationSpec == null && (pageSize <= 0 ||
(query.limits().count() <= pageSize)))
+ if (aggregationSpec == null && (pageSize <= 0 ||
(query.limits().count() <= pageSize) || query.isTopK()))
Review Comment:
`StorageAttachedIndex#validate` makes sure that we cannot use ANN without a
`LIMIT`. However, paging is disabled in the CQL layer, so any index
implementation could provide support for ANN. I think that the existence of a
`LIMIT` should be checked here in `SelectStatement`.
As for the specific limit, 100 by default, I guess it could also be checked
on the generic CQL layer because it affects the size of the pages. Should it
also be checked by the specific index implementation?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]