Re: [PR] CASSANDRA-19018 An SAI-specific mechanism to ensure consistency isn't violated for multi-column AND queries at CL > ONE [cassandra]

via GitHub Tue, 20 Feb 2024 07:56:24 -0800


adelapena commented on code in PR #2935:
URL: https://github.com/apache/cassandra/pull/2935#discussion_r1495963991



##########
src/java/org/apache/cassandra/db/filter/RowFilter.java:
##########
@@ -111,14 +133,33 @@ private void add(Expression expression)
         expressions.add(expression);
     }
 
-    public void addUserExpression(UserExpression e)
+    public List<Expression> getExpressions()
     {
-        expressions.add(e);
+        return expressions;
     }
 
-    public List<Expression> getExpressions()
+    /**
+     * @return true if this filter belongs to a read that requires 
reconciliation at the coordinator
+     * @see StatementRestrictions#getRowFilter(IndexRegistry, QueryOptions)
+     */
+    public boolean needsReconciliation()
     {
-        return expressions;
+        return needsReconciliation;
+    }
+
+    /**
+     * If this filter belongs to a read that requires reconciliation at the 
coordinator, and it contains an intersection
+     * on two or more non-key (and therefore mutable) columns, we cannot 
strictly apply it to local, unrepaired rows.
+     * When this occurs, we must downgrade the intersection to a union and 
allow the coordinator to filter strictly 
+     * before sending results to the client.
+     * 

Review Comment:
   ```suggestion
        * When this occurs, we must downgrade the intersection of expressions 
to a union and allow the coordinator to 
        * filter strictly before sending results to the client.
        * 
   ```



##########
src/java/org/apache/cassandra/db/ConsistencyLevel.java:
##########
@@ -258,6 +258,11 @@ public void validateCounterForWrite(TableMetadata 
metadata) throws InvalidReques
             throw new InvalidRequestException("Counter operations are 
inherently non-serializable");
     }
 
+    public boolean needsReconciliation()

Review Comment:
   Nit: this could use some JavaDoc



##########
src/java/org/apache/cassandra/db/compaction/CompactionIterator.java:
##########
@@ -269,12 +269,11 @@ public void onMergedPartitionLevelDeletion(DeletionTime 
mergedDeletion, Deletion
                     {
                     }
 
-                    public Row onMergedRows(Row merged, Row[] versions)
+                    public void onMergedRows(Row merged, Row[] versions)

Review Comment:
   Nit: add `@Override`



##########
src/java/org/apache/cassandra/db/filter/RowFilter.java:
##########
@@ -60,32 +83,31 @@
  * be handled by a 2ndary index, and the rest is simply filtered out from the
  * result set (the later can only happen if the query was using ALLOW 
FILTERING).
  */
-public abstract class RowFilter implements Iterable<RowFilter.Expression>
+public class RowFilter implements Iterable<RowFilter.Expression>
 {
     private static final Logger logger = 
LoggerFactory.getLogger(RowFilter.class);
 
     public static final Serializer serializer = new Serializer();
+    private static final RowFilter NONE = new 
RowFilter(Collections.emptyList(), false);
 
     protected final List<Expression> expressions;
 
-    protected RowFilter(List<Expression> expressions)
-    {
-        this.expressions = expressions;
-    }
+    private final boolean needsReconciliation;
 
-    public static RowFilter create()
+    protected RowFilter(List<Expression> expressions, boolean 
needsReconciliation)
     {
-        return new CQLFilter(new ArrayList<>());
+        this.expressions = expressions;
+        this.needsReconciliation = needsReconciliation;
     }
 
-    public static RowFilter create(int capacity)
+    public static RowFilter create(boolean needsReconciliation)

Review Comment:
   Nit: This constructor could use some JavaDoc about `needsReconciliation`.



##########
src/java/org/apache/cassandra/index/sai/QueryContext.java:
##########
@@ -58,6 +58,8 @@ public class QueryContext
 
     public boolean queryTimedOut = false;
 
+    public boolean hasUnrepairedMatches = false;

Review Comment:
   Nit: add JavaDoc



##########
src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexQueryPlan.java:
##########
@@ -71,18 +75,24 @@ public static StorageAttachedIndexQueryPlan 
create(ColumnFamilyStore cfs,
 
         for (RowFilter.Expression expression : rowFilter)
         {
-            // we ignore any expressions here (currently IN and user-defined 
expressions) where we don't have a way to
-            // translate their #isSatifiedBy method, they will be included in 
the filter returned by QueryPlan#postIndexQueryFilter()
+            // We ignore any expressions here (currently IN and user-defined 
expressions) where we don't have a way to
+            // translate their #isSatifiedBy method, they will be included in 
the filter returned by 
+            // QueryPlan#postIndexQueryFilter(). If strict filtering is not 
allowed, we must reject the query until the
+            // expression(s) in question are compatible with #isSatifiedBy.
             //
             // Note: For both the pre- and post-filters we need to check that 
the expression exists before removing it
             // because the without method assert if the expression doesn't 
exist. This can be the case if we are given
             // a duplicate expression - a = 1 and a = 1. The without method 
removes all instances of the expression.
             if (expression.operator().isIN() || expression.isUserDefined())
             {
+                if (!rowFilter.isStrict())
+                    throw new 
InvalidRequestException(String.format(UNSUPPORTED_NON_STRICT_OPERATOR, 
expression.operator()));

Review Comment:
   I think this error could be thrown by 
`StorageAttachedIndex#validate(ReadCommand)`.



##########
src/java/org/apache/cassandra/db/filter/RowFilter.java:
##########
@@ -136,7 +177,66 @@ public boolean hasExpressionOnClusteringOrRegularColumns()
         return false;
     }
 
-    protected abstract Transformation<BaseRowIterator<?>> filter(TableMetadata 
metadata, long nowInSec);
+    protected Transformation<BaseRowIterator<?>> filter(TableMetadata 
metadata, long nowInSec)
+    {
+        List<Expression> partitionLevelExpressions = new ArrayList<>();
+        List<Expression> rowLevelExpressions = new ArrayList<>();
+        for (Expression e: expressions)
+        {
+            if (e.column.isStatic() || e.column.isPartitionKey())
+                partitionLevelExpressions.add(e);
+            else
+                rowLevelExpressions.add(e);
+        }
+
+        long numberOfRegularColumnExpressions = rowLevelExpressions.size();
+        final boolean filterNonStaticColumns = 
numberOfRegularColumnExpressions > 0;
+
+        return new Transformation<BaseRowIterator<?>>()

Review Comment:
   ```suggestion
           return new Transformation<>()
   ```



##########
src/java/org/apache/cassandra/db/filter/RowFilter.java:
##########
@@ -136,7 +177,66 @@ public boolean hasExpressionOnClusteringOrRegularColumns()
         return false;
     }
 
-    protected abstract Transformation<BaseRowIterator<?>> filter(TableMetadata 
metadata, long nowInSec);
+    protected Transformation<BaseRowIterator<?>> filter(TableMetadata 
metadata, long nowInSec)

Review Comment:
   It might be worth to comment here and/or in `isStrict()` that the multiple 
`filter` and `isSatisfiedBy` methods always use conjunction. Otherwise, one 
might wrongly understand that a non-strict filter would use disjunction in 
those methods.



##########
src/java/org/apache/cassandra/index/sai/QueryContext.java:
##########
@@ -87,4 +89,9 @@ public VectorQueryContext vectorContext()
             vectorContext = new VectorQueryContext(readCommand);
         return vectorContext;
     }
+    
+    public boolean hasUnrepairedMatches()
+    {
+        return hasUnrepairedMatches;
+    }

Review Comment:
   This getter seems redundant given that the attribute is `public`.



##########
src/java/org/apache/cassandra/index/sai/disk/v1/MemtableIndexWriter.java:
##########
@@ -83,13 +89,31 @@ public void addRow(PrimaryKey key, Row row, long 
sstableRowId)
     {
         // Memtable indexes are flushed directly to disk with the aid of a 
mapping between primary
         // keys and row IDs in the flushing SSTable. This writer, therefore, 
does nothing in
-        // response to the flushing of individual rows.
+        // response to the flushing of individual rows except for keeping 
index-specific statistics.
+        boolean isStatic = indexTermType.columnMetadata().isStatic();
+
+        // Indexes on static columns should only track static rows, and 
indexes on non-static columns 
+        // should only track non-static rows. (Within a partition, the row ID 
for a static row will always
+        // come before any non-static row.) 
+        if (key.kind() == PrimaryKey.Kind.STATIC && isStatic || key.kind() != 
PrimaryKey.Kind.STATIC && !isStatic)
+        {
+            if (minKey == null)
+                minKey = key;
+            maxKey = key;

Review Comment:
   This way of updating min/max keys is based on the expectation that 
`PerColumnIndexWriter#addRow` is always called in primary key order. Maybe we 
can mention that in the JavaDoc for `PerColumnIndexWriter`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] CASSANDRA-19018 An SAI-specific mechanism to ensure consistency isn't violated for multi-column AND queries at CL > ONE [cassandra]

Reply via email to