[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r58743 ## File path: processing/src/main/java/org/apache/druid/query/filter/DruidPredicateFactory.java ## @@ -32,4 +32,15 @@ DruidFloatPredicate makeFloatPredicate(); DruidDoublePredicate makeDoublePredicate(); + + /** + * Object predicate is currently only used by vectorized matchers for non-string object selectors. To preserve + * behavior with non-vectorized matchers which use a string predicate with null inputs for these 'nil' matchers, + * do the same thing here... + */ + default Predicate makeObjectPredicate() Review comment: adjusted javadocs This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r587431712 ## File path: processing/src/main/java/org/apache/druid/segment/vector/VectorColumnSelectorFactory.java ## @@ -62,16 +66,29 @@ default int getMaxVectorSize() SingleValueDimensionVectorSelector makeSingleValueDimensionSelector(DimensionSpec dimensionSpec); /** - * Returns a string-typed, multi-value-per-row column selector. Should only be called on columns where - * {@link #getColumnCapabilities} indicates they return STRING. Unlike {@link #makeSingleValueDimensionSelector}, - * this should not be called on nonexistent columns. + * Returns a dictionary encoded, string-typed, multi-value-per-row column selector. Should only be called on columns + * where {@link #getColumnCapabilities} indicates they return STRING. Unlike + * {@link #makeSingleValueDimensionSelector}, this should not be called on nonexistent columns. * * If you need to write code that adapts to different input types, you should write a * {@link org.apache.druid.segment.VectorColumnProcessorFactory} and use one of the * {@link org.apache.druid.segment.ColumnProcessors#makeVectorProcessor} functions instead of using this method. */ MultiValueDimensionVectorSelector makeMultiValueDimensionSelector(DimensionSpec dimensionSpec); + /** + * Returns a dimension object selector. Useful for non-dictionary encoded strings, or when using the dictionary + * isn't useful. Right now, this should probably only be called on single valued STRING inputs (multi-value STRING Review comment: i got rid of this method, and merged these javadocs into the `makeObjectSelector` javadocs This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r587431424 ## File path: processing/src/main/java/org/apache/druid/segment/column/ColumnCapabilitiesImpl.java ## @@ -177,6 +177,21 @@ public static ColumnCapabilitiesImpl createSimpleNumericColumnCapabilities(Value return builder; } + /** + * Simple, non dictionary encoded string without bitmap index or anything fancy + */ + public static ColumnCapabilitiesImpl createSimpleStringColumnCapabilities() + { +return new ColumnCapabilitiesImpl().setType(ValueType.STRING) + .setHasMultipleValues(false) Review comment: renamed ^ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r587431276 ## File path: processing/src/main/java/org/apache/druid/query/dimension/DimensionSpec.java ## @@ -70,6 +71,11 @@ default MultiValueDimensionVectorSelector decorate(MultiValueDimensionVectorSele throw new UOE("DimensionSpec[%s] cannot vectorize", getClass().getName()); } + default VectorObjectSelector decorate(VectorObjectSelector selector) Review comment: I've removed the `decorate` method I added in this PR, in favor of removing this functionality from the vectorized engine. Since no `decorate`, no need for a separate "object dimension selector", I've removed that from `VectorColumnSelectorFactory` as well and reverted the signatures changes to processors that were needed to support that :+1: If we do the same thing and ditch `decorate` for `SingleValueDimensionVectorSelector` and `MultiValueDimensionVectorSelector` there is quite a bit of simplification we can do, because we can discard the `DimensionSpec` much earlier and just work with `columnName` like the other selectors instead of going all the way down like it does now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r584199144 ## File path: processing/src/main/java/org/apache/druid/query/dimension/DimensionSpec.java ## @@ -70,6 +71,11 @@ default MultiValueDimensionVectorSelector decorate(MultiValueDimensionVectorSele throw new UOE("DimensionSpec[%s] cannot vectorize", getClass().getName()); } + default VectorObjectSelector decorate(VectorObjectSelector selector) Review comment: I haven't yet implemented support for multi-value string object selectors, but your question also seems applicable if we were to add array types too. I guess the contents of the array would be dependent on whatever the underlying selector is providing, and it would be the responsibility of the `DimensionSpec` implementation to handle decoration appropriately? I guess we should also probably consider adding methods for the primitive numeric columns as well, since they are not masqueraded as "dimension selectors" in the vectorized engine, (if we ever want to fully support all of the non-vectorized behavior here I think). I don't know, the more I think about this, do we really need `DimensionSpec`, like at all? The stuff it does is all over the place, and it feels like maybe this type of stuff should just be pushed down into either expression virtual columns or something more specialized as appropriate (more generic predicate filtered selectors virtual column?). Then we maybe we can just ditch this stuff entirely in some sunny future instead of implementing vectorized versions of decorate for the various `DimensionSpec` implementations. Or is there an advantage to `DimensionSpec` that I am missing? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r579010273 ## File path: core/src/main/java/org/apache/druid/math/expr/IdentifierExpr.java ## @@ -149,13 +150,14 @@ public boolean canVectorize(InputBindingInspector inspector) ExprType inputType = inspector.getType(binding); if (inputType == null) { - // nil column, we can be anything, why not be a double - return new IdentifierVectorProcessor(ExprType.DOUBLE) + // nil column, we can be anything, so be a string because it's the most flexible + // (numbers will be populated with default values in default mode and non-null) + return new IdentifierVectorProcessor(ExprType.STRING) { @Override -public ExprEvalVector evalVector(VectorInputBinding bindings) +public ExprEvalVector evalVector(VectorInputBinding bindings) { - return new ExprEvalDoubleVector(bindings.getDoubleVector(binding), bindings.getNullVector(binding)); + return new ExprEvalStringVector(Arrays.stream(bindings.getObjectVector(binding)).map(x -> (String) x).toArray(String[]::new)); Review comment: oops, i lied, it needs to cast because non-existent column gives back `Object[]` of nulls instead of `String[]` of nulls This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r579003843 ## File path: processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java ## @@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final FilterTuning filterTuni this.filterTuning = filterTuning; } + @Override + public boolean canVectorizeMatcher(ColumnInspector inspector) + { +return expr.get().canVectorize(inspector); + } + + @Override + public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory factory) + { +final Expr theExpr = expr.get(); + +DruidPredicateFactory predicateFactory = new DruidPredicateFactory() +{ + @Override + public Predicate makeStringPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidLongPredicate makeLongPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidFloatPredicate makeFloatPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidDoublePredicate makeDoublePredicate() + { +return Evals::asBoolean; + } + + // The hashcode and equals are to make SubclassesMustOverrideEqualsAndHashCodeTest stop complaining.. + // DruidPredicateFactory doesn't really need equals or hashcode, in fact only the 'toString' method is called Review comment: changed comment to suggest `DimensionPredicateFilter` should use equals of predicate factory instead of string form This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r579003158 ## File path: core/src/main/java/org/apache/druid/math/expr/IdentifierExpr.java ## @@ -149,13 +150,14 @@ public boolean canVectorize(InputBindingInspector inspector) ExprType inputType = inspector.getType(binding); if (inputType == null) { - // nil column, we can be anything, why not be a double - return new IdentifierVectorProcessor(ExprType.DOUBLE) + // nil column, we can be anything, so be a string because it's the most flexible + // (numbers will be populated with default values in default mode and non-null) + return new IdentifierVectorProcessor(ExprType.STRING) { @Override -public ExprEvalVector evalVector(VectorInputBinding bindings) +public ExprEvalVector evalVector(VectorInputBinding bindings) { - return new ExprEvalDoubleVector(bindings.getDoubleVector(binding), bindings.getNullVector(binding)); + return new ExprEvalStringVector(Arrays.stream(bindings.getObjectVector(binding)).map(x -> (String) x).toArray(String[]::new)); Review comment: not sure why i had this code like this since in this case it is getting back nulls... it can just call bindings.getObjectVector. changed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r578985027 ## File path: processing/src/main/java/org/apache/druid/query/filter/DruidPredicateFactory.java ## @@ -27,6 +27,13 @@ { Predicate makeStringPredicate(); + default Predicate makeObjectPredicate() + { +// default to try to use string predicate; +final Predicate stringPredicate = makeStringPredicate(); +return o -> stringPredicate.apply((String) o); Review comment: So, I agree that String objects should just use the string predicate, but I also liked the idea of having an object predicate so that we can potentially match array types and possibly complex types. I've modified the default implementation of object predicate to instead mimic the behavior of the `NilVectorValueMatcher` predicate, which supplied null to the string predicate. `VectorValueMatcherColumnProcessorFactory.makeObjectProcessor ` now checks the type, and uses either `StringObjectVectorValueMatcher` which always uses the string predicate, or `ObjectVectorValueMatcher` which calls the object predicate, and so has the behavior of the `NilVectorValueMatcher`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r578982727 ## File path: core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java ## @@ -74,7 +74,7 @@ public ExprType getOutputType(InputBindingInspector inspector) @Override public boolean canVectorize(InputBindingInspector inspector) { -return inspector.areNumeric(left, right) && inspector.canVectorize(left, right); Review comment: like, I agree here, but `canVectorize` still feels wrong to saddle with type validation, it can explode when creating the processor if the types are nonsense I think what might be best is to introduce a 'validate' method to expressions so that arguments can be checked up front, and maybe explode if the input is madness This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r578967958 ## File path: core/src/main/java/org/apache/druid/math/expr/ExprTypeConversion.java ## @@ -20,24 +20,29 @@ package org.apache.druid.math.expr; import org.apache.druid.java.util.common.IAE; +import org.apache.druid.java.util.common.Pair; import javax.annotation.Nullable; import java.util.List; public class ExprTypeConversion { + public static Pair coerceNull(Expr.InputBindingInspector inspector, Expr left, Expr right) Review comment: i did one better and just removed it. null literals output type is now `null` instead of their type, so that they behave and are handled like missing columns (since null doesn't have a type) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r578964740 ## File path: processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java ## @@ -305,17 +305,29 @@ private static ColumnCapabilities getEffectiveCapabilities( originalCapabilities == null || ValueType.COMPLEX.equals(originalCapabilities.getType()); if (type == ValueType.STRING) { - if (!forceSingleValue && effectiveCapabilites.hasMultipleValues().isMaybeTrue()) { -return strategyFactory.makeMultiValueDimensionProcessor( -effectiveCapabilites, -selectorFactory.makeMultiValueDimensionSelector(dimensionSpec) -); - } else { -return strategyFactory.makeSingleValueDimensionProcessor( -effectiveCapabilites, -selectorFactory.makeSingleValueDimensionSelector(dimensionSpec) -); + if (!forceSingleValue) { +// if column is not dictionary encoded (and not non-existent or complex), use object selector +if (effectiveCapabilites.isDictionaryEncoded().isFalse() || +effectiveCapabilites.areDictionaryValuesUnique().isFalse() Review comment: I think I've got this covered better now This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r578964617 ## File path: processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java ## @@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final FilterTuning filterTuni this.filterTuning = filterTuning; } + @Override + public boolean canVectorizeMatcher(ColumnInspector inspector) + { +return expr.get().canVectorize(inspector); + } + + @Override + public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory factory) + { +final Expr theExpr = expr.get(); + +DruidPredicateFactory predicateFactory = new DruidPredicateFactory() +{ + @Override + public Predicate makeStringPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidLongPredicate makeLongPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidFloatPredicate makeFloatPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidDoublePredicate makeDoublePredicate() + { +return Evals::asBoolean; + } + + // The hashcode and equals are to make SubclassesMustOverrideEqualsAndHashCodeTest stop complaining.. + // DruidPredicateFactory doesn't really need equals or hashcode, in fact only the 'toString' method is called Review comment: oops, I forgot to rework this comment, it is assuming that it keeps using toString. If we changed that to not use the string for its equality then it would be the toString that isn't necessary, will try to rework This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r578961978 ## File path: processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java ## @@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final FilterTuning filterTuni this.filterTuning = filterTuning; } + @Override + public boolean canVectorizeMatcher(ColumnInspector inspector) + { +return expr.get().canVectorize(inspector); + } + + @Override + public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory factory) + { +final Expr theExpr = expr.get(); + +DruidPredicateFactory predicateFactory = new DruidPredicateFactory() +{ + @Override + public Predicate makeStringPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidLongPredicate makeLongPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidFloatPredicate makeFloatPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidDoublePredicate makeDoublePredicate() + { +return Evals::asBoolean; + } + + // The hashcode and equals are to make SubclassesMustOverrideEqualsAndHashCodeTest stop complaining.. + // DruidPredicateFactory doesn't really need equals or hashcode, in fact only the 'toString' method is called + // when testing equality of DimensionPredicateFilter, so it's the truly required method, but even that seems + // strange at best. + // Rather than tackle removing the annotation and reworking equality tests for now, will leave this to refactor + // as part of https://github.com/apache/druid/issues/8256 which suggests combining Filter and DimFilter + // implementations, which should clean up some of this mess. + @Override + public int hashCode() + { +return super.hashCode(); + } + + @Override + public boolean equals(Object obj) + { +return super.equals(obj); + } +}; + + +final ExprType outputType = theExpr.getOutputType(factory); + +if (outputType == null) { + // if an expression is vectorizable, but the output type is null, the result will be null (or the default + // value in default mode) because expression is either all null constants or missing columns + + // in sql compatible mode, this means no matches ever, so just use the false matcher: + if (NullHandling.sqlCompatible()) { +return new FalseVectorMatcher(factory.getVectorSizeInspector()); + } + // in default mode, just fallback to using a long matcher since nearly all boolean-ish expressions + // output a long value so it is probably a safe bet? idk, ending up here by using all null-ish things + // in default mode is dancing on the edge of insanity anyway... Review comment: I reworked this to be cooler I think, since a vectorizable expression that produces a 'null' output type means all inputs are null constants or missing columns, so the function is a constant function, we can just evaluate the expression and make a constant matcher of the correct type instead of just going with something that worked in most cases by luck. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r578961003 ## File path: processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java ## @@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final FilterTuning filterTuni this.filterTuning = filterTuning; } + @Override + public boolean canVectorizeMatcher(ColumnInspector inspector) + { +return expr.get().canVectorize(inspector); + } + + @Override + public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory factory) + { +final Expr theExpr = expr.get(); + +DruidPredicateFactory predicateFactory = new DruidPredicateFactory() +{ + @Override + public Predicate makeStringPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidLongPredicate makeLongPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidFloatPredicate makeFloatPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidDoublePredicate makeDoublePredicate() + { +return Evals::asBoolean; + } + + // The hashcode and equals are to make SubclassesMustOverrideEqualsAndHashCodeTest stop complaining.. + // DruidPredicateFactory doesn't really need equals or hashcode, in fact only the 'toString' method is called + // when testing equality of DimensionPredicateFilter, so it's the truly required method, but even that seems + // strange at best. + // Rather than tackle removing the annotation and reworking equality tests for now, will leave this to refactor + // as part of https://github.com/apache/druid/issues/8256 which suggests combining Filter and DimFilter + // implementations, which should clean up some of this mess. + @Override + public int hashCode() + { +return super.hashCode(); + } + + @Override + public boolean equals(Object obj) + { +return super.equals(obj); + } +}; + + +final ExprType outputType = theExpr.getOutputType(factory); + +if (outputType == null) { + // if an expression is vectorizable, but the output type is null, the result will be null (or the default + // value in default mode) because expression is either all null constants or missing columns + + // in sql compatible mode, this means no matches ever, so just use the false matcher: + if (NullHandling.sqlCompatible()) { +return new FalseVectorMatcher(factory.getVectorSizeInspector()); + } + // in default mode, just fallback to using a long matcher since nearly all boolean-ish expressions + // output a long value so it is probably a safe bet? idk, ending up here by using all null-ish things + // in default mode is dancing on the edge of insanity anyway... + return VectorValueMatcherColumnProcessorFactory.instance().makeLongProcessor( + ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG), + ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr) + ).makeMatcher(predicateFactory); +} + +switch (outputType) { + case LONG: +return VectorValueMatcherColumnProcessorFactory.instance().makeLongProcessor( + ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG), +ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr) +).makeMatcher(predicateFactory); + case DOUBLE: +return VectorValueMatcherColumnProcessorFactory.instance().makeDoubleProcessor( + ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.DOUBLE), +ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr) +).makeMatcher(predicateFactory); + case STRING: +return VectorValueMatcherColumnProcessorFactory.instance().makeObjectProcessor( +// using 'numeric' capabilities creator so we are configured to NOT be dictionary encoded, etc + ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.STRING), Review comment: added a `createSimpleStringColumnCapabilities` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r578960768 ## File path: processing/src/main/java/org/apache/druid/segment/vector/VectorColumnSelectorFactory.java ## @@ -57,6 +57,15 @@ default int getMaxVectorSize() */ MultiValueDimensionVectorSelector makeMultiValueDimensionSelector(DimensionSpec dimensionSpec); + /** + * Returns an object selector for string columns, useful for non-dictionary encoded strings, or when Review comment: I think javadocs are maybe cooler now in describing the intent of these methods This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r558121651 ## File path: core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java ## @@ -74,7 +74,7 @@ public ExprType getOutputType(InputBindingInspector inspector) @Override public boolean canVectorize(InputBindingInspector inspector) { -return inspector.areNumeric(left, right) && inspector.canVectorize(left, right); Review comment: array types do not currently implement `canVectorize` so will return the default false. We probably should not use `canVectorize` for argument validation and maybe should consider doing this explicitly when we can resolve the input types I think. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553814648 ## File path: processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java ## @@ -305,17 +305,29 @@ private static ColumnCapabilities getEffectiveCapabilities( originalCapabilities == null || ValueType.COMPLEX.equals(originalCapabilities.getType()); if (type == ValueType.STRING) { - if (!forceSingleValue && effectiveCapabilites.hasMultipleValues().isMaybeTrue()) { -return strategyFactory.makeMultiValueDimensionProcessor( -effectiveCapabilites, -selectorFactory.makeMultiValueDimensionSelector(dimensionSpec) -); - } else { -return strategyFactory.makeSingleValueDimensionProcessor( -effectiveCapabilites, -selectorFactory.makeSingleValueDimensionSelector(dimensionSpec) -); + if (!forceSingleValue) { +// if column is not dictionary encoded (and not non-existent or complex), use object selector +if (effectiveCapabilites.isDictionaryEncoded().isFalse() || +effectiveCapabilites.areDictionaryValuesUnique().isFalse() Review comment: by regular string selector do you mean dimension selector? The answer is yes we _could_, but if the dictionary values are not unique, i came to the conclusion that they aren't especially useful since in all the cases I can think of you can't use them as is and you're going to have to lookup the string values anyway (e.g. filter matching and vector group by). I was trying to provide an avenue to avoid the extra steps with expression virtual columns, and allow an object selector to be created directly since the string is the useable part in this case, and the dictionary ids just feel like an obstruction. But maybe this is overly broad and there are cases where we would want it to act like a dictionary encoded column for some reason that I have missed? ## File path: processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java ## @@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final FilterTuning filterTuni this.filterTuning = filterTuning; } + @Override + public boolean canVectorizeMatcher(ColumnInspector inspector) + { +return expr.get().canVectorize(inspector); + } + + @Override + public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory factory) + { +final Expr theExpr = expr.get(); + +DruidPredicateFactory predicateFactory = new DruidPredicateFactory() +{ + @Override + public Predicate makeStringPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidLongPredicate makeLongPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidFloatPredicate makeFloatPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidDoublePredicate makeDoublePredicate() + { +return Evals::asBoolean; + } + + // The hashcode and equals are to make SubclassesMustOverrideEqualsAndHashCodeTest stop complaining.. + // DruidPredicateFactory doesn't really need equals or hashcode, in fact only the 'toString' method is called Review comment: yeah, I was planning on addressing this in a separate PR, the comments are partially to help remind us to do it in case I didn't get to it immediately ## File path: processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java ## @@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final FilterTuning filterTuni this.filterTuning = filterTuning; } + @Override + public boolean canVectorizeMatcher(ColumnInspector inspector) + { +return expr.get().canVectorize(inspector); + } + + @Override + public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory factory) + { +final Expr theExpr = expr.get(); + +DruidPredicateFactory predicateFactory = new DruidPredicateFactory() +{ + @Override + public Predicate makeStringPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidLongPredicate makeLongPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidFloatPredicate makeFloatPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidDoublePredicate makeDoublePredicate() + { +return Evals::asBoolean; + } + + // The hashcode and equals are to make SubclassesMustOverrideEqualsAndHashCodeTest stop complaining.. + // DruidPredicateFactory doesn't really need equals or hashcode, in fact only the 'toString' method is called + // when testing equality of DimensionPredicateFilter, so it's the truly required method, but even
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553847083 ## File path: core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java ## @@ -74,7 +74,7 @@ public ExprType getOutputType(InputBindingInspector inspector) @Override public boolean canVectorize(InputBindingInspector inspector) { -return inspector.areNumeric(left, right) && inspector.canVectorize(left, right); Review comment: i _think_ canVectorize will still end up false because array types/functions are not yet vectorizable, but yeah i guess array types would not be well handled here even if they were, though maybe that is the case for non-vectorized expression as well. I'll do a bit of exploration and see what up This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553845596 ## File path: processing/src/main/java/org/apache/druid/query/dimension/DimensionSpec.java ## @@ -70,6 +71,11 @@ default MultiValueDimensionVectorSelector decorate(MultiValueDimensionVectorSele throw new UOE("DimensionSpec[%s] cannot vectorize", getClass().getName()); } + default VectorObjectSelector decorate(VectorObjectSelector selector) Review comment: This is used with the poorly named `VectorColumnSelectorFactory.makeStringObjectSelector`, which should be renamed `VectorColumnSelectorFactory.makeDimensionObjectSelector` or something, which is for doing dimensiony things with a vector object selector similar to what is done for `SingleValueDimensionVectorSelector` or `MultiValueDimensionVectorSelector`. There are currently no implementations of this, like is the case with the other decorate methods with vector selectors, i was just trying for symmetry here so that the object selector could behave similar to the other string selectors when used in this context This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553840570 ## File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/GroupByVectorColumnProcessorFactory.java ## @@ -64,7 +65,16 @@ public GroupByVectorColumnSelector makeMultiValueDimensionProcessor( ValueType.STRING == capabilities.getType(), "groupBy dimension processors must be STRING typed" ); -throw new UnsupportedOperationException("Multi-value dimensions not yet implemented for vectorized groupBys"); +throw new UnsupportedOperationException("Multi-value dimensions are not yet implemented for vectorized groupBys"); + } + + @Override + public GroupByVectorColumnSelector makeObjectProcessor( + ColumnCapabilities capabilities, + VectorObjectSelector selector + ) + { +throw new UnsupportedOperationException("Object columns are not yet implemented for vectorized groupBys"); Review comment: yes, I have a related branch that I will open as a follow-up to this PR that adds support for group-by queries on vectorized string expressions, using an object selector and a newly added dictionary building `GroupByVectorColumnSelector` similar to the non-vectorized group by code-path. In that branch this part checks that the column capabilities are string typed, and if so then will make the new dictionary building selector, else throw an exception for being an unsupported type. It seems beneficial to not have to make the string expression vector results have to masquerade as dictionary encoded, to be translated to string values to be re-encoded into a new dictionary, which is part of my reason for pushing for object selectors for the use case This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553833318 ## File path: processing/src/main/java/org/apache/druid/query/filter/DruidPredicateFactory.java ## @@ -27,6 +27,13 @@ { Predicate makeStringPredicate(); + default Predicate makeObjectPredicate() + { +// default to try to use string predicate; Review comment: currently I don't think it should be possible because only expression filter will make a matcher factory with object matching predicate and only for string column types, but i guess cast exceptions if an object selector of a complex column somehow got in here from future code, see other comment about thinking a bit further about this This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553833052 ## File path: processing/src/main/java/org/apache/druid/query/filter/DruidPredicateFactory.java ## @@ -27,6 +27,13 @@ { Predicate makeStringPredicate(); + default Predicate makeObjectPredicate() + { +// default to try to use string predicate; +final Predicate stringPredicate = makeStringPredicate(); +return o -> stringPredicate.apply((String) o); Review comment: uh, it is sort of built on hope right now... the hope that it will probably be a string because the only context it will currently happen in is from making vector matcher object processors on string expression columns, but if we opened it up to additional non-primitive number types like array expressions and complex column types then this would need to be reconsidered, or stronger checks in place that the filter supports filtering on the underlying object type. I might need to think about this a bit deeper. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553820619 ## File path: processing/src/main/java/org/apache/druid/segment/vector/VectorColumnSelectorFactory.java ## @@ -57,6 +57,15 @@ default int getMaxVectorSize() */ MultiValueDimensionVectorSelector makeMultiValueDimensionSelector(DimensionSpec dimensionSpec); + /** + * Returns an object selector for string columns, useful for non-dictionary encoded strings, or when Review comment: hmm, i have no idea why I named this function like this, it should probably be called `makeDimensionObjectSelector`. Its just expected to make a `VectorObjectSelector` from a `DimensionSpec` rather than a raw column name This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553818153 ## File path: processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java ## @@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final FilterTuning filterTuni this.filterTuning = filterTuning; } + @Override + public boolean canVectorizeMatcher(ColumnInspector inspector) + { +return expr.get().canVectorize(inspector); + } + + @Override + public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory factory) + { +final Expr theExpr = expr.get(); + +DruidPredicateFactory predicateFactory = new DruidPredicateFactory() +{ + @Override + public Predicate makeStringPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidLongPredicate makeLongPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidFloatPredicate makeFloatPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidDoublePredicate makeDoublePredicate() + { +return Evals::asBoolean; + } + + // The hashcode and equals are to make SubclassesMustOverrideEqualsAndHashCodeTest stop complaining.. + // DruidPredicateFactory doesn't really need equals or hashcode, in fact only the 'toString' method is called + // when testing equality of DimensionPredicateFilter, so it's the truly required method, but even that seems + // strange at best. + // Rather than tackle removing the annotation and reworking equality tests for now, will leave this to refactor + // as part of https://github.com/apache/druid/issues/8256 which suggests combining Filter and DimFilter + // implementations, which should clean up some of this mess. + @Override + public int hashCode() + { +return super.hashCode(); + } + + @Override + public boolean equals(Object obj) + { +return super.equals(obj); + } +}; + + +final ExprType outputType = theExpr.getOutputType(factory); + +if (outputType == null) { + // if an expression is vectorizable, but the output type is null, the result will be null (or the default + // value in default mode) because expression is either all null constants or missing columns + + // in sql compatible mode, this means no matches ever, so just use the false matcher: + if (NullHandling.sqlCompatible()) { +return new FalseVectorMatcher(factory.getVectorSizeInspector()); + } + // in default mode, just fallback to using a long matcher since nearly all boolean-ish expressions + // output a long value so it is probably a safe bet? idk, ending up here by using all null-ish things + // in default mode is dancing on the edge of insanity anyway... + return VectorValueMatcherColumnProcessorFactory.instance().makeLongProcessor( + ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG), + ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr) + ).makeMatcher(predicateFactory); +} + +switch (outputType) { + case LONG: +return VectorValueMatcherColumnProcessorFactory.instance().makeLongProcessor( + ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG), +ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr) +).makeMatcher(predicateFactory); + case DOUBLE: +return VectorValueMatcherColumnProcessorFactory.instance().makeDoubleProcessor( + ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.DOUBLE), +ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr) +).makeMatcher(predicateFactory); + case STRING: +return VectorValueMatcherColumnProcessorFactory.instance().makeObjectProcessor( +// using 'numeric' capabilities creator so we are configured to NOT be dictionary encoded, etc + ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.STRING), Review comment: it is strange, but on purpose for the reason the comment says since the numeric capabilities defaults are correct in this case. I will make sure all current users are numeric types, and if so then just make these capabilites from scratch because we should be counting on something geared towards numeric behavior in case those change in some unexpected way. Or, if it is used by other non-numbers then rename it to something more appropriate, like `createSimpleSingleValueColumnCapabilites` or similar This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553816206 ## File path: processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java ## @@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final FilterTuning filterTuni this.filterTuning = filterTuning; } + @Override + public boolean canVectorizeMatcher(ColumnInspector inspector) + { +return expr.get().canVectorize(inspector); + } + + @Override + public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory factory) + { +final Expr theExpr = expr.get(); + +DruidPredicateFactory predicateFactory = new DruidPredicateFactory() +{ + @Override + public Predicate makeStringPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidLongPredicate makeLongPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidFloatPredicate makeFloatPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidDoublePredicate makeDoublePredicate() + { +return Evals::asBoolean; + } + + // The hashcode and equals are to make SubclassesMustOverrideEqualsAndHashCodeTest stop complaining.. + // DruidPredicateFactory doesn't really need equals or hashcode, in fact only the 'toString' method is called Review comment: yeah, I was planning on addressing this in a separate PR, the comments are partially to help remind us to do it in case I didn't get to it immediately This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553814648 ## File path: processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java ## @@ -305,17 +305,29 @@ private static ColumnCapabilities getEffectiveCapabilities( originalCapabilities == null || ValueType.COMPLEX.equals(originalCapabilities.getType()); if (type == ValueType.STRING) { - if (!forceSingleValue && effectiveCapabilites.hasMultipleValues().isMaybeTrue()) { -return strategyFactory.makeMultiValueDimensionProcessor( -effectiveCapabilites, -selectorFactory.makeMultiValueDimensionSelector(dimensionSpec) -); - } else { -return strategyFactory.makeSingleValueDimensionProcessor( -effectiveCapabilites, -selectorFactory.makeSingleValueDimensionSelector(dimensionSpec) -); + if (!forceSingleValue) { +// if column is not dictionary encoded (and not non-existent or complex), use object selector +if (effectiveCapabilites.isDictionaryEncoded().isFalse() || +effectiveCapabilites.areDictionaryValuesUnique().isFalse() Review comment: by regular string selector do you mean dimension selector? The answer is yes we _could_, but if the dictionary values are not unique, i came to the conclusion that they aren't especially useful since in all the cases I can think of you can't use them as is and you're going to have to lookup the string values anyway (e.g. filter matching and vector group by). I was trying to provide an avenue to avoid the extra steps with expression virtual columns, and allow an object selector to be created directly since the string is the useable part in this case, and the dictionary ids just feel like an obstruction. But maybe this is overly broad and there are cases where we would want it to act like a dictionary encoded column for some reason that I have missed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org