[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-03-04 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r58743



##
File path: 
processing/src/main/java/org/apache/druid/query/filter/DruidPredicateFactory.java
##
@@ -32,4 +32,15 @@
   DruidFloatPredicate makeFloatPredicate();
 
   DruidDoublePredicate makeDoublePredicate();
+
+  /**
+   * Object predicate is currently only used by vectorized matchers for 
non-string object selectors. To preserve
+   * behavior with non-vectorized matchers which use a string predicate with 
null inputs for these 'nil' matchers,
+   * do the same thing here...
+   */
+  default Predicate makeObjectPredicate()

Review comment:
   adjusted javadocs





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-03-04 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r587431712



##
File path: 
processing/src/main/java/org/apache/druid/segment/vector/VectorColumnSelectorFactory.java
##
@@ -62,16 +66,29 @@ default int getMaxVectorSize()
   SingleValueDimensionVectorSelector 
makeSingleValueDimensionSelector(DimensionSpec dimensionSpec);
 
   /**
-   * Returns a string-typed, multi-value-per-row column selector. Should only 
be called on columns where
-   * {@link #getColumnCapabilities} indicates they return STRING. Unlike 
{@link #makeSingleValueDimensionSelector},
-   * this should not be called on nonexistent columns.
+   * Returns a dictionary encoded, string-typed, multi-value-per-row column 
selector. Should only be called on columns
+   * where {@link #getColumnCapabilities} indicates they return STRING. Unlike
+   * {@link #makeSingleValueDimensionSelector}, this should not be called on 
nonexistent columns.
*
* If you need to write code that adapts to different input types, you 
should write a
* {@link org.apache.druid.segment.VectorColumnProcessorFactory} and use one 
of the
* {@link org.apache.druid.segment.ColumnProcessors#makeVectorProcessor} 
functions instead of using this method.
*/
   MultiValueDimensionVectorSelector 
makeMultiValueDimensionSelector(DimensionSpec dimensionSpec);
 
+  /**
+   * Returns a dimension object selector. Useful for non-dictionary encoded 
strings, or when using the dictionary
+   * isn't useful. Right now, this should probably only be called on single 
valued STRING inputs (multi-value STRING

Review comment:
   i got rid of this method, and merged these javadocs into the 
`makeObjectSelector` javadocs





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-03-04 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r587431424



##
File path: 
processing/src/main/java/org/apache/druid/segment/column/ColumnCapabilitiesImpl.java
##
@@ -177,6 +177,21 @@ public static ColumnCapabilitiesImpl 
createSimpleNumericColumnCapabilities(Value
 return builder;
   }
 
+  /**
+   * Simple, non dictionary encoded string without bitmap index or anything 
fancy
+   */
+  public static ColumnCapabilitiesImpl createSimpleStringColumnCapabilities()
+  {
+return new ColumnCapabilitiesImpl().setType(ValueType.STRING)
+   .setHasMultipleValues(false)

Review comment:
   renamed ^





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-03-04 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r587431276



##
File path: 
processing/src/main/java/org/apache/druid/query/dimension/DimensionSpec.java
##
@@ -70,6 +71,11 @@ default MultiValueDimensionVectorSelector 
decorate(MultiValueDimensionVectorSele
 throw new UOE("DimensionSpec[%s] cannot vectorize", getClass().getName());
   }
 
+  default VectorObjectSelector decorate(VectorObjectSelector selector)

Review comment:
   I've removed the `decorate` method I added in this PR, in favor of 
removing this functionality from the vectorized engine. Since no `decorate`, no 
need for a separate "object dimension selector", I've removed that from 
`VectorColumnSelectorFactory` as well and reverted the signatures changes to 
processors that were needed to support that :+1:
   
   If we do the same thing and ditch `decorate` for 
`SingleValueDimensionVectorSelector` and `MultiValueDimensionVectorSelector` 
there is quite a bit of simplification we can do, because we can discard the 
`DimensionSpec` much earlier and just work with `columnName` like the other 
selectors instead of going all the way down like it does now.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-02-27 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r584199144



##
File path: 
processing/src/main/java/org/apache/druid/query/dimension/DimensionSpec.java
##
@@ -70,6 +71,11 @@ default MultiValueDimensionVectorSelector 
decorate(MultiValueDimensionVectorSele
 throw new UOE("DimensionSpec[%s] cannot vectorize", getClass().getName());
   }
 
+  default VectorObjectSelector decorate(VectorObjectSelector selector)

Review comment:
   I haven't yet implemented support for multi-value string object 
selectors, but your question also seems applicable if we were to add array 
types too. I guess the contents of the array would be dependent on whatever the 
underlying selector is providing, and it would be the responsibility of the 
`DimensionSpec` implementation to handle decoration appropriately? I guess we 
should also probably consider adding methods for the primitive numeric columns 
as well, since they are not masqueraded as "dimension selectors" in the 
vectorized engine, (if we ever want to fully support all of the non-vectorized 
behavior here I think).
   
   I don't know, the more I think about this, do we really need 
`DimensionSpec`, like at all? The stuff it does is all over the place, and it 
feels like maybe this type of stuff should just be pushed down into either 
expression virtual columns or something more specialized as appropriate (more 
generic predicate filtered selectors virtual column?). Then we maybe we can 
just ditch this stuff entirely in some sunny future instead of implementing 
vectorized versions of decorate for the various `DimensionSpec` 
implementations. Or is there an advantage to `DimensionSpec` that I am missing?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-02-19 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r579010273



##
File path: core/src/main/java/org/apache/druid/math/expr/IdentifierExpr.java
##
@@ -149,13 +150,14 @@ public boolean canVectorize(InputBindingInspector 
inspector)
 ExprType inputType = inspector.getType(binding);
 
 if (inputType == null) {
-  // nil column, we can be anything, why not be a double
-  return new IdentifierVectorProcessor(ExprType.DOUBLE)
+  // nil column, we can be anything, so be a string because it's the most 
flexible
+  // (numbers will be populated with default values in default mode and 
non-null)
+  return new IdentifierVectorProcessor(ExprType.STRING)
   {
 @Override
-public ExprEvalVector evalVector(VectorInputBinding bindings)
+public ExprEvalVector evalVector(VectorInputBinding bindings)
 {
-  return new ExprEvalDoubleVector(bindings.getDoubleVector(binding), 
bindings.getNullVector(binding));
+  return new 
ExprEvalStringVector(Arrays.stream(bindings.getObjectVector(binding)).map(x -> 
(String) x).toArray(String[]::new));

Review comment:
   oops, i lied, it needs to cast because non-existent column gives back 
`Object[]` of nulls instead of `String[]` of nulls





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-02-19 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r579003843



##
File path: 
processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java
##
@@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final 
FilterTuning filterTuni
 this.filterTuning = filterTuning;
   }
 
+  @Override
+  public boolean canVectorizeMatcher(ColumnInspector inspector)
+  {
+return expr.get().canVectorize(inspector);
+  }
+
+  @Override
+  public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory 
factory)
+  {
+final Expr theExpr = expr.get();
+
+DruidPredicateFactory predicateFactory = new DruidPredicateFactory()
+{
+  @Override
+  public Predicate makeStringPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidLongPredicate makeLongPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidFloatPredicate makeFloatPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidDoublePredicate makeDoublePredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  // The hashcode and equals are to make 
SubclassesMustOverrideEqualsAndHashCodeTest stop complaining..
+  // DruidPredicateFactory doesn't really need equals or hashcode, in fact 
only the 'toString' method is called

Review comment:
   changed comment to suggest `DimensionPredicateFilter` should use equals 
of predicate factory instead of string form





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-02-19 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r579003158



##
File path: core/src/main/java/org/apache/druid/math/expr/IdentifierExpr.java
##
@@ -149,13 +150,14 @@ public boolean canVectorize(InputBindingInspector 
inspector)
 ExprType inputType = inspector.getType(binding);
 
 if (inputType == null) {
-  // nil column, we can be anything, why not be a double
-  return new IdentifierVectorProcessor(ExprType.DOUBLE)
+  // nil column, we can be anything, so be a string because it's the most 
flexible
+  // (numbers will be populated with default values in default mode and 
non-null)
+  return new IdentifierVectorProcessor(ExprType.STRING)
   {
 @Override
-public ExprEvalVector evalVector(VectorInputBinding bindings)
+public ExprEvalVector evalVector(VectorInputBinding bindings)
 {
-  return new ExprEvalDoubleVector(bindings.getDoubleVector(binding), 
bindings.getNullVector(binding));
+  return new 
ExprEvalStringVector(Arrays.stream(bindings.getObjectVector(binding)).map(x -> 
(String) x).toArray(String[]::new));

Review comment:
   not sure why i had this code like this since in this case it is getting 
back nulls... it can just call bindings.getObjectVector. changed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-02-18 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r578985027



##
File path: 
processing/src/main/java/org/apache/druid/query/filter/DruidPredicateFactory.java
##
@@ -27,6 +27,13 @@
 {
   Predicate makeStringPredicate();
 
+  default Predicate makeObjectPredicate()
+  {
+// default to try to use string predicate;
+final Predicate stringPredicate = makeStringPredicate();
+return o -> stringPredicate.apply((String) o);

Review comment:
   So, I agree that String objects should just use the string predicate, 
but I also liked the idea of having an object predicate so that we can 
potentially match array types and possibly complex types.
   
   I've modified the default implementation of object predicate to instead 
mimic the behavior of the `NilVectorValueMatcher` predicate, which supplied 
null to the string predicate.
   
   `VectorValueMatcherColumnProcessorFactory.makeObjectProcessor ` now checks 
the type, and uses either `StringObjectVectorValueMatcher` which always uses 
the string predicate, or `ObjectVectorValueMatcher` which calls the object 
predicate, and so has the behavior of the `NilVectorValueMatcher`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-02-18 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r578982727



##
File path: 
core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java
##
@@ -74,7 +74,7 @@ public ExprType getOutputType(InputBindingInspector inspector)
   @Override
   public boolean canVectorize(InputBindingInspector inspector)
   {
-return inspector.areNumeric(left, right) && inspector.canVectorize(left, 
right);

Review comment:
   like, I agree here, but `canVectorize` still feels wrong to saddle with 
type validation, it can explode when creating the processor if the types are 
nonsense
   
   I think what might be best is to introduce a 'validate' method to 
expressions so that arguments can be checked up front, and maybe explode if the 
input is madness





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-02-18 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r578967958



##
File path: core/src/main/java/org/apache/druid/math/expr/ExprTypeConversion.java
##
@@ -20,24 +20,29 @@
 package org.apache.druid.math.expr;
 
 import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.java.util.common.Pair;
 
 import javax.annotation.Nullable;
 import java.util.List;
 
 public class ExprTypeConversion
 {
+  public static Pair coerceNull(Expr.InputBindingInspector 
inspector, Expr left, Expr right)

Review comment:
   i did one better and just removed it. null literals output type is now 
`null` instead of their type, so that they behave and are handled like missing 
columns (since null doesn't have a type)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-02-18 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r578964740



##
File path: 
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
##
@@ -305,17 +305,29 @@ private static ColumnCapabilities 
getEffectiveCapabilities(
 originalCapabilities == null || 
ValueType.COMPLEX.equals(originalCapabilities.getType());
 
 if (type == ValueType.STRING) {
-  if (!forceSingleValue && 
effectiveCapabilites.hasMultipleValues().isMaybeTrue()) {
-return strategyFactory.makeMultiValueDimensionProcessor(
-effectiveCapabilites,
-selectorFactory.makeMultiValueDimensionSelector(dimensionSpec)
-);
-  } else {
-return strategyFactory.makeSingleValueDimensionProcessor(
-effectiveCapabilites,
-selectorFactory.makeSingleValueDimensionSelector(dimensionSpec)
-);
+  if (!forceSingleValue) {
+// if column is not dictionary encoded (and not non-existent or 
complex), use object selector
+if (effectiveCapabilites.isDictionaryEncoded().isFalse() ||
+effectiveCapabilites.areDictionaryValuesUnique().isFalse()

Review comment:
   I think I've got this covered better now





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-02-18 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r578964617



##
File path: 
processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java
##
@@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final 
FilterTuning filterTuni
 this.filterTuning = filterTuning;
   }
 
+  @Override
+  public boolean canVectorizeMatcher(ColumnInspector inspector)
+  {
+return expr.get().canVectorize(inspector);
+  }
+
+  @Override
+  public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory 
factory)
+  {
+final Expr theExpr = expr.get();
+
+DruidPredicateFactory predicateFactory = new DruidPredicateFactory()
+{
+  @Override
+  public Predicate makeStringPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidLongPredicate makeLongPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidFloatPredicate makeFloatPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidDoublePredicate makeDoublePredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  // The hashcode and equals are to make 
SubclassesMustOverrideEqualsAndHashCodeTest stop complaining..
+  // DruidPredicateFactory doesn't really need equals or hashcode, in fact 
only the 'toString' method is called

Review comment:
   oops, I forgot to rework this comment, it is assuming that it keeps 
using toString. If we changed that to not use the string for its equality then 
it would be the toString that isn't necessary, will try to rework





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-02-18 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r578961978



##
File path: 
processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java
##
@@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final 
FilterTuning filterTuni
 this.filterTuning = filterTuning;
   }
 
+  @Override
+  public boolean canVectorizeMatcher(ColumnInspector inspector)
+  {
+return expr.get().canVectorize(inspector);
+  }
+
+  @Override
+  public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory 
factory)
+  {
+final Expr theExpr = expr.get();
+
+DruidPredicateFactory predicateFactory = new DruidPredicateFactory()
+{
+  @Override
+  public Predicate makeStringPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidLongPredicate makeLongPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidFloatPredicate makeFloatPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidDoublePredicate makeDoublePredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  // The hashcode and equals are to make 
SubclassesMustOverrideEqualsAndHashCodeTest stop complaining..
+  // DruidPredicateFactory doesn't really need equals or hashcode, in fact 
only the 'toString' method is called
+  // when testing equality of DimensionPredicateFilter, so it's the truly 
required method, but even that seems
+  // strange at best.
+  // Rather than tackle removing the annotation and reworking equality 
tests for now, will leave this to refactor
+  // as part of https://github.com/apache/druid/issues/8256 which suggests 
combining Filter and DimFilter
+  // implementations, which should clean up some of this mess.
+  @Override
+  public int hashCode()
+  {
+return super.hashCode();
+  }
+
+  @Override
+  public boolean equals(Object obj)
+  {
+return super.equals(obj);
+  }
+};
+
+
+final ExprType outputType = theExpr.getOutputType(factory);
+
+if (outputType == null) {
+  // if an expression is vectorizable, but the output type is null, the 
result will be null (or the default
+  // value in default mode) because expression is either all null 
constants or missing columns
+
+  // in sql compatible mode, this means no matches ever, so just use the 
false matcher:
+  if (NullHandling.sqlCompatible()) {
+return new FalseVectorMatcher(factory.getVectorSizeInspector());
+  }
+  // in default mode, just fallback to using a long matcher since nearly 
all boolean-ish expressions
+  // output a long value so it is probably a safe bet? idk, ending up here 
by using all null-ish things
+  // in default mode is dancing on the edge of insanity anyway...

Review comment:
   I reworked this to be cooler I think, since a vectorizable expression 
that produces a 'null' output type means all inputs are null constants or 
missing columns, so the function is a constant function, we can just evaluate 
the expression and make a constant matcher of the correct type instead of just 
going with something that worked in most cases by luck.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-02-18 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r578961003



##
File path: 
processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java
##
@@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final 
FilterTuning filterTuni
 this.filterTuning = filterTuning;
   }
 
+  @Override
+  public boolean canVectorizeMatcher(ColumnInspector inspector)
+  {
+return expr.get().canVectorize(inspector);
+  }
+
+  @Override
+  public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory 
factory)
+  {
+final Expr theExpr = expr.get();
+
+DruidPredicateFactory predicateFactory = new DruidPredicateFactory()
+{
+  @Override
+  public Predicate makeStringPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidLongPredicate makeLongPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidFloatPredicate makeFloatPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidDoublePredicate makeDoublePredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  // The hashcode and equals are to make 
SubclassesMustOverrideEqualsAndHashCodeTest stop complaining..
+  // DruidPredicateFactory doesn't really need equals or hashcode, in fact 
only the 'toString' method is called
+  // when testing equality of DimensionPredicateFilter, so it's the truly 
required method, but even that seems
+  // strange at best.
+  // Rather than tackle removing the annotation and reworking equality 
tests for now, will leave this to refactor
+  // as part of https://github.com/apache/druid/issues/8256 which suggests 
combining Filter and DimFilter
+  // implementations, which should clean up some of this mess.
+  @Override
+  public int hashCode()
+  {
+return super.hashCode();
+  }
+
+  @Override
+  public boolean equals(Object obj)
+  {
+return super.equals(obj);
+  }
+};
+
+
+final ExprType outputType = theExpr.getOutputType(factory);
+
+if (outputType == null) {
+  // if an expression is vectorizable, but the output type is null, the 
result will be null (or the default
+  // value in default mode) because expression is either all null 
constants or missing columns
+
+  // in sql compatible mode, this means no matches ever, so just use the 
false matcher:
+  if (NullHandling.sqlCompatible()) {
+return new FalseVectorMatcher(factory.getVectorSizeInspector());
+  }
+  // in default mode, just fallback to using a long matcher since nearly 
all boolean-ish expressions
+  // output a long value so it is probably a safe bet? idk, ending up here 
by using all null-ish things
+  // in default mode is dancing on the edge of insanity anyway...
+  return 
VectorValueMatcherColumnProcessorFactory.instance().makeLongProcessor(
+  
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG),
+  ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr)
+  ).makeMatcher(predicateFactory);
+}
+
+switch (outputType) {
+  case LONG:
+return 
VectorValueMatcherColumnProcessorFactory.instance().makeLongProcessor(
+
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG),
+ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr)
+).makeMatcher(predicateFactory);
+  case DOUBLE:
+return 
VectorValueMatcherColumnProcessorFactory.instance().makeDoubleProcessor(
+
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.DOUBLE),
+ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr)
+).makeMatcher(predicateFactory);
+  case STRING:
+return 
VectorValueMatcherColumnProcessorFactory.instance().makeObjectProcessor(
+// using 'numeric' capabilities creator so we are configured to 
NOT be dictionary encoded, etc
+
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.STRING),

Review comment:
   added a `createSimpleStringColumnCapabilities`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-02-18 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r578960768



##
File path: 
processing/src/main/java/org/apache/druid/segment/vector/VectorColumnSelectorFactory.java
##
@@ -57,6 +57,15 @@ default int getMaxVectorSize()
*/
   MultiValueDimensionVectorSelector 
makeMultiValueDimensionSelector(DimensionSpec dimensionSpec);
 
+  /**
+   * Returns an object selector for string columns, useful for non-dictionary 
encoded strings, or when

Review comment:
   I think javadocs are maybe cooler now in describing the intent of these 
methods





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-15 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r558121651



##
File path: 
core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java
##
@@ -74,7 +74,7 @@ public ExprType getOutputType(InputBindingInspector inspector)
   @Override
   public boolean canVectorize(InputBindingInspector inspector)
   {
-return inspector.areNumeric(left, right) && inspector.canVectorize(left, 
right);

Review comment:
   array types do not currently implement `canVectorize` so will return the 
default false. We probably should not use `canVectorize` for argument 
validation and maybe should consider doing this explicitly when we can resolve 
the input types I think.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-09 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553814648



##
File path: 
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
##
@@ -305,17 +305,29 @@ private static ColumnCapabilities 
getEffectiveCapabilities(
 originalCapabilities == null || 
ValueType.COMPLEX.equals(originalCapabilities.getType());
 
 if (type == ValueType.STRING) {
-  if (!forceSingleValue && 
effectiveCapabilites.hasMultipleValues().isMaybeTrue()) {
-return strategyFactory.makeMultiValueDimensionProcessor(
-effectiveCapabilites,
-selectorFactory.makeMultiValueDimensionSelector(dimensionSpec)
-);
-  } else {
-return strategyFactory.makeSingleValueDimensionProcessor(
-effectiveCapabilites,
-selectorFactory.makeSingleValueDimensionSelector(dimensionSpec)
-);
+  if (!forceSingleValue) {
+// if column is not dictionary encoded (and not non-existent or 
complex), use object selector
+if (effectiveCapabilites.isDictionaryEncoded().isFalse() ||
+effectiveCapabilites.areDictionaryValuesUnique().isFalse()

Review comment:
   by regular string selector do you mean dimension selector? The answer is 
yes we _could_, but if the dictionary values are not unique, i came to the 
conclusion that they aren't especially useful since in all the cases I can 
think of you can't use them as is and you're going to have to lookup the string 
values anyway (e.g. filter matching and vector group by).
   
   I was trying to provide an avenue to avoid the extra steps with expression 
virtual columns, and allow an object selector to be created directly since the 
string is the useable part in this case, and the dictionary ids just feel like 
an obstruction. But maybe this is overly broad and there are cases where we 
would want it to act like a dictionary encoded column for some reason that I 
have missed?

##
File path: 
processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java
##
@@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final 
FilterTuning filterTuni
 this.filterTuning = filterTuning;
   }
 
+  @Override
+  public boolean canVectorizeMatcher(ColumnInspector inspector)
+  {
+return expr.get().canVectorize(inspector);
+  }
+
+  @Override
+  public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory 
factory)
+  {
+final Expr theExpr = expr.get();
+
+DruidPredicateFactory predicateFactory = new DruidPredicateFactory()
+{
+  @Override
+  public Predicate makeStringPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidLongPredicate makeLongPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidFloatPredicate makeFloatPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidDoublePredicate makeDoublePredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  // The hashcode and equals are to make 
SubclassesMustOverrideEqualsAndHashCodeTest stop complaining..
+  // DruidPredicateFactory doesn't really need equals or hashcode, in fact 
only the 'toString' method is called

Review comment:
   yeah, I was planning on addressing this in a separate PR, the comments 
are partially to help remind us to do it in case I didn't get to it immediately

##
File path: 
processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java
##
@@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final 
FilterTuning filterTuni
 this.filterTuning = filterTuning;
   }
 
+  @Override
+  public boolean canVectorizeMatcher(ColumnInspector inspector)
+  {
+return expr.get().canVectorize(inspector);
+  }
+
+  @Override
+  public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory 
factory)
+  {
+final Expr theExpr = expr.get();
+
+DruidPredicateFactory predicateFactory = new DruidPredicateFactory()
+{
+  @Override
+  public Predicate makeStringPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidLongPredicate makeLongPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidFloatPredicate makeFloatPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidDoublePredicate makeDoublePredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  // The hashcode and equals are to make 
SubclassesMustOverrideEqualsAndHashCodeTest stop complaining..
+  // DruidPredicateFactory doesn't really need equals or hashcode, in fact 
only the 'toString' method is called
+  // when testing equality of DimensionPredicateFilter, so it's the truly 
required method, but even 

[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553847083



##
File path: 
core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java
##
@@ -74,7 +74,7 @@ public ExprType getOutputType(InputBindingInspector inspector)
   @Override
   public boolean canVectorize(InputBindingInspector inspector)
   {
-return inspector.areNumeric(left, right) && inspector.canVectorize(left, 
right);

Review comment:
   i _think_ canVectorize will still end up false because array 
types/functions are not yet vectorizable, but yeah i guess array types would 
not be well handled here even if they were, though maybe that is the case for 
non-vectorized expression as well. I'll do a bit of exploration and see what up





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553845596



##
File path: 
processing/src/main/java/org/apache/druid/query/dimension/DimensionSpec.java
##
@@ -70,6 +71,11 @@ default MultiValueDimensionVectorSelector 
decorate(MultiValueDimensionVectorSele
 throw new UOE("DimensionSpec[%s] cannot vectorize", getClass().getName());
   }
 
+  default VectorObjectSelector decorate(VectorObjectSelector selector)

Review comment:
   This is used with the poorly named 
`VectorColumnSelectorFactory.makeStringObjectSelector`, which should be renamed 
`VectorColumnSelectorFactory.makeDimensionObjectSelector` or something, which 
is for doing dimensiony things with a vector object selector similar to what is 
done for `SingleValueDimensionVectorSelector` or 
`MultiValueDimensionVectorSelector`. There are currently no implementations of 
this, like is the case with the other decorate methods with vector selectors, i 
was just trying for symmetry here so that the object selector could behave 
similar to the other string selectors when used in this context





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553840570



##
File path: 
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/GroupByVectorColumnProcessorFactory.java
##
@@ -64,7 +65,16 @@ public GroupByVectorColumnSelector 
makeMultiValueDimensionProcessor(
 ValueType.STRING == capabilities.getType(),
 "groupBy dimension processors must be STRING typed"
 );
-throw new UnsupportedOperationException("Multi-value dimensions not yet 
implemented for vectorized groupBys");
+throw new UnsupportedOperationException("Multi-value dimensions are not 
yet implemented for vectorized groupBys");
+  }
+
+  @Override
+  public GroupByVectorColumnSelector makeObjectProcessor(
+  ColumnCapabilities capabilities,
+  VectorObjectSelector selector
+  )
+  {
+throw new UnsupportedOperationException("Object columns are not yet 
implemented for vectorized groupBys");

Review comment:
   yes, I have a related branch that I will open as a follow-up to this PR 
that adds support for group-by queries on vectorized string expressions, using 
an object selector and a newly added dictionary building 
`GroupByVectorColumnSelector` similar to the non-vectorized group by code-path. 
In that branch this part checks that the column capabilities are string typed, 
and if so then will make the new dictionary building selector, else throw an 
exception for being an unsupported type.
   
   It seems beneficial to not have to make the string expression vector results 
have to masquerade as dictionary encoded, to be translated to string values to 
be re-encoded into a new dictionary, which is part of my reason for pushing for 
object selectors for the use case





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553833318



##
File path: 
processing/src/main/java/org/apache/druid/query/filter/DruidPredicateFactory.java
##
@@ -27,6 +27,13 @@
 {
   Predicate makeStringPredicate();
 
+  default Predicate makeObjectPredicate()
+  {
+// default to try to use string predicate;

Review comment:
   currently I don't think it should be possible because only expression 
filter will make a matcher factory with object matching predicate and only for 
string column types, but i guess cast exceptions if an object selector of a 
complex column somehow got in here from future code, see other comment about 
thinking a bit further about this  





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553833052



##
File path: 
processing/src/main/java/org/apache/druid/query/filter/DruidPredicateFactory.java
##
@@ -27,6 +27,13 @@
 {
   Predicate makeStringPredicate();
 
+  default Predicate makeObjectPredicate()
+  {
+// default to try to use string predicate;
+final Predicate stringPredicate = makeStringPredicate();
+return o -> stringPredicate.apply((String) o);

Review comment:
   uh, it is sort of built on hope right now... the hope that it will 
probably be a string because the only context it will currently happen in is 
from making vector matcher object processors on string expression columns, but 
if we opened it up to additional non-primitive number types like array 
expressions and complex column types then this would need to be reconsidered, 
or stronger checks in place that the filter supports filtering on the 
underlying object type. I might need to think about this a bit deeper.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553820619



##
File path: 
processing/src/main/java/org/apache/druid/segment/vector/VectorColumnSelectorFactory.java
##
@@ -57,6 +57,15 @@ default int getMaxVectorSize()
*/
   MultiValueDimensionVectorSelector 
makeMultiValueDimensionSelector(DimensionSpec dimensionSpec);
 
+  /**
+   * Returns an object selector for string columns, useful for non-dictionary 
encoded strings, or when

Review comment:
   hmm, i have no idea why I named this function like this, it should 
probably be called `makeDimensionObjectSelector`. Its just expected to make a 
`VectorObjectSelector` from a `DimensionSpec` rather than a raw column name





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553818153



##
File path: 
processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java
##
@@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final 
FilterTuning filterTuni
 this.filterTuning = filterTuning;
   }
 
+  @Override
+  public boolean canVectorizeMatcher(ColumnInspector inspector)
+  {
+return expr.get().canVectorize(inspector);
+  }
+
+  @Override
+  public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory 
factory)
+  {
+final Expr theExpr = expr.get();
+
+DruidPredicateFactory predicateFactory = new DruidPredicateFactory()
+{
+  @Override
+  public Predicate makeStringPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidLongPredicate makeLongPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidFloatPredicate makeFloatPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidDoublePredicate makeDoublePredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  // The hashcode and equals are to make 
SubclassesMustOverrideEqualsAndHashCodeTest stop complaining..
+  // DruidPredicateFactory doesn't really need equals or hashcode, in fact 
only the 'toString' method is called
+  // when testing equality of DimensionPredicateFilter, so it's the truly 
required method, but even that seems
+  // strange at best.
+  // Rather than tackle removing the annotation and reworking equality 
tests for now, will leave this to refactor
+  // as part of https://github.com/apache/druid/issues/8256 which suggests 
combining Filter and DimFilter
+  // implementations, which should clean up some of this mess.
+  @Override
+  public int hashCode()
+  {
+return super.hashCode();
+  }
+
+  @Override
+  public boolean equals(Object obj)
+  {
+return super.equals(obj);
+  }
+};
+
+
+final ExprType outputType = theExpr.getOutputType(factory);
+
+if (outputType == null) {
+  // if an expression is vectorizable, but the output type is null, the 
result will be null (or the default
+  // value in default mode) because expression is either all null 
constants or missing columns
+
+  // in sql compatible mode, this means no matches ever, so just use the 
false matcher:
+  if (NullHandling.sqlCompatible()) {
+return new FalseVectorMatcher(factory.getVectorSizeInspector());
+  }
+  // in default mode, just fallback to using a long matcher since nearly 
all boolean-ish expressions
+  // output a long value so it is probably a safe bet? idk, ending up here 
by using all null-ish things
+  // in default mode is dancing on the edge of insanity anyway...
+  return 
VectorValueMatcherColumnProcessorFactory.instance().makeLongProcessor(
+  
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG),
+  ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr)
+  ).makeMatcher(predicateFactory);
+}
+
+switch (outputType) {
+  case LONG:
+return 
VectorValueMatcherColumnProcessorFactory.instance().makeLongProcessor(
+
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG),
+ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr)
+).makeMatcher(predicateFactory);
+  case DOUBLE:
+return 
VectorValueMatcherColumnProcessorFactory.instance().makeDoubleProcessor(
+
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.DOUBLE),
+ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr)
+).makeMatcher(predicateFactory);
+  case STRING:
+return 
VectorValueMatcherColumnProcessorFactory.instance().makeObjectProcessor(
+// using 'numeric' capabilities creator so we are configured to 
NOT be dictionary encoded, etc
+
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.STRING),

Review comment:
   it is strange, but on purpose for the reason the comment says since the 
numeric capabilities defaults are correct in this case. I will make sure all 
current users are numeric types, and if so then just make these capabilites 
from scratch because we should be counting on something geared towards numeric 
behavior in case those change in some unexpected way. Or, if it is used by 
other non-numbers then rename it to something more appropriate, like 
`createSimpleSingleValueColumnCapabilites` or similar





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to 

[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553816206



##
File path: 
processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java
##
@@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final 
FilterTuning filterTuni
 this.filterTuning = filterTuning;
   }
 
+  @Override
+  public boolean canVectorizeMatcher(ColumnInspector inspector)
+  {
+return expr.get().canVectorize(inspector);
+  }
+
+  @Override
+  public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory 
factory)
+  {
+final Expr theExpr = expr.get();
+
+DruidPredicateFactory predicateFactory = new DruidPredicateFactory()
+{
+  @Override
+  public Predicate makeStringPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidLongPredicate makeLongPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidFloatPredicate makeFloatPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidDoublePredicate makeDoublePredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  // The hashcode and equals are to make 
SubclassesMustOverrideEqualsAndHashCodeTest stop complaining..
+  // DruidPredicateFactory doesn't really need equals or hashcode, in fact 
only the 'toString' method is called

Review comment:
   yeah, I was planning on addressing this in a separate PR, the comments 
are partially to help remind us to do it in case I didn't get to it immediately





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553814648



##
File path: 
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
##
@@ -305,17 +305,29 @@ private static ColumnCapabilities 
getEffectiveCapabilities(
 originalCapabilities == null || 
ValueType.COMPLEX.equals(originalCapabilities.getType());
 
 if (type == ValueType.STRING) {
-  if (!forceSingleValue && 
effectiveCapabilites.hasMultipleValues().isMaybeTrue()) {
-return strategyFactory.makeMultiValueDimensionProcessor(
-effectiveCapabilites,
-selectorFactory.makeMultiValueDimensionSelector(dimensionSpec)
-);
-  } else {
-return strategyFactory.makeSingleValueDimensionProcessor(
-effectiveCapabilites,
-selectorFactory.makeSingleValueDimensionSelector(dimensionSpec)
-);
+  if (!forceSingleValue) {
+// if column is not dictionary encoded (and not non-existent or 
complex), use object selector
+if (effectiveCapabilites.isDictionaryEncoded().isFalse() ||
+effectiveCapabilites.areDictionaryValuesUnique().isFalse()

Review comment:
   by regular string selector do you mean dimension selector? The answer is 
yes we _could_, but if the dictionary values are not unique, i came to the 
conclusion that they aren't especially useful since in all the cases I can 
think of you can't use them as is and you're going to have to lookup the string 
values anyway (e.g. filter matching and vector group by).
   
   I was trying to provide an avenue to avoid the extra steps with expression 
virtual columns, and allow an object selector to be created directly since the 
string is the useable part in this case, and the dictionary ids just feel like 
an obstruction. But maybe this is overly broad and there are cases where we 
would want it to act like a dictionary encoded column for some reason that I 
have missed?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org