[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.

2020-06-03 Thread GitBox


gianm commented on a change in pull request #9893:
URL: https://github.com/apache/druid/pull/9893#discussion_r434745199



##
File path: 
processing/src/main/java/org/apache/druid/query/expression/RegexpLikeExprMacro.java
##
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.expression;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExprType;
+
+import javax.annotation.Nonnull;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class RegexpLikeExprMacro implements ExprMacroTable.ExprMacro
+{
+  private static final String FN_NAME = "regexp_like";
+
+  @Override
+  public String name()
+  {
+return FN_NAME;
+  }
+
+  @Override
+  public Expr apply(final List args)
+  {
+if (args.size() != 2) {
+  throw new IAE("Function[%s] must have 2 arguments", name());
+}
+
+final Expr arg = args.get(0);
+final Expr patternExpr = args.get(1);
+
+if (!ExprUtils.isStringLiteral(patternExpr)) {
+  throw new IAE("Function[%s] pattern must be a string literal", name());
+}
+
+// Precompile the pattern.
+final Pattern pattern = Pattern.compile(
+StringUtils.nullToEmptyNonDruidDataString((String) 
patternExpr.getLiteralValue())
+);
+
+class RegexpLikeExpr extends 
ExprMacroTable.BaseScalarUnivariateMacroFunctionExpr
+{
+  private RegexpLikeExpr(Expr arg)
+  {
+super(FN_NAME, arg);
+  }
+
+  @Nonnull
+  @Override
+  public ExprEval eval(final ObjectBinding bindings)
+  {
+final String s = 
NullHandling.nullToEmptyIfNeeded(arg.eval(bindings).asString());
+
+if (s == null) {
+  // True nulls do not match anything. Note: this branch only executes 
in SQL-compatible null handling mode.
+  return ExprEval.of(false, ExprType.LONG);
+} else {
+  final Matcher matcher = 
pattern.matcher(NullHandling.nullToEmptyIfNeeded(s));

Review comment:
   Good catch, I removed it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.

2020-05-21 Thread GitBox


gianm commented on a change in pull request #9893:
URL: https://github.com/apache/druid/pull/9893#discussion_r428831347



##
File path: 
processing/src/main/java/org/apache/druid/query/expression/RegexpLikeExprMacro.java
##
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.expression;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExprType;
+
+import javax.annotation.Nonnull;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class RegexpLikeExprMacro implements ExprMacroTable.ExprMacro
+{
+  private static final String FN_NAME = "regexp_like";
+
+  @Override
+  public String name()
+  {
+return FN_NAME;
+  }
+
+  @Override
+  public Expr apply(final List args)
+  {
+if (args.size() < 2 || args.size() > 3) {

Review comment:
   Good catch. I'll fix it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.

2020-05-21 Thread GitBox


gianm commented on a change in pull request #9893:
URL: https://github.com/apache/druid/pull/9893#discussion_r428827872



##
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##
@@ -7300,6 +7301,74 @@ public void testRegexpExtract() throws Exception
 );
   }
 
+  @Test
+  public void testRegexpExtractFilterViaNotNullCheck() throws Exception
+  {
+// Cannot vectorize due to extractionFn in dimension spec.
+cannotVectorize();
+
+testQuery(
+"SELECT COUNT(*)\n"
++ "FROM foo\n"
++ "WHERE REGEXP_EXTRACT(dim1, '^1') IS NOT NULL OR REGEXP_EXTRACT('Z' 
|| dim1, '^Z2') IS NOT NULL",
+ImmutableList.of(
+Druids.newTimeseriesQueryBuilder()
+  .dataSource(CalciteTests.DATASOURCE1)
+  .intervals(querySegmentSpec(Filtration.eternity()))
+  .granularity(Granularities.ALL)
+  .virtualColumns(
+  expressionVirtualColumn("v0", 
"regexp_extract(concat('Z',\"dim1\"),'^Z2')", ValueType.STRING)
+  )
+  .filters(
+  or(
+  not(selector("dim1", null, new 
RegexDimExtractionFn("^1", 0, true, null))),
+  not(selector("v0", null, null))
+  )
+  )
+  .aggregators(new CountAggregatorFactory("a0"))
+  .context(TIMESERIES_CONTEXT_DEFAULT)
+  .build()
+),
+ImmutableList.of(
+new Object[]{3L}
+)
+);
+  }
+
+  @Test
+  public void testRegexpLikeFilter() throws Exception
+  {
+// Cannot vectorize due to usage of regex filter.
+cannotVectorize();
+
+testQuery(
+"SELECT COUNT(*)\n"
++ "FROM foo\n"
++ "WHERE REGEXP_LIKE(dim1, '^1') OR REGEXP_LIKE('Z' || dim1, '^Z2')",

Review comment:
   > a multi-value column
   
   There aren't tests for multi-value columns for this specific expression, nor 
for other functions that aren't multi-value-aware. (There is a separate system 
that handles mapping over non-multi-value-aware functions.) I could see having 
a systematic way to test for this (a query generator, maybe) but I don't think 
adding one just for this function would make sense.
   
   > a numeric column
   
   There aren't tests for a numeric column. It should be a validation error but 
we don't currently have very comprehensive tests for the validator. I could add 
one in an ad-hoc way right now, and that would make sense. But I was actually 
hoping to add more systematic tests in a future patch, so could do it then too.
   
   > matching against null
   
   I added a test for matching against null to ExpressionsTest.
   
   > The docs say that "The pattern must match starting at the beginning of 
expr" but it looks like the regex pattern you are passing in is asking that it 
start at the beginning of the string via ^ in the pattern string. Can I use a $ 
in my regex to ask that it matches the end of the expr?
   
   Wow! After looking into your comment, I realized that I totally screwed up 
the docs here. I actually have a test in this patch for what happens when you 
skip the `^`, and it _does_ match substrings that are in the middle of the 
string, and the test expects that to happen, all in contradiction of what the 
docs say. I'll fix the docs. Thanks for calling this to my attention.
   
   And yes, you can use `$` to ask that it match the end. There are tests for 
this too.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.

2020-05-21 Thread GitBox


gianm commented on a change in pull request #9893:
URL: https://github.com/apache/druid/pull/9893#discussion_r428782343



##
File path: 
sql/src/main/java/org/apache/druid/sql/calcite/expression/OperatorConversions.java
##
@@ -291,15 +294,15 @@ public OperatorBuilder operandTypes(final 
SqlTypeFamily... operandTypes)
   return this;
 }
 
-public OperatorBuilder operandTypeInference(final SqlOperandTypeInference 
operandTypeInference)
+public OperatorBuilder requiredOperands(final int requiredOperands)
 {
-  this.operandTypeInference = operandTypeInference;
+  this.requiredOperands = requiredOperands;
   return this;
 }
 
-public OperatorBuilder requiredOperands(final int requiredOperands)
+public OperatorBuilder literalOperands(final int... literalOperands)

Review comment:
   By the way, I also renamed `returnType` to `returnTypeNonNull` to make 
it clearer.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.

2020-05-21 Thread GitBox


gianm commented on a change in pull request #9893:
URL: https://github.com/apache/druid/pull/9893#discussion_r428782087



##
File path: 
sql/src/main/java/org/apache/druid/sql/calcite/expression/OperatorConversions.java
##
@@ -430,36 +434,64 @@ public void inferOperandTypes(
 
   /**
* Operand type checker that is used in 'simple' situations: there are a 
particular number of operands, with
-   * particular types, some of which may be optional or nullable.
+   * particular types, some of which may be optional or nullable, and some of 
which may be required to be literals.
*/
   private static class DefaultOperandTypeChecker implements 
SqlOperandTypeChecker
   {
 private final List operandTypes;
 private final int requiredOperands;
 private final IntSet nullableOperands;
+private final IntSet literalOperands;
 
 DefaultOperandTypeChecker(
 final List operandTypes,
 final int requiredOperands,
-final IntSet nullableOperands
+final IntSet nullableOperands,
+@Nullable final int[] literalOperands
 )
 {
   Preconditions.checkArgument(requiredOperands <= operandTypes.size() && 
requiredOperands >= 0);
   this.operandTypes = Preconditions.checkNotNull(operandTypes, 
"operandTypes");
   this.requiredOperands = requiredOperands;
   this.nullableOperands = Preconditions.checkNotNull(nullableOperands, 
"nullableOperands");
+
+  if (literalOperands == null) {
+this.literalOperands = IntSets.EMPTY_SET;
+  } else {
+this.literalOperands = new IntArraySet();
+Arrays.stream(literalOperands).forEach(this.literalOperands::add);
+  }
 }
 
 @Override
 public boolean checkOperandTypes(SqlCallBinding callBinding, boolean 
throwOnFailure)
 {
-  if (operandTypes.size() != callBinding.getOperandCount()) {
-// Just like FamilyOperandTypeChecker: assume this is an inapplicable 
sub-rule of a composite rule; don't throw
-return false;
-  }
-
   for (int i = 0; i < callBinding.operands().size(); i++) {
 final SqlNode operand = callBinding.operands().get(i);
+
+if (literalOperands.contains(i)) {
+  // Verify that 'operand' is a literal.
+  if (!SqlUtil.isLiteral(operand)) {
+return throwOrReturn(
+throwOnFailure,
+callBinding,
+cb -> cb.getValidator()
+.newValidationError(
+operand,
+
Static.RESOURCE.argumentMustBeLiteral(callBinding.getOperator().getName())
+)
+);
+  }
+
+  if (!nullableOperands.contains(i) && SqlUtil.isNullLiteral(operand, 
true)) {

Review comment:
   Sure, I think this makes sense. I changed it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.

2020-05-21 Thread GitBox


gianm commented on a change in pull request #9893:
URL: https://github.com/apache/druid/pull/9893#discussion_r428780412



##
File path: 
sql/src/main/java/org/apache/druid/sql/calcite/expression/OperatorConversions.java
##
@@ -291,15 +294,15 @@ public OperatorBuilder operandTypes(final 
SqlTypeFamily... operandTypes)
   return this;
 }
 
-public OperatorBuilder operandTypeInference(final SqlOperandTypeInference 
operandTypeInference)
+public OperatorBuilder requiredOperands(final int requiredOperands)
 {
-  this.operandTypeInference = operandTypeInference;
+  this.requiredOperands = requiredOperands;
   return this;
 }
 
-public OperatorBuilder requiredOperands(final int requiredOperands)
+public OperatorBuilder literalOperands(final int... literalOperands)

Review comment:
   I added javadocs to all these methods.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.

2020-05-21 Thread GitBox


gianm commented on a change in pull request #9893:
URL: https://github.com/apache/druid/pull/9893#discussion_r428758597



##
File path: docs/querying/sql.md
##
@@ -322,17 +322,18 @@ String functions accept strings, and return a type 
appropriate to the function.
 |`LOWER(expr)`|Returns expr in all lowercase.|
 |`PARSE_LONG(string[, radix])`|Parses a string into a long (BIGINT) with the 
given radix, or 10 (decimal) if a radix is not provided.|
 |`POSITION(needle IN haystack [FROM fromIndex])`|Returns the index of needle 
within haystack, with indexes starting from 1. The search will begin at 
fromIndex, or 1 if fromIndex is not specified. If the needle is not found, 
returns 0.|
-|`REGEXP_EXTRACT(expr, pattern, [index])`|Apply regular expression pattern and 
extract a capture group, or null if there is no match. If index is unspecified 
or zero, returns the substring that matched the pattern.|
+|`REGEXP_EXTRACT(expr, pattern, [index])`|Apply regular expression `pattern` 
to `expr` and extract a capture group, or `NULL` if there is no match. If index 
is unspecified or zero, returns the substring that matched the pattern. The 
pattern must match starting at the beginning of `expr`, but does not need to 
match the entire string. Note: when `druid.generic.useDefaultValueForNull = 
true`, it is not possible to differentiate an empty-string match from a 
non-match (both will return `NULL`).|
+|`REGEXP_LIKE(expr, pattern)`|Returns whether `expr` matches regular 
expression `pattern`. The pattern must match starting at the beginning of 
`expr`, but does not need to match the entire string. Especially useful in 
WHERE clauses.|

Review comment:
   Added both notes.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.

2020-05-21 Thread GitBox


gianm commented on a change in pull request #9893:
URL: https://github.com/apache/druid/pull/9893#discussion_r428757383



##
File path: 
sql/src/main/java/org/apache/druid/sql/calcite/expression/OperatorConversions.java
##
@@ -291,15 +294,15 @@ public OperatorBuilder operandTypes(final 
SqlTypeFamily... operandTypes)
   return this;
 }
 
-public OperatorBuilder operandTypeInference(final SqlOperandTypeInference 
operandTypeInference)

Review comment:
   Ah, yeah, I removed it because it wasn't being used; I suppose we could 
add it back if needed in the future.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org