[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
gianm commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r434745199 ## File path: processing/src/main/java/org/apache/druid/query/expression/RegexpLikeExprMacro.java ## @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.query.expression; + +import org.apache.druid.common.config.NullHandling; +import org.apache.druid.java.util.common.IAE; +import org.apache.druid.java.util.common.StringUtils; +import org.apache.druid.math.expr.Expr; +import org.apache.druid.math.expr.ExprEval; +import org.apache.druid.math.expr.ExprMacroTable; +import org.apache.druid.math.expr.ExprType; + +import javax.annotation.Nonnull; +import java.util.List; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +public class RegexpLikeExprMacro implements ExprMacroTable.ExprMacro +{ + private static final String FN_NAME = "regexp_like"; + + @Override + public String name() + { +return FN_NAME; + } + + @Override + public Expr apply(final List args) + { +if (args.size() != 2) { + throw new IAE("Function[%s] must have 2 arguments", name()); +} + +final Expr arg = args.get(0); +final Expr patternExpr = args.get(1); + +if (!ExprUtils.isStringLiteral(patternExpr)) { + throw new IAE("Function[%s] pattern must be a string literal", name()); +} + +// Precompile the pattern. +final Pattern pattern = Pattern.compile( +StringUtils.nullToEmptyNonDruidDataString((String) patternExpr.getLiteralValue()) +); + +class RegexpLikeExpr extends ExprMacroTable.BaseScalarUnivariateMacroFunctionExpr +{ + private RegexpLikeExpr(Expr arg) + { +super(FN_NAME, arg); + } + + @Nonnull + @Override + public ExprEval eval(final ObjectBinding bindings) + { +final String s = NullHandling.nullToEmptyIfNeeded(arg.eval(bindings).asString()); + +if (s == null) { + // True nulls do not match anything. Note: this branch only executes in SQL-compatible null handling mode. + return ExprEval.of(false, ExprType.LONG); +} else { + final Matcher matcher = pattern.matcher(NullHandling.nullToEmptyIfNeeded(s)); Review comment: Good catch, I removed it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
gianm commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428831347 ## File path: processing/src/main/java/org/apache/druid/query/expression/RegexpLikeExprMacro.java ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.query.expression; + +import org.apache.druid.common.config.NullHandling; +import org.apache.druid.java.util.common.IAE; +import org.apache.druid.java.util.common.StringUtils; +import org.apache.druid.math.expr.Expr; +import org.apache.druid.math.expr.ExprEval; +import org.apache.druid.math.expr.ExprMacroTable; +import org.apache.druid.math.expr.ExprType; + +import javax.annotation.Nonnull; +import java.util.List; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +public class RegexpLikeExprMacro implements ExprMacroTable.ExprMacro +{ + private static final String FN_NAME = "regexp_like"; + + @Override + public String name() + { +return FN_NAME; + } + + @Override + public Expr apply(final List args) + { +if (args.size() < 2 || args.size() > 3) { Review comment: Good catch. I'll fix it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
gianm commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428827872 ## File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java ## @@ -7300,6 +7301,74 @@ public void testRegexpExtract() throws Exception ); } + @Test + public void testRegexpExtractFilterViaNotNullCheck() throws Exception + { +// Cannot vectorize due to extractionFn in dimension spec. +cannotVectorize(); + +testQuery( +"SELECT COUNT(*)\n" ++ "FROM foo\n" ++ "WHERE REGEXP_EXTRACT(dim1, '^1') IS NOT NULL OR REGEXP_EXTRACT('Z' || dim1, '^Z2') IS NOT NULL", +ImmutableList.of( +Druids.newTimeseriesQueryBuilder() + .dataSource(CalciteTests.DATASOURCE1) + .intervals(querySegmentSpec(Filtration.eternity())) + .granularity(Granularities.ALL) + .virtualColumns( + expressionVirtualColumn("v0", "regexp_extract(concat('Z',\"dim1\"),'^Z2')", ValueType.STRING) + ) + .filters( + or( + not(selector("dim1", null, new RegexDimExtractionFn("^1", 0, true, null))), + not(selector("v0", null, null)) + ) + ) + .aggregators(new CountAggregatorFactory("a0")) + .context(TIMESERIES_CONTEXT_DEFAULT) + .build() +), +ImmutableList.of( +new Object[]{3L} +) +); + } + + @Test + public void testRegexpLikeFilter() throws Exception + { +// Cannot vectorize due to usage of regex filter. +cannotVectorize(); + +testQuery( +"SELECT COUNT(*)\n" ++ "FROM foo\n" ++ "WHERE REGEXP_LIKE(dim1, '^1') OR REGEXP_LIKE('Z' || dim1, '^Z2')", Review comment: > a multi-value column There aren't tests for multi-value columns for this specific expression, nor for other functions that aren't multi-value-aware. (There is a separate system that handles mapping over non-multi-value-aware functions.) I could see having a systematic way to test for this (a query generator, maybe) but I don't think adding one just for this function would make sense. > a numeric column There aren't tests for a numeric column. It should be a validation error but we don't currently have very comprehensive tests for the validator. I could add one in an ad-hoc way right now, and that would make sense. But I was actually hoping to add more systematic tests in a future patch, so could do it then too. > matching against null I added a test for matching against null to ExpressionsTest. > The docs say that "The pattern must match starting at the beginning of expr" but it looks like the regex pattern you are passing in is asking that it start at the beginning of the string via ^ in the pattern string. Can I use a $ in my regex to ask that it matches the end of the expr? Wow! After looking into your comment, I realized that I totally screwed up the docs here. I actually have a test in this patch for what happens when you skip the `^`, and it _does_ match substrings that are in the middle of the string, and the test expects that to happen, all in contradiction of what the docs say. I'll fix the docs. Thanks for calling this to my attention. And yes, you can use `$` to ask that it match the end. There are tests for this too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
gianm commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428782343 ## File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/OperatorConversions.java ## @@ -291,15 +294,15 @@ public OperatorBuilder operandTypes(final SqlTypeFamily... operandTypes) return this; } -public OperatorBuilder operandTypeInference(final SqlOperandTypeInference operandTypeInference) +public OperatorBuilder requiredOperands(final int requiredOperands) { - this.operandTypeInference = operandTypeInference; + this.requiredOperands = requiredOperands; return this; } -public OperatorBuilder requiredOperands(final int requiredOperands) +public OperatorBuilder literalOperands(final int... literalOperands) Review comment: By the way, I also renamed `returnType` to `returnTypeNonNull` to make it clearer. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
gianm commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428782087 ## File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/OperatorConversions.java ## @@ -430,36 +434,64 @@ public void inferOperandTypes( /** * Operand type checker that is used in 'simple' situations: there are a particular number of operands, with - * particular types, some of which may be optional or nullable. + * particular types, some of which may be optional or nullable, and some of which may be required to be literals. */ private static class DefaultOperandTypeChecker implements SqlOperandTypeChecker { private final List operandTypes; private final int requiredOperands; private final IntSet nullableOperands; +private final IntSet literalOperands; DefaultOperandTypeChecker( final List operandTypes, final int requiredOperands, -final IntSet nullableOperands +final IntSet nullableOperands, +@Nullable final int[] literalOperands ) { Preconditions.checkArgument(requiredOperands <= operandTypes.size() && requiredOperands >= 0); this.operandTypes = Preconditions.checkNotNull(operandTypes, "operandTypes"); this.requiredOperands = requiredOperands; this.nullableOperands = Preconditions.checkNotNull(nullableOperands, "nullableOperands"); + + if (literalOperands == null) { +this.literalOperands = IntSets.EMPTY_SET; + } else { +this.literalOperands = new IntArraySet(); +Arrays.stream(literalOperands).forEach(this.literalOperands::add); + } } @Override public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) { - if (operandTypes.size() != callBinding.getOperandCount()) { -// Just like FamilyOperandTypeChecker: assume this is an inapplicable sub-rule of a composite rule; don't throw -return false; - } - for (int i = 0; i < callBinding.operands().size(); i++) { final SqlNode operand = callBinding.operands().get(i); + +if (literalOperands.contains(i)) { + // Verify that 'operand' is a literal. + if (!SqlUtil.isLiteral(operand)) { +return throwOrReturn( +throwOnFailure, +callBinding, +cb -> cb.getValidator() +.newValidationError( +operand, + Static.RESOURCE.argumentMustBeLiteral(callBinding.getOperator().getName()) +) +); + } + + if (!nullableOperands.contains(i) && SqlUtil.isNullLiteral(operand, true)) { Review comment: Sure, I think this makes sense. I changed it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
gianm commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428780412 ## File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/OperatorConversions.java ## @@ -291,15 +294,15 @@ public OperatorBuilder operandTypes(final SqlTypeFamily... operandTypes) return this; } -public OperatorBuilder operandTypeInference(final SqlOperandTypeInference operandTypeInference) +public OperatorBuilder requiredOperands(final int requiredOperands) { - this.operandTypeInference = operandTypeInference; + this.requiredOperands = requiredOperands; return this; } -public OperatorBuilder requiredOperands(final int requiredOperands) +public OperatorBuilder literalOperands(final int... literalOperands) Review comment: I added javadocs to all these methods. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
gianm commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428758597 ## File path: docs/querying/sql.md ## @@ -322,17 +322,18 @@ String functions accept strings, and return a type appropriate to the function. |`LOWER(expr)`|Returns expr in all lowercase.| |`PARSE_LONG(string[, radix])`|Parses a string into a long (BIGINT) with the given radix, or 10 (decimal) if a radix is not provided.| |`POSITION(needle IN haystack [FROM fromIndex])`|Returns the index of needle within haystack, with indexes starting from 1. The search will begin at fromIndex, or 1 if fromIndex is not specified. If the needle is not found, returns 0.| -|`REGEXP_EXTRACT(expr, pattern, [index])`|Apply regular expression pattern and extract a capture group, or null if there is no match. If index is unspecified or zero, returns the substring that matched the pattern.| +|`REGEXP_EXTRACT(expr, pattern, [index])`|Apply regular expression `pattern` to `expr` and extract a capture group, or `NULL` if there is no match. If index is unspecified or zero, returns the substring that matched the pattern. The pattern must match starting at the beginning of `expr`, but does not need to match the entire string. Note: when `druid.generic.useDefaultValueForNull = true`, it is not possible to differentiate an empty-string match from a non-match (both will return `NULL`).| +|`REGEXP_LIKE(expr, pattern)`|Returns whether `expr` matches regular expression `pattern`. The pattern must match starting at the beginning of `expr`, but does not need to match the entire string. Especially useful in WHERE clauses.| Review comment: Added both notes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
gianm commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428757383 ## File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/OperatorConversions.java ## @@ -291,15 +294,15 @@ public OperatorBuilder operandTypes(final SqlTypeFamily... operandTypes) return this; } -public OperatorBuilder operandTypeInference(final SqlOperandTypeInference operandTypeInference) Review comment: Ah, yeah, I removed it because it wasn't being used; I suppose we could add it back if needed in the future. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org