Mihaly Szjatinya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21825 )

Change subject: IMPALA-889: Add trim() function matching ANSI SQL definition
......................................................................


Patch Set 3:

(34 comments)

Thanks for review. Fixed/answered all except a couple of niche topics for now:
1. string_expr
2. more tests
3. NULL cases
4. select trim(column1 from str) from t1

I'd appreciate opinions on the entire No-BE approach with phantom class, since 
BE solution like Extract is viable also (I even have it implemented already). 
As well as further in-depth on No-Keyword / analysis vs parsing.

http://gerrit.cloudera.org:8080/#/c/21825/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21825/2//COMMIT_MSG@14
PS2, Line 14: IMPAL
> Nit: IMPALA-...
Done


http://gerrit.cloudera.org:8080/#/c/21825/2//COMMIT_MSG@16
PS2, Line 16: https://github.com/apache/impala/commit/543fa73
> Please do not refer to downstream resources.
ok, it's open for public though


http://gerrit.cloudera.org:8080/#/c/21825/2//COMMIT_MSG@16
PS2, Line 16: https://github.com/apache/impala/commit/543fa73
> This is an internal link, so it would be better to have a Github link to th
Done


http://gerrit.cloudera.org:8080/#/c/21825/2//COMMIT_MSG@19
PS2, Line 19: xes:
> Since both 'where' and 'string' are keywords, it would be better to put the
ok


http://gerrit.cloudera.org:8080/#/c/21825/2//COMMIT_MSG@51
PS2, Line 51: . Follow
> typo: Although
Done


http://gerrit.cloudera.org:8080/#/c/21825/2//COMMIT_MSG@51
PS2, Line 51: . Following EXTRACT-FROM,
> nit: Although generally, a keyword...
ok


http://gerrit.cloudera.org:8080/#/c/21825/2//COMMIT_MSG@50
PS2, Line 50:
            : 3. No keywor
> 'TRIM' is already part of the SQL standard. Using it as a keyword would mak
It was a reserved word by inheritance from earlier Impala. In current Impala 
all of the built-in functions are extracted from reserved words.
Adding KW_TRIM, which is needed for parsing, would mean introducing a new 
keyword, which generates syntax error (different from that for a reserved) if 
used as a column name, etc.


http://gerrit.cloudera.org:8080/#/c/21825/2//COMMIT_MSG@52
PS2, Line 52: g
> Nit: negative
Done


http://gerrit.cloudera.org:8080/#/c/21825/2//COMMIT_MSG@52
PS2, Line 52: though g
> typo: negative
Done


http://gerrit.cloudera.org:8080/#/c/21825/2//COMMIT_MSG@55
PS2, Line 55: are specifically removed from the reserved keywords list, but curr
> What do you mean exactly?
It invalidates statements like "create table t1(trim int)", which would be 
inconsistent with other built-in functions.


http://gerrit.cloudera.org:8080/#/c/21825/2//COMMIT_MSG@64
PS2, Line 64:
> Nit: line too long.
Done


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/cup/sql-parser.cup@3829
PS2, Line 3829:
> We could add an EE test for each of these cases as the "charset" parameter
Yes, that's a good idea. Will do in the next patch.


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/cup/sql-parser.cup@3861
PS2, Line 3861:   | function_name:fn_name LPAREN ident_or_default:i KW_FROM 
expr:e RPAREN
> This syntax rule will also match date_part(year from now()) and create a Tr
This is how it was before with the extract being the only function to support 
FROM. Unnecessary wrapping stems mostly from not using keyword, which comes 
with it's own problems. See "Analysis vs Parsing" section for more on this.


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/cup/sql-parser.cup@3873
PS2, Line 3873: compromising
> typo:compromising
Done


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/cup/sql-parser.cup@3873
PS2, Line 3873: compromising syntax like
> Could you elaborate on it, I don't completely understand the example. Maybe
To be honest I simply caught tests failing on this, and then  realized that the 
string_expr change introduced in https://gerrit.cloudera.org/#/c/4474 is 
exactly for this. Since this goes quite deep into the intricacies of parsing 
process, I took it as is from that review for now as a relatively minor 
technical detail. I see string_expr as a way to filter out (although not 
exhaustively) expr types which are not suitable for trim context on parsing 
stage, which is a good thing on it's own. The fact that it fixes some corner 
cases makes it only better. But as you've suggested in other comment, I think 
adding more tests for each type would be the way to go to cover all of the 
cases.


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/cup/sql-parser.cup@3873
PS2, Line 3873: string_expr is needed here
> This is also true for the version above, isn't it? Then it would be better
I didn't catch similar problems with other syntaxes. It's more like: once 
string_expr needs to be introduced for this syntax anyway, why not use it for 
the others as well.


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java
File fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java:

http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java@19
PS2, Line 19:
            : import com.google.common.collect.Lists;
            :
            : import org.apache.impala.catalog.BuiltinsDb;
> unused imports
Removed, thanks.


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java@35
PS2, Line 35:
> nit: the
Done


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java@41
PS2, Line 41: h
> nit: a
Done


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java@45
PS2, Line 45:
> This field shadows the field with the same name from the base class. We cou
Agree, not a good practice.


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java@45
PS2, Line 45:   // Trim option set.
> We can reuse the fnName_ from FunctionCallExpr
fnName_ here is "trim" or "utf8_trim", representing a phantom trim-from 
function. In FunctionCallExpr it resolves to a real ltrim/rtrim/btrim. Later 
each one goes through it's own analysis.


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java@48
PS2, Line 48:     TRAILING,
> could be srcExpr_
Indeed.


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java@51
PS2, Line 51:
> This could instead be an enum.
Agree, I tried to stay consistent with the twin ExtractFromExpr, but I think 
enum would do here.


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java@60
PS2, Line 60:
> Nit: should start with lower-case letter.
Fixed


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java@69
PS2, Line 69:     this(fnName, where, null, srcexpr);
> Can't we raise an exception here?
Technically we can of course, but conceptually this would mean a ParseException 
which is not the best idea to throw from a regular analysis/*.java file. If we 
were to check this on parsing stage, Imho throwing here from constructor 
wouldn't be the best place for that. As a better option, this check could be 
done whether from within the .cup's {} clause (there is a "parseError" routine 
for that) or by adding "VerifyTrimOption" somewhere in .java files and calling 
it from .cup.

But checking "where" argument on parsing stage would assume doing the same for 
function name even prior to that, as well as reconsider the entire order of 
later analysis checks. And the same for ExtractFromExpr and potentially other 
places.

There is a certain order in analysis hierarchy of functions, according to which 
different aspects of a statement are being analyzed so that in case of multiple 
problems, error is generated about the most important one. Arbitrarily breaking 
this order is not trivial change and eventually leads to a global question on 
what is a parsing error and what is an analysis error, which I doubt we want to 
be raising in scope of this issue.

Please let me know if you have a better idea.


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java@98
PS2, Line 98:   }
> We should also override clone(), otherwise this copy constructor won't be u
@Override seems to result in error for constructors.


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java@111
PS2, Line 111: lysisEx
> It should be "trim()", shouldn't it?
Fixed, thanks.


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java@113
PS2, Line 113:
> This could be extracted into a variable.
Since fnName_ is now renamed, it's perhaps more clear to access via modified 
name, and not to introduce another one.


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java@125
PS2, Line 125:               .map(option -> "`" + option.name() + "`")
> This error should be caught during parsing, then TrimOption could be an enu
Answered in related threads.


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java@139
PS2, Line 139: null && srcExpr_.getType() != Type.STRING) {
             :       if (errMsg.length() > 0) errMsg.append(" ");
> This error message template could be extracted as a format string and reuse
Replied in the twin comment.


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java@143
PS2, Line 143:     }
> If the block body firs on one line, we omit the braces in Impala.
Agree, this could be perhaps specified in project's .clang-format


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java@144
PS2, Line 144: turn errMsg.toString();
             :   }
> This could be extracted into a function that could also be used on L139.
I would keep it as is just for two calls if you don't mind. Conceptually, each 
parameter may require it's own check, it's more of a coincidence that these two 
are so similar.


http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

http://gerrit.cloudera.org:8080/#/c/21825/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@1927
PS2, Line 1927:     // Special cases for TRIM in function call
> You can also add parser tests for the syntax in 'ParserTest.java'.
Ack


http://gerrit.cloudera.org:8080/#/c/21825/2/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test
File 
testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test:

http://gerrit.cloudera.org:8080/#/c/21825/2/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test@398
PS2, Line 398: ---- QUERY
> We could have some EE tests with utf8_trim() too.
Will add.



--
To view, visit http://gerrit.cloudera.org:8080/21825
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4
Gerrit-Change-Number: 21825
Gerrit-PatchSet: 3
Gerrit-Owner: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Comment-Date: Sat, 21 Sep 2024 12:29:26 +0000
Gerrit-HasComments: Yes

Reply via email to