Mihaly Szjatinya has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/21825 )
Change subject: IMPALA-889: Add trim() function matching ANSI SQL definition ...................................................................... IMPALA-889: Add trim() function matching ANSI SQL definition As agreed in JIRA discussions, the current PR extends existing TRIM functionality with the support of SQL-standardized TRIM-FROM syntax: TRIM ( [[ LEADING | TRAILING | BOTH ] characters FROM ] expression ). Implemented based on the existing LTRIM / RTRIM / BTRIM family of functions prepared earlier in IMPALA-6059 and extended for UTF-8 in IMPALA-12718. Besides, partly based on abandoned PR https://gerrit.cloudera.org/#/c/4474 and similar EXTRACT-FROM functionality from https://github.com/apache/impala/commit/543fa73f3a846 f0e4527514c993cb0985912b06c. Supported syntaxes: Syntax #1 TRIM(<where> FROM <string>); Syntax #2 TRIM(<charset> FROM <string>); Syntax #3 TRIM(<where> <charset> FROM <string>); "where": Case-insensitive trim direction. Valid options are "leading", "trailing", and "both". "leading" means trimming characters from the start; "trailing" means trimming characters from the end; "both" means trimming characters from both sides. For Syntax #2, since no "where" is specified, the option "both" is implied by default. "charset": Case-sensitive characters to be removed. This argument is regarded as a character set going to be removed. The occurrence order of each character doesn't matter and duplicated instances of the same character will be ignored. NULL argument implies " " (standard space) by default. Empty argument ("" or '') makes TRIM return the string untouched. For Syntax #1, since no "charset" is specified, it trims " " (standard space) by default. "string": Case-sensitive target string to trim. This argument can be NULL. Design Notes: 1. No-BE. Since the existing LTRIM / RTRIM / BTRIM functions fully cover all needed use-cases, no backend logic is required. This differs from similar EXTRACT-FROM. 2. Syntax wrapper. TrimFromExpr class was introduced as a syntax wrapper around FunctionCallExpr, which instantiates one of the regular LTRIM / RTRIM / BTRIM functions. TrimFromExpr's role is to maintain the integrity of the "phantom" TRIM-FROM built-in function. 3. No keywords. Following EXTRACT-FROM, no "TRIM" keyword was added to the language. Although generally a keyword would allow easier and better parsing, on the negative side it restricts token's usage in general context. Arguably, this shouldn't be the case since built-in functions are specifically removed from the reserved words list. However, despite that logic in the code, having "TRIM" just as a keyword would still result in syntax error (e.g. in queries like: "create table t1(trim int)" ). Limitations: 1. Analysis vs Parsing. It seems a large part of the analysis in analysis/*.Expr files can be performed at the parsing stage. In the case of TRIM-FROM, this comes in part from the no-keyword design, as well as from the lack of a straightforward verification mechanism for complex cases. Although this analysis may be lacking. 2. Design inconsistency. If the no-BE solution is accepted as more preferable, effort should be made to see whether EXTRACT-FROM can be refactored accordingly. Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4 --- M be/src/exprs/expr-test.cc M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test 7 files changed, 535 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21825/10 -- To view, visit http://gerrit.cloudera.org:8080/21825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4 Gerrit-Change-Number: 21825 Gerrit-PatchSet: 10 Gerrit-Owner: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]>
