Mihaly Szjatinya has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/21825 )
Change subject: IMPALA-889: Add trim() function matching ANSI SQL definition ...................................................................... IMPALA-889: Add trim() function matching ANSI SQL definition As agreed in JIRA discussions, the current PR extends existing TRIM functionality with the support of SQL-standardized TRIM-FROM syntax: TRIM ( [[ LEADING | TRAILING | BOTH ] characters FROM ] expression ). Implemented based on the existing LTRIM / RTRIM / BTRIM family of functions prepared earlier in IMPALA-6059 and extended for UTF-8 in IMPALA-12718. Besides, partly based on abandoned PR https://gerrit.cloudera.org/#/c/4474 and similar EXTRACT-FROM functionality from https://github.com/apache/impala/commit/543fa73f3a846 f0e4527514c993cb0985912b06c. Supported syntaxes: Syntax #1 TRIM(<where> FROM <string>); Syntax #2 TRIM(<charset> FROM <string>); Syntax #3 TRIM(<where> <charset> FROM <string>); "where": Case-insensitive trim direction. Valid options are "leading", "trailing", and "both". "leading" means trimming characters from the start; "trailing" means trimming characters from the end; "both" means trimming characters from both sides. For Syntax #2, since no "where" is specified, the option "both" is implied by default. "charset": Case-sensitive characters to be removed. This argument is regarded as a character set going to be removed. The occurrence order of each character doesn't matter and duplicated instances of the same character will be ignored. NULL argument implies " " (standard space) by default. Empty argument ("" or '') makes TRIM return the string untouched. For Syntax #1, since no "charset" is specified, it trims " " (standard space) by default. "string": Case-sensitive target string to trim. This argument can be NULL. Design Notes: 1. No-BE. Since the existing LTRIM / RTRIM / BTRIM functions fully cover all needed use-cases, no backend logic is required. This differs from similar EXTRACT-FROM. 2. Syntax wrapper. TrimFromExpr class was introduced as a syntax wrapper around FunctionCallExpr, which instantiates one of the regular LTRIM / RTRIM / BTRIM functions. TrimFromExpr's role is to maintain the integrity of the "phantom" TRIM-FROM built-in function. 3. No keywords. Following EXTRACT-FROM, no "TRIM" keyword was added to the language. Although generally a keyword would allow easier and better parsing, on the negative side it restricts token's usage in general context. Arguably, this shouldn't be the case since built-in functions are specifically removed from the reserved words list. However, despite that logic in the code, having "TRIM" just as a keyword still results in syntax error (e.g. in queries like "create table t1(trim int)" ). Limitations: 1. Analysis vs Parsing. It seems a large part of the analysis in analysis/*.Expr files can be performed at the parsing stage. In the case of TRIM-FROM, this comes in part from the no-keyword design, as well as from the lack of a straightforward verification mechanism for complex cases. Although this analysis may be lacking. 2. Escaping "where". Since "leading", "trailing", and "both" are inherited from earlier Impala versions as reserved words, these have to be escaped in order to be used as identifiers. 3. Corner cases. In the current version, the following syntax results in an analysis error: select id, name, trim(name from "string") from table; These work fine though: select id, name, trim(table.name from "string") from table; select id, name, trim(`leading` name from "string") from table; select id, name, trim('charset' from name) from table; Imho, it would make sense to get some general feedback on the design before fixing this. Similarly, the behavior around NULL-values and empty arguments needs to be thought through. 4. Design inconsistency. If the no-BE solution is accepted as more preferable, effort should be made to see whether EXTRACT-FROM can be refactored accordingly. Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4 --- M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test 4 files changed, 359 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21825/5 -- To view, visit http://gerrit.cloudera.org:8080/21825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4 Gerrit-Change-Number: 21825 Gerrit-PatchSet: 5 Gerrit-Owner: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]>
