Mihaly Szjatinya has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21825
Change subject: IMPALA-889: Add trim() function matching ANSI SQL definition ...................................................................... IMPALA-889: Add trim() function matching ANSI SQL definition As agreed in JIRA discussions, the current PR extends existing TRIM functionality with the support of SQL-standardized TRIM-FROM syntax: TRIM ( [[ LEADING | TRAILING | BOTH ] characters FROM ] expression ). Implemented based on the existing LTRIM / RTRIM / BTRIM family of functions prepared earlier in IMPALA-6059 and extended for UTF-8 in MPALA-12718. Besides, partly based on abandoned PR https://gerrit.cloudera.org/#/c/4474 and similar EXTRACT-FROM functionality from https://gerrit.sjc.cloudera.com/c/Impala/+/4579. Supported syntaxes: Syntax #1 TRIM(where FROM string); Syntax #2 TRIM(charset FROM string); Syntax #3 TRIM(where charset FROM string); "where": Case-insensitive trim direction. Valid options are "leading", "trailing", and "both". "leading" means trimming characters from the start; "trailing" means trimming characters from the end; "both" means trimming characters from both sides. For Syntax #2, since no "where" is specified, the option "both" is implied by default. "charset": Case-sensitive characters to be removed. This argument is regarded as a character set going to be removed. The occurrence order of each character doesn't matter and duplicated instances of the same character will be ignored. NULL argument implies " " (standard space) by default. Empty argument ("" or '') makes TRIM return the string untouched. For Syntax #1, since no "charset" is specified, it trims " " (standard space) by default. "string": Case-sensitive target string to trim. This argument can be NULL. Design Notes: 1. No-BE. Since the existing LTRIM / RTRIM / BTRIM functions fully cover all needed use-cases, no backend logic is required. This differs from similar EXTRACT-FROM. 2. Syntax wrapper. TrimFromExpr class was introduced as a syntax wrapper around FunctionCallExpr, which instantiates one of the regular LTRIM / RTRIM / BTRIM functions. TrimFromExpr's role is to maintain the integrity of the "phantom" TRIM-FROM built-in function. 3. No keywords. Following EXTRACT-FROM, no "TRIM" keyword was added to the language. Alghough generally keyword would allow easier and better parsing, on the negativa side it restricts token's usage in general context. Arguably, this shouldn't be the case since built-in functions are specifically removed from the reserved keywords list, but currently using the TRIM keyword in remote contexts results in syntax errors. Limitations: 1. Analysis vs Parsing. It seems a large part of the analysis in analysis/*.Expr files can be performed at the parsing stage. In the case of TRIM-FROM, this comes in part from the no-keyword design, as well as from the lack of a straightforward verification mechanism for complex cases. Although this analysis may be lacking. 2. Escaping "where". Since "leading", "trailing", and "both" are inherited from earlier Impala versions as reserved words, these have to be escaped in order to be used as identifiers. 3. Corner cases. In the current version, the following syntax results in an analysis error: select id, name, trim(name from "string") from table; These work fine though: select id, name, trim(table.name from "string") from table; select id, name, trim(`leading` name from "string") from table; select id, name, trim('charset' from name) from table; Imho, it would make sense to get some general feedback on the design before fixing this. Similarly, the behavior around NULL-values and empty arguments needs to be thought through. 4. Design inconsistency. If the no-BE solution is accepted as more preferable, effort should be made to see whether EXTRACT-FROM can be refactored accordingly. Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4 --- M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test 4 files changed, 359 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21825/2 -- To view, visit http://gerrit.cloudera.org:8080/21825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4 Gerrit-Change-Number: 21825 Gerrit-PatchSet: 2 Gerrit-Owner: Mihaly Szjatinya <[email protected]>
