Mihaly Szjatinya has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21825


Change subject: IMPALA-889: Add trim() function matching ANSI SQL definition
......................................................................

IMPALA-889: Add trim() function matching ANSI SQL definition

As agreed in JIRA discussions, the current PR extends existing TRIM
functionality with the support of SQL-standardized TRIM-FROM syntax:
TRIM ( [[ LEADING | TRAILING | BOTH ] characters FROM ] expression ).
Implemented based on the existing LTRIM / RTRIM / BTRIM family of
functions prepared earlier in IMPALA-6059 and extended for UTF-8 in
MPALA-12718. Besides, partly based on abandoned PR
https://gerrit.cloudera.org/#/c/4474 and similar EXTRACT-FROM
functionality from https://gerrit.sjc.cloudera.com/c/Impala/+/4579.

Supported syntaxes:
Syntax #1 TRIM(where FROM string);
Syntax #2 TRIM(charset FROM string);
Syntax #3 TRIM(where charset FROM string);

"where": Case-insensitive trim direction. Valid options are "leading",
  "trailing", and "both". "leading" means trimming characters from the
  start; "trailing" means trimming characters from the end; "both" means
  trimming characters from both sides. For Syntax #2, since no "where"
  is specified, the option "both" is implied by default.

"charset": Case-sensitive characters to be removed. This argument is
  regarded as a character set going to be removed. The occurrence order
  of each character doesn't matter and duplicated instances of the same
  character will be ignored. NULL argument implies " " (standard space)
  by default. Empty argument ("" or '') makes TRIM return the string
  untouched. For Syntax #1, since no "charset" is specified, it trims
  " " (standard space) by default.

"string": Case-sensitive target string to trim. This argument can be
  NULL.

Design Notes:
1. No-BE. Since the existing LTRIM / RTRIM / BTRIM functions fully cover
all needed use-cases, no backend logic is required. This differs from
similar EXTRACT-FROM.

2. Syntax wrapper. TrimFromExpr class was introduced as a syntax wrapper
around FunctionCallExpr, which instantiates one of the regular LTRIM /
RTRIM / BTRIM functions. TrimFromExpr's role is to maintain the
integrity of the "phantom" TRIM-FROM built-in function.

3. No keywords. Following EXTRACT-FROM, no "TRIM" keyword was added to
the language. Alghough generally keyword would allow easier and better
parsing, on the negativa side it restricts token's usage in general
context. Arguably, this shouldn't be the case since built-in functions
are specifically removed from the reserved keywords list, but currently
using the TRIM keyword in remote contexts results in syntax errors.

Limitations:
1. Analysis vs Parsing. It seems a large part of the analysis in
analysis/*.Expr files can be performed at the parsing stage. In the case
of TRIM-FROM, this comes in part from the no-keyword design, as well as
from the lack of a straightforward verification mechanism for complex
cases. Although this analysis may be lacking.

2. Escaping "where". Since "leading", "trailing", and "both" are inherited
from earlier Impala versions as reserved words, these have to be escaped
in order to be used as identifiers.

3. Corner cases. In the current version, the following syntax results in
an analysis error:
select id, name, trim(name from "string") from table;
These work fine though:
select id, name, trim(table.name from "string") from table;
select id, name, trim(`leading` name from "string") from table;
select id, name, trim('charset' from name) from table;
Imho, it would make sense to get some general feedback on the design
before fixing this. Similarly, the behavior around NULL-values and empty
arguments needs to be thought through.

4. Design inconsistency. If the no-BE solution is accepted as more
preferable, effort should be made to see whether EXTRACT-FROM can be
refactored accordingly.

Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4
---
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M 
testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test
4 files changed, 359 insertions(+), 21 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21825/2
--
To view, visit http://gerrit.cloudera.org:8080/21825
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4
Gerrit-Change-Number: 21825
Gerrit-PatchSet: 2
Gerrit-Owner: Mihaly Szjatinya <[email protected]>

Reply via email to