Mihaly Szjatinya has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/21825 )

Change subject: IMPALA-889: Add trim() function matching ANSI SQL definition
......................................................................

IMPALA-889: Add trim() function matching ANSI SQL definition

As agreed in JIRA discussions, the current PR extends existing TRIM
functionality with the support of SQL-standardized TRIM-FROM syntax:
TRIM ( [[ LEADING | TRAILING | BOTH ] characters FROM ] expression ).
Implemented based on the existing LTRIM / RTRIM / BTRIM family of
functions prepared earlier in IMPALA-6059 and extended for UTF-8 in
IMPALA-12718. Besides, partly based on abandoned PR
https://gerrit.cloudera.org/#/c/4474 and similar EXTRACT-FROM
functionality from https://github.com/apache/impala/commit/543fa73f3a846
f0e4527514c993cb0985912b06c.

Supported syntaxes:
Syntax #1 TRIM(<where> FROM <string>);
Syntax #2 TRIM(<charset> FROM <string>);
Syntax #3 TRIM(<where> <charset> FROM <string>);

"where": Case-insensitive trim direction. Valid options are "leading",
  "trailing", and "both". "leading" means trimming characters from the
  start; "trailing" means trimming characters from the end; "both" means
  trimming characters from both sides. For Syntax #2, since no "where"
  is specified, the option "both" is implied by default.

"charset": Case-sensitive characters to be removed. This argument is
  regarded as a character set going to be removed. The occurrence order
  of each character doesn't matter and duplicated instances of the same
  character will be ignored. NULL argument implies " " (standard space)
  by default. Empty argument ("" or '') makes TRIM return the string
  untouched. For Syntax #1, since no "charset" is specified, it trims
  " " (standard space) by default.

"string": Case-sensitive target string to trim. This argument can be
  NULL.

UTF8_MODE sets it for TRIM-FROM, similarly as to existing TRIM()

UTF8_TRIM(FROM) can be used to force UTF8 mode regardless of the query
  option

Design Notes:
1. No-BE. Since the existing LTRIM / RTRIM / BTRIM functions fully cover
all needed use-cases, no backend logic is required. This differs from
similar EXTRACT-FROM.

2. Syntax wrapper. TrimFromExpr class was introduced as a syntax wrapper
around FunctionCallExpr, which instantiates one of the regular LTRIM /
RTRIM / BTRIM functions. TrimFromExpr's role is to maintain the
integrity of the "phantom" TRIM-FROM built-in function.

3. No TRIM keyword. Following EXTRACT-FROM, no "TRIM" keyword was added to
the language. Although generally a keyword would allow easier and better
parsing, on the negative side it restricts token's usage in general
context. However, leading/trailing/both, being previously saved as
reserved words, are now added as keywords to make possible their usage
with no escaping.

Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4
---
M be/src/exprs/expr-test.cc
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test
7 files changed, 535 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21825/12
--
To view, visit http://gerrit.cloudera.org:8080/21825
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4
Gerrit-Change-Number: 21825
Gerrit-PatchSet: 12
Gerrit-Owner: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>

Reply via email to