Hello Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/20926
to look at the new patch set (#10).
Change subject: IMPALA-12718: Provides UTF-8 support for the trim functions
......................................................................
IMPALA-12718: Provides UTF-8 support for the trim functions
Currently, the trim functions (BTRIM, LTRIM, and RTRIM) cannot
correctly handle strings that contain multi-byte UTF-8 characters.
Each multi-byte UTF-8 character is interpreted as several single-byte
characters. As a result, a trim can remove only part of a character's
byte sequence and produce unexpected results.
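To illustrate the problem, the sketch below models byte-oriented trimming in C++. `ByteWiseBTrim` is a hypothetical helper, not Impala's actual code. It assumes that the old behavior treats every byte of the trim-character argument as a separate character. Trimming "è" (bytes C3 A8) from "éà" (bytes C3 A9 C3 A0) strips the shared lead byte C3 and leaves a string that is not valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of byte-oriented trimming (not Impala's code): each byte
// of `trim_set` is treated as an independent single-byte "character".
std::string ByteWiseBTrim(const std::string& str, const std::string& trim_set) {
  // find_first_not_of/find_last_not_of match individual bytes of trim_set.
  std::size_t begin = str.find_first_not_of(trim_set);
  if (begin == std::string::npos) return "";  // every byte was trimmed
  std::size_t end = str.find_last_not_of(trim_set);
  return str.substr(begin, end - begin + 1);
}

// Example: "éà" is C3 A9 C3 A0 and "è" is C3 A8. The shared lead byte C3 is
// stripped from the front, leaving A9 C3 A0, which is not valid UTF-8.
```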
This patch adds UTF-8 support to the trim functions, so that they
correctly handle multi-byte UTF-8 characters when utf8_mode=true is
set. It also introduces a set of trim functions with the 'utf8_'
prefix that provide the same behavior even when utf8_mode is not
enabled.
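One way to make trimming character-aware is to split both the input and the trim set into whole UTF-8 sequences and compare sequences rather than bytes. The following C++ sketch applies that idea to BTRIM. `Utf8CharLen`, `SplitUtf8`, and `Utf8BTrim` are hypothetical names. Treating invalid lead bytes as one-byte characters is an assumption, and this patch may handle them differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Byte length of the UTF-8 sequence that starts with lead byte `c`.
// Assumption: invalid lead bytes are treated as one-byte characters.
std::size_t Utf8CharLen(unsigned char c) {
  if (c < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((c & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((c & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((c & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 1;
}

// Splits `s` into whole UTF-8 characters, each stored as its byte sequence.
std::vector<std::string> SplitUtf8(const std::string& s) {
  std::vector<std::string> chars;
  std::size_t i = 0;
  while (i < s.size()) {
    std::size_t len = Utf8CharLen(static_cast<unsigned char>(s[i]));
    if (i + len > s.size()) len = s.size() - i;  // truncated trailing sequence
    chars.push_back(s.substr(i, len));
    i += len;
  }
  return chars;
}

// Hypothetical UTF-8-aware BTRIM: strips whole characters found in
// `trim_set` from both ends of `str`, comparing sequences, not bytes.
std::string Utf8BTrim(const std::string& str, const std::string& trim_set) {
  const std::vector<std::string> trim_chars = SplitUtf8(trim_set);
  const std::vector<std::string> chars = SplitUtf8(str);
  auto should_trim = [&trim_chars](const std::string& ch) {
    return std::find(trim_chars.begin(), trim_chars.end(), ch) !=
           trim_chars.end();
  };
  std::size_t begin = 0;
  std::size_t end = chars.size();
  while (begin < end && should_trim(chars[begin])) ++begin;  // LTRIM part
  while (end > begin && should_trim(chars[end - 1])) --end;  // RTRIM part
  std::string result;
  for (std::size_t i = begin; i < end; ++i) result += chars[i];
  return result;
}
```

With this version, trimming "è" from "éà" returns "éà" unchanged, because "è" and "é" are different sequences even though they share a lead byte. LTRIM and RTRIM would keep only the first or the second `while` loop. A linear scan of the trim set is adequate for the short character lists that trim arguments usually contain. A hash set would be a reasonable alternative for long lists.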
Testing:
- Added a new BE test case in ExprTest#Utf8Test
- Added a new E2E test case in TestUtf8StringFunctions
Change-Id: I5cfaffd71009f16eae75910af835bd2a34410856
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/util/bit-util.h
M be/src/util/string-util.cc
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test
7 files changed, 286 insertions(+), 39 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/20926/10
--
To view, visit http://gerrit.cloudera.org:8080/20926
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5cfaffd71009f16eae75910af835bd2a34410856
Gerrit-Change-Number: 20926
Gerrit-PatchSet: 10
Gerrit-Owner: Zihao Ye
Gerrit-Reviewer: Impala Public Jenkins