Zoram Thanga has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set 
of characters.
......................................................................


Patch Set 6:

(4 comments)

Changed implementation to use templates. Also ran perf measurements to check 
for regressions. Please see patch set #7

http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@475
PS4, Line 475: tionContext* context,
> Sure, we can keep passing StringVal(" ") for the no-arg case. It's not used
There are test cases that pass NULL for chars_to_trim, so that's taken care.


http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@451
PS6, Line 451: direction == LEADING || direction == BOTH
> To avoid a logic or, we can consider doing:
Changed this to a templatized implementation to avoid branching overhead.


http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@464
PS6, Line 464:   //return DoTrimStringImpl(str, unique_chars, direction);
> delete
Done


http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@466
PS6, Line 466:
             : StringVal StringFunctions::Trim(FunctionContext* context, const 
StringVal& str) {
             :   return DoTrimString(context, str, StringVal(" "), BOTH);
             : }
             :
             : StringVal StringFunctions::Ltrim(FunctionContext* context, const 
StringVal& str) {
             :   return DoTrimString(context, str, StringVal(" "), LEADING);
             : }
             :
             : StringVal StringFunctions::Rtrim(FunctionContext* context, const 
StringVal& str) {
             :   return DoTrimString(context, str, StringVal(" "), TRAILING);
             : }
> Can you please do a quick benchmark to make sure we didn't regress the perf
Have measured the performance of the new code and old trim/ltrim/rtrim on a 
tpch_parquet table that has 1.5 billion rows. The test query basically scans 
all the string columns of the table (8 of them) and calls trim/ltrim/rtrim on 
each element, where each operation is something like select 
count(trim(l_shipinstruct)), etc. The idea is to check for any unacceptable 
perf regression due to how the code has been refactored. Time spent in these 
function calls aggregated by count()s are the same in the old and new code, 
within margin of error.



--
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 6
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Jan 2018 22:06:12 +0000
Gerrit-HasComments: Yes

Reply via email to