Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )
Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters. ...................................................................... Patch Set 6: (4 comments) Changed implementation to use templates. Also ran perf measurements to check for regressions. Please see patch set #7 http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@475 PS4, Line 475: tionContext* context, > Sure, we can keep passing StringVal(" ") for the no-arg case. It's not used There are test cases that pass NULL for chars_to_trim, so that's taken care. http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@451 PS6, Line 451: direction == LEADING || direction == BOTH > To avoid a logic or, we can consider doing: Changed this to a templatized implementation to avoid branching overhead. http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@464 PS6, Line 464: //return DoTrimStringImpl(str, unique_chars, direction); > delete Done http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@466 PS6, Line 466: : StringVal StringFunctions::Trim(FunctionContext* context, const StringVal& str) { : return DoTrimString(context, str, StringVal(" "), BOTH); : } : : StringVal StringFunctions::Ltrim(FunctionContext* context, const StringVal& str) { : return DoTrimString(context, str, StringVal(" "), LEADING); : } : : StringVal StringFunctions::Rtrim(FunctionContext* context, const StringVal& str) { : return DoTrimString(context, str, StringVal(" "), TRAILING); : } > Can you please do a quick benchmark to make sure we didn't regress the perf Have measured the performance of the new code and old trim/ltrim/rtrim on a tpch_parquet table that has 1.5 billion rows. The test query basically scans all the string columns of the table (8 of them) and calls trim/ltrim/rtrim on each element, where each operation is something like select count(trim(l_shipinstruct)), etc. The idea is to check for any unacceptable perf regression due to how the code has been refactored. Time spent in these function calls aggregated by count()s are the same in the old and new code, within margin of error. -- To view, visit http://gerrit.cloudera.org:8080/8349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79 Gerrit-Change-Number: 8349 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga <zo...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com> Gerrit-Comment-Date: Mon, 08 Jan 2018 22:06:12 +0000 Gerrit-HasComments: Yes