Michael Ho has posted comments on this change. Change subject: IMPALA-4729: Implement REPLACE() ......................................................................
Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/5776/2/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 1954: TestStringValue("replace('abc', 'abcd', 'a')", "abc"); > suggest these additional tests: It may help to have non-constant input arguments. For instance: select replace(string_col, '0', '1') from functional.alltypestiny. And similar inputs for the pattern and the replacement string. http://gerrit.cloudera.org:8080/#/c/5776/2/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS2, Line 221: // First pass - find matches to compute output size > A high level question: for very long string, is a two pass algorithm better Thinking it a bit more, the answer could be it depends. In general, append() may incur extra copying when expanding the buffer (usually double every time). This may be an issue if the replacement string has the same length or longer than the pattern string. On the other hand, the one pass approach may be faster if the result string is a lot shorter than the original string (e.g. the replacement string is an empty string '') as it avoids the cache footprint of going over a potentially long haystack string again. May be it helps to have a microbenchmark. -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-HasComments: Yes
