Michael Ho has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5776/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1954:   TestStringValue("replace('abc', 'abcd', 'a')", "abc");
> suggest these additional tests:
It may help to have non-constant input arguments. For instance: select 
replace(string_col, '0', '1') from functional.alltypestiny. And similar inputs 
for the pattern and the replacement string.


http://gerrit.cloudera.org:8080/#/c/5776/2/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS2, Line 221: // First pass - find matches to compute output size
> A high level question: for very long string, is a two pass algorithm better
Thinking it a bit more, the answer could be it depends. In general, append() 
may incur extra copying when expanding the buffer (usually double every time). 
This may be an issue if the replacement string has the same length or longer 
than the pattern string. On the other hand, the one pass approach may be faster 
if the result string is a lot shorter than the original string (e.g. the 
replacement string is an empty string '') as it avoids the cache footprint of 
going over a potentially long haystack string again.

May be it helps to have a microbenchmark.


-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-HasComments: Yes

Reply via email to