Zach Amsden has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
......................................................................

Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5776/2/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 210: if (pattern.len == 0) return str;
> pattern can be null as well

Yes, unclear what the proper behavior is. I elected to return the original in that case as well (null also has len == 0).

PS2, Line 221: // First pass - find matches to compute output size
> Thinking it a bit more, the answer could be it depends. In general, append(

This depends on how expensive the allocation and potential over-allocation is for large strings. We could greedily grab a large enough buffer for mid-size strings (1 MB-ish) and probably want a different strategy with dynamic allocation for very large strings or worst case expansions.

Line 225: if (match_pos_in_substring < 0)
> single line

Done

PS2, Line 237: int rlen = num_matches * (replace.len - needle.len) + str.len;
> Can this overflow with a malicious replacement string?

Yes, overflow is a real possibility. Underflow is not.

PS2, Line 249: DCHECK(match_pos_in_substring >= 0);
> DCHECK_GE()

If we know the number of matches already, yes - i < num_matches. If we change this to a one-pass algorithm, the answer is more complicated.

http://gerrit.cloudera.org:8080/#/c/5776/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS2, Line 2955: ident_or_restricted:ident
> Not sure if dotted_path is the right place to modify as it's used at other

dotted_path is overloaded to be the primary term for function_name - not sure this was a perfect choice but it looks hard to change this decision now.

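For reference, the overflow concern on line 237 can be illustrated with a standalone sketch of the two-pass approach. This is not the code in the patch: it uses std::string_view rather than StringVal, and the names ReplaceAll and kMaxResultLen (including its 1 GiB value) are hypothetical. It only shows one way to do the size arithmetic in 64 bits and reject oversized results before allocating.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical cap on the result size; the value is an assumption, not from the patch.
constexpr int64_t kMaxResultLen = int64_t{1} << 30;

// Two-pass replace: count matches, size the output once, then copy.
// Returns std::nullopt if the result would exceed max_len.
std::optional<std::string> ReplaceAll(std::string_view str, std::string_view pattern,
                                      std::string_view replace,
                                      int64_t max_len = kMaxResultLen) {
  // Empty pattern: return the input unchanged, as described for line 210.
  if (pattern.empty()) return std::string(str);

  // First pass: count non-overlapping matches.
  int64_t num_matches = 0;
  for (size_t pos = str.find(pattern); pos != std::string_view::npos;
       pos = str.find(pattern, pos + pattern.size())) {
    ++num_matches;
  }

  const int64_t str_len = static_cast<int64_t>(str.size());
  const int64_t delta = static_cast<int64_t>(replace.size()) -
                        static_cast<int64_t>(pattern.size());
  if (str_len > max_len) return std::nullopt;
  // Only growth can overflow. Shrinking cannot underflow because
  // num_matches * pattern.size() never exceeds str_len.
  if (delta > 0 && num_matches > (max_len - str_len) / delta) return std::nullopt;
  const int64_t result_len = str_len + num_matches * delta;

  // Second pass: copy the unmatched spans and the replacement text.
  std::string result;
  result.reserve(static_cast<size_t>(result_len));
  size_t start = 0;
  for (size_t pos = str.find(pattern); pos != std::string_view::npos;
       pos = str.find(pattern, start)) {
    result.append(str.substr(start, pos - start));
    result.append(replace);
    start = pos + pattern.size();
  }
  result.append(str.substr(start));
  assert(static_cast<int64_t>(result.size()) == result_len);
  return result;
}
```

With these assumptions, ReplaceAll("aaaa", "a", "bbbb", 8) is rejected because the 16-byte result exceeds the 8-byte cap, while ReplaceAll("aaaa", "a", "bb", 8) fits exactly. A one-pass variant would have to check the cap incrementally while appending instead.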
--
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Zach Amsden <[email protected]>
Gerrit-HasComments: Yes
