Michael Ho has posted comments on this change. Change subject: IMPALA-4729: Implement REPLACE() ......................................................................
Patch Set 11: (15 comments) Please also add exprs.test as discussed offline to cover cases in which pattern or replace are non-constant. http://gerrit.cloudera.org:8080/#/c/5776/9/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS9, Line 290: nit: indent 4. PS9, Line 291: t, buffer_space) Please see comments from previous patch. Isn't this bytes_produced computed above ? PS9, Line 305: _LE(ptr - resul Same as bytes_remaining above ? http://gerrit.cloudera.org:8080/#/c/5776/11/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS11, Line 227: replace = new ReplaceContext(pattern); Sorry, I may have misunderstood your question about this. We need to call context->Allocate() here in order to track the memory consumption. The general guideline is to avoid untracked memory usage as much as possible. PS11, Line 236: delete rptr; Please use context->Free(). PS11, Line 274: (delta > 0 && delta < 128) parenthesis seems unnecessary here. PS11, Line 274: 128 Mind documenting how the number 128 is derived ? In other words, why not 256, 512, or 64 ? PS11, Line 282: (replace.len - pattern.len) Just use 'delta' for clarity. PS11, Line 288: haystack.len - pattern.len + replace.len For clarity, can you please use haystack.len + delta ? PS11, Line 322: const int bytes_remaining = haystack.len - consumed; This seems to overlap with the remaining_bytes in line 352. How about we hoist it out of the loop like the following ? // number of bytes to match in the original string int bytes_remaining = haystack.len - consumed; while (bytes_remaining >= pattern.len) { ... ... consumed = match_pos; bytes_remaining = haystack.len - consumed; .... ..... if (delta > 0) { .... } } PS11, Line 335: it's nit: its PS11, Line 337: static_assert(BitUtil::IsPowerOf2(StringVal::MAX_LENGTH), : "buffer_space to not exceed MAX_LENGTH requires it to be a power of 2"); I am not sure I understand the purpose of this assert completely. Do we rely on it for line 331 above to be effective ? Why don't we move line 331 to to the point after line 343 ? This seems to be easier to follow and I am not sure just assuming the resize to succeed is a good thing. We can fail to resize for other reasons (e.g. exceeding memory limit set on a query). PS11, Line 342: const auto ofs = ptr - result.ptr; Isn't this the same as bytes_produced above ? PS11, Line 344: DCHECK_EQ(resized, true); As mentioned above, this may not hold all the time as we can exceed memory limit even if buffer_space <= StringVal::MAX_LENGTH; PS11, Line 352: const int remaining_bytes = haystack.len - consumed; If you take the suggestion above to hoist bytes_remaining out of the while loop, you don't need this line. -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zach Amsden <[email protected]> Gerrit-HasComments: Yes
