Michael Ho has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
......................................................................


Patch Set 11:

(15 comments)

Please also add exprs.test as discussed offline to cover cases in which pattern 
or replace are non-constant.

http://gerrit.cloudera.org:8080/#/c/5776/9/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS9, Line 290: 
nit: indent 4.


PS9, Line 291: t, buffer_space)
Please see comments from previous patch. Isn't this bytes_produced computed 
above ?


PS9, Line 305: _LE(ptr - resul
Same as bytes_remaining above ?


http://gerrit.cloudera.org:8080/#/c/5776/11/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS11, Line 227: replace = new ReplaceContext(pattern);
Sorry, I may have misunderstood your question about this. We need to call 
context->Allocate() here in order to track the memory consumption. The general 
guideline is to avoid untracked memory usage as much as possible.


PS11, Line 236: delete rptr;
Please use context->Free().


PS11, Line 274: (delta > 0 && delta < 128)
parenthesis seems unnecessary here.


PS11, Line 274: 128
Mind documenting how the number 128 is derived ? In other words, why not 256, 
512, or 64 ?


PS11, Line 282: (replace.len - pattern.len)
Just use 'delta' for clarity.


PS11, Line 288: haystack.len - pattern.len + replace.len
For clarity, can you please use haystack.len + delta ?


PS11, Line 322:        const int bytes_remaining = haystack.len - consumed;
This seems to overlap with the remaining_bytes in line 352. How about we hoist 
it out of the loop like the following ?

// number of bytes to match in the original string
int bytes_remaining = haystack.len - consumed;
while (bytes_remaining >= pattern.len) {
   ...
   ...

   consumed = match_pos;
   bytes_remaining = haystack.len - consumed;
   ....
   .....

   if (delta > 0) {
       ....
   }

}


PS11, Line 335: it's
nit: its


PS11, Line 337:           
static_assert(BitUtil::IsPowerOf2(StringVal::MAX_LENGTH),
              :               "buffer_space to not exceed MAX_LENGTH requires 
it to be a power of 2");
I am not sure I understand the purpose of this assert completely. Do we rely on 
it for line 331 above to be effective ?

Why don't we move line 331 to to the point after line 343 ? This seems to be 
easier to follow and I am not sure just assuming the resize to succeed is a 
good thing. We can fail to resize for other reasons (e.g. exceeding memory 
limit set on a query).


PS11, Line 342: const auto ofs = ptr - result.ptr;
Isn't this the same as bytes_produced above ?


PS11, Line 344: DCHECK_EQ(resized, true);
As mentioned above, this may not hold all the time as we can exceed memory 
limit even if buffer_space <= StringVal::MAX_LENGTH;


PS11, Line 352: const int remaining_bytes = haystack.len - consumed;
If you take the suggestion above to hoist bytes_remaining out of the while 
loop, you don't need this line.


-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zach Amsden <[email protected]>
Gerrit-HasComments: Yes

Reply via email to