Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20496 )

Change subject: IMPALA-12373: Small String Optimization for StringValue
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG@46
PS4, Line 46:  * lower memory and CPU cache consumption
I'm not really clear where this is coming from. StringValue appears to 
essentially be a string_view wrapping an existing C-string.

Ok, I see some of the usage is StringValue's that reference data owned by one 
source, and we need to make a copy of that string in a new context. For Small 
strings we can avoid that malloc and copy because the StringValue itself 
contains the data.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h
File be/src/runtime/smallable-string.h:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@77
PS4, Line 77:     rep.long_rep.len =len;
nit: add space after =


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@83
PS4, Line 83:     char last_char = reinterpret_cast<const 
char*>(this)[sizeof(*this) - 1];
This seems to rely on the string size being limited to 2^31 bytes; we should 
call that out.

We should also add asserts whenever we assign to len that the smallify bit 
isn't set.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@100
PS4, Line 100:       rep.long_rep.ptr = s;
When would we ever hit this else case? Doesn't line 89 guard against it?


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@132
PS4, Line 132:   void SetLen(int len) {
The semantics of this method don't seem very clear.

It looks like you can arbitrarily resize a non-small representation (but 
without changing the memory ptr points to), but you can only shrink a small 
string. When would we ever want to let this increase the size?



--
To view, visit http://gerrit.cloudera.org:8080/20496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3
Gerrit-Change-Number: 20496
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Comment-Date: Thu, 21 Sep 2023 18:54:20 +0000
Gerrit-HasComments: Yes

Reply via email to