Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20496 )
Change subject: IMPALA-12373: Small String Optimization for StringValue ...................................................................... Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG@46 PS4, Line 46: * lower memory and CPU cache consumption I'm not really clear where this is coming from. StringValue appears to essentially be a string_view wrapping an existing C-string. Ok, I see some of the usage is StringValue's that reference data owned by one source, and we need to make a copy of that string in a new context. For Small strings we can avoid that malloc and copy because the StringValue itself contains the data. http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h File be/src/runtime/smallable-string.h: http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@77 PS4, Line 77: rep.long_rep.len =len; nit: add space after = http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@83 PS4, Line 83: char last_char = reinterpret_cast<const char*>(this)[sizeof(*this) - 1]; This seems to rely on the string size being limited to 2^31 bytes; we should call that out. We should also add asserts whenever we assign to len that the smallify bit isn't set. http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@100 PS4, Line 100: rep.long_rep.ptr = s; When would we ever hit this else case? Doesn't line 89 guard against it? http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@132 PS4, Line 132: void SetLen(int len) { The semantics of this method don't seem very clear. It looks like you can arbitrarily resize a non-small representation (but without changing the memory ptr points to), but you can only shrink a small string. When would we ever want to let this increase the size? -- To view, visit http://gerrit.cloudera.org:8080/20496 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3 Gerrit-Change-Number: 20496 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Comment-Date: Thu, 21 Sep 2023 18:54:20 +0000 Gerrit-HasComments: Yes
