Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/20496 )
Change subject: IMPALA-12373: Small String Optimization for StringValue ...................................................................... Patch Set 7: (16 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h File be/src/runtime/smallable-string.h: http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@56 PS6, Line 56: so will the newly created SmallableString > We could also mention that it will point to the same data. Done http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@63 PS6, Line 63: g f > Could change this to uint32_t. I tried to change the interface, but then this CR would blow up even more, as I would have to modify the types at all the call-sites. I think it makes more sense to only change the implementation in this patch. Then, if we want, we can change the interface and usages in a different CR/Jira. We could only switch to uint32_t here in SmallableString, and add static_casts to StringValue's member functions, but I'm not sure if it worth the trouble. We might change both eventually in one shot. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@86 PS6, Line 86: > Could change this to uint32_t. Same as above. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@113 PS6, Line 113: > Could change this to uint32_t. Same as above. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@129 PS6, Line 129: > Could change this to uint32_t. Same as above. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@130 PS6, Line 130: /// Sets a new length for this object. To keep it safe, the length can only be > It could be mentioned in the function comment that it can only be decreased Done http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@147 PS6, Line 147: SimpleString ToSimpleString() const { > Could we use the Ptr() and Len() methods here? Or would it be less efficien Since we are at the IsSmall() == true branch here, it's safe to use the small representation. Probably the compiler would be smart enough to optimize the code, but I want to make sure things are as fast as they can be. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@157 PS6, Line 157: } > Could change this to uint32_t. Same as above. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@161 PS6, Line 161: l_r > Could change this to uint32_t. Same as above. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/sorter.cc@785 PS6, Line 785: ptr_type > We could use 'char*' instead of 'ptr_type' on L786. CollectionValue's Ptr() returns uint8_t*. http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.cc File be/src/runtime/string-value.cc: http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.cc@54 PS4, Line 54: int i = len - 1; > I opened https://issues.apache.org/jira/browse/IMPALA-12478 for it. Thank you! http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/string-value.cc File be/src/runtime/string-value.cc: http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/string-value.cc@50 PS6, Line 50: uint32_t > Could use uint32_t, as on L38. Done http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/string-value.cc@80 PS6, Line 80: uint32_t > Could use uint32_t, as on L38. Done http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/tuple.cc File be/src/runtime/tuple.cc: http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/tuple.cc@87 PS6, Line 87: for (const SlotDescriptor* slot : desc.string_slots()) { > Can't we use a range-based for-loop? Done http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/util/min-max-filter.h File be/src/util/min-max-filter.h: http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/util/min-max-filter.h@306 PS6, Line 306: , > Nit: it would be better if it was a comma, i.e. "'buffer', except ...". Done http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/min-max-filter.cc File be/src/util/min-max-filter.cc: http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/min-max-filter.cc@507 PS4, Line 507: if (value->IsSmall()) { > I think we should add a DCHECK for that. Done -- To view, visit http://gerrit.cloudera.org:8080/20496 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3 Gerrit-Change-Number: 20496 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 03 Oct 2023 16:41:30 +0000 Gerrit-HasComments: Yes
