Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/20496 )
Change subject: IMPALA-12373: Small String Optimization for StringValue ...................................................................... Patch Set 15: (11 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/exec/parquet/parquet-common.h File be/src/exec/parquet/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/exec/parquet/parquet-common.h@573 PS14, Line 573: v->Assign(reinterpret_cast<char*>(const_cast<uint8_t*>(buffer)) + sizeof(int32_t), > Can you add a comment about not smallifying the string? I added a comment to SmallableString::Assign(). I don't think it's good practice to add such comments at the callsites, because if you change the implementation (e.g. we start Smallifying by default), these comments will probably remain unchanged. http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/exec/text-converter.inline.h File be/src/exec/text-converter.inline.h: http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/exec/text-converter.inline.h@99 PS14, Line 99: l > Wouldn't it be clearer to use SimpleString instead of StringValue from the Using SimpleString does make the code simpler. But I don't think it's good idea to add comments about implementation details of StringValue at the callsites. http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/buffered-tuple-stream.cc@948 PS14, Line 948: ComputeRowSize > It could be mentioned in the function comments that this also smallifies st Done http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/smallable-string.h File be/src/runtime/smallable-string.h: http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/smallable-string.h@44 PS14, Line 44: SMALL_LIMIT > future hack idea: the layout seems to be able to keep even bigger strings i Could you please create a Jira ticket for this? http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/sorter.cc@607 PS14, Line 607: if (string_val->IsSmall()) continue; : string_values->push_back(string_val) > Do we actually need to collect these strings? The function could be renamed Good idea, thanks. http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/sorter.cc@654 PS14, Line 654: Ubsan::MemCpy(dest, string_val->Ptr( > see comment at line 607, small strings could be ignore completely. Done http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/sorter.cc@686 PS14, Line 686: DCHECK_LE(dest - page_start, sorter_->page_len_); > see comment at line 607 Done http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/string-value.h File be/src/runtime/string-value.h: http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/string-value.h@67 PS14, Line 67: void Assign(const StringValue& other) { string_impl_.Assign(other.string_impl_); } : : void Assign(char* ptr, int len) { > optional: overloads with const std::string& and StringVal could be added as I had a version with std::string, but I removed that at some point because it was very error-prone. We don't copy the string contents so StringValue is only valid as long as the std::string object is alive. And it was too easy to make mistakes like having dangling StringValues. With pointers and lens the mechanics are somewhat more obvious. http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/string-value.h@78 PS14, Line 78: /// 'false' otherwise. In the latter case the object remains unmodified. > Can you also mention the case when it is already smallified? I'm afraid that comments about the usages of this class can easily become obsolete and will make it harder to modify the behavior. http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/tuple.cc File be/src/runtime/tuple.cc: http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/tuple.cc@91 PS14, Line 91: // StringValues are only smallif > Can you add comments about the reason for return instead of continue? Added a comment. http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/util/in-list-filter.cc File be/src/util/in-list-filter.cc: http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/util/in-list-filter.cc@141 PS14, Line 141: if (s.IsSmall()) { : values_.insert(s) > This seems inconsistent with newly_inserted_values_.total_len, as StringSet Good catch. Updated StringSetWithTotalLen::Insert. -- To view, visit http://gerrit.cloudera.org:8080/20496 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3 Gerrit-Change-Number: 20496 Gerrit-PatchSet: 15 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 16 Nov 2023 19:53:42 +0000 Gerrit-HasComments: Yes
