Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20496 )

Change subject: IMPALA-12373: Small String Optimization for StringValue
......................................................................


Patch Set 15:

(11 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/exec/parquet/parquet-common.h
File be/src/exec/parquet/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/exec/parquet/parquet-common.h@573
PS14, Line 573:   
v->Assign(reinterpret_cast<char*>(const_cast<uint8_t*>(buffer)) + 
sizeof(int32_t),
> Can you add a comment about not smallifying the string?
I added a comment to SmallableString::Assign(). I don't think it's good 
practice to add such comments at the callsites, because if you change the 
implementation (e.g. we start Smallifying by default), these comments will 
probably remain unchanged.


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/exec/text-converter.inline.h
File be/src/exec/text-converter.inline.h:

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/exec/text-converter.inline.h@99
PS14, Line 99: l
> Wouldn't it be clearer to use SimpleString instead of StringValue from the
Using SimpleString does make the code simpler.
But I don't think it's good idea to add comments about implementation details 
of StringValue at the callsites.


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/buffered-tuple-stream.cc@948
PS14, Line 948: ComputeRowSize
> It could be mentioned in the function comments that this also smallifies st
Done


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/smallable-string.h
File be/src/runtime/smallable-string.h:

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/smallable-string.h@44
PS14, Line 44: SMALL_LIMIT
> future hack idea: the layout seems to be able to keep even bigger strings i
Could you please create a Jira ticket for this?


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/sorter.cc@607
PS14, Line 607:       if (string_val->IsSmall()) continue;
              :       string_values->push_back(string_val)
> Do we actually need to collect these strings? The function could be renamed
Good idea, thanks.


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/sorter.cc@654
PS14, Line 654:     Ubsan::MemCpy(dest, string_val->Ptr(
> see comment at line 607, small strings could be ignore completely.
Done


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/sorter.cc@686
PS14, Line 686:     DCHECK_LE(dest - page_start, sorter_->page_len_);
> see comment at line 607
Done


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/string-value.h
File be/src/runtime/string-value.h:

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/string-value.h@67
PS14, Line 67:   void Assign(const StringValue& other) { 
string_impl_.Assign(other.string_impl_); }
             :
             :   void Assign(char* ptr, int len) {
> optional: overloads with const std::string& and StringVal could be added as
I had a version with std::string, but I removed that at some point because it 
was very error-prone. We don't copy the string contents so StringValue is only 
valid as long as the std::string object is alive. And it was too easy to make 
mistakes like having dangling StringValues.
With pointers and lens the mechanics are somewhat more obvious.


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/string-value.h@78
PS14, Line 78:   /// 'false' otherwise. In the latter case the object remains 
unmodified.
> Can you also mention the case when it is already smallified?
I'm afraid that comments about the usages of this class can easily become 
obsolete and will make it harder to modify the behavior.


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/tuple.cc
File be/src/runtime/tuple.cc:

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/tuple.cc@91
PS14, Line 91:     // StringValues are only smallif
> Can you add comments about the reason for return instead of continue?
Added a comment.


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/util/in-list-filter.cc
File be/src/util/in-list-filter.cc:

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/util/in-list-filter.cc@141
PS14, Line 141:     if (s.IsSmall()) {
              :       values_.insert(s)
> This seems inconsistent with newly_inserted_values_.total_len, as StringSet
Good catch. Updated StringSetWithTotalLen::Insert.



--
To view, visit http://gerrit.cloudera.org:8080/20496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3
Gerrit-Change-Number: 20496
Gerrit-PatchSet: 15
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 16 Nov 2023 19:53:42 +0000
Gerrit-HasComments: Yes

Reply via email to