Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20496 )

Change subject: IMPALA-12373: Small String Optimization for StringValue
......................................................................


Patch Set 7:

(16 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h
File be/src/runtime/smallable-string.h:

http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@56
PS6, Line 56: so will the newly created SmallableString
> We could also mention that it will point to the same data.
Done


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@63
PS6, Line 63: g f
> Could change this to uint32_t.
I tried to change the interface, but then this CR would blow up even more, as I 
would have to modify the types at all the call-sites.

I think it makes more sense to only change the implementation in this patch. 
Then, if we want, we can change the interface and usages in a different CR/Jira.

We could only switch to uint32_t here in SmallableString, and add static_casts 
to StringValue's member functions, but I'm not sure if it worth the trouble. We 
might change both eventually in one shot.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@86
PS6, Line 86:
> Could change this to uint32_t.
Same as above.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@113
PS6, Line 113:
> Could change this to uint32_t.
Same as above.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@129
PS6, Line 129:
> Could change this to uint32_t.
Same as above.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@130
PS6, Line 130:   /// Sets a new length for this object. To keep it safe, the 
length can only be
> It could be mentioned in the function comment that it can only be decreased
Done


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@147
PS6, Line 147:   SimpleString ToSimpleString() const {
> Could we use the Ptr() and Len() methods here? Or would it be less efficien
Since we are at the IsSmall() == true branch here, it's safe to use the small 
representation. Probably the compiler would be smart enough to optimize the 
code, but I want to make sure things are as fast as they can be.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@157
PS6, Line 157: }
> Could change this to uint32_t.
Same as above.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@161
PS6, Line 161: l_r
> Could change this to uint32_t.
Same as above.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/sorter.cc@785
PS6, Line 785: ptr_type
> We could use 'char*' instead of 'ptr_type' on L786.
CollectionValue's Ptr() returns uint8_t*.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.cc
File be/src/runtime/string-value.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.cc@54
PS4, Line 54:   int i = len - 1;
> I opened https://issues.apache.org/jira/browse/IMPALA-12478 for it.
Thank you!


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/string-value.cc
File be/src/runtime/string-value.cc:

http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/string-value.cc@50
PS6, Line 50: uint32_t
> Could use uint32_t, as on L38.
Done


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/string-value.cc@80
PS6, Line 80: uint32_t
> Could use uint32_t, as on L38.
Done


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/tuple.cc
File be/src/runtime/tuple.cc:

http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/tuple.cc@87
PS6, Line 87:   for (const SlotDescriptor* slot : desc.string_slots()) {
> Can't we use a range-based for-loop?
Done


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/util/min-max-filter.h@306
PS6, Line 306: ,
> Nit: it would be better if it was a comma, i.e. "'buffer', except ...".
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/min-max-filter.cc@507
PS4, Line 507:   if (value->IsSmall()) {
> I think we should add a DCHECK for that.
Done



--
To view, visit http://gerrit.cloudera.org:8080/20496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3
Gerrit-Change-Number: 20496
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 03 Oct 2023 16:41:30 +0000
Gerrit-HasComments: Yes

Reply via email to