Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20496 )

Change subject: IMPALA-12373: Small String Optimization for StringValue
......................................................................


Patch Set 4:

(9 comments)

great work!

I couldn't process the whole patch yet

http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG@86
PS4, Line 86: *
My biggest concerns are not about testing small strings, but about testing the 
old solution - in many cases we use relatively short strings in tests and will 
not employ the on-heap version. This is not a problem now as all things are 
tested with the old solution, but new features may get untested with big 
strings and fail/crash if they meet one in real life.

for example functonal.alltypes's has string of size of 1 and 8. I am also 
worried about complex types - they also seem to be tested mainly with short 
strings.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.h
File be/src/runtime/string-value.h:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.h@73
PS4, Line 73:   bool IsSmall() const { return string_impl_.IsSmall(); }
            :
            :   bool Smallify() { return string_impl_.Smallify(); }
Add comments?


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.inline.h
File be/src/runtime/string-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.inline.h@117
PS4, Line 117: Substring
did you check the call sites? I am concerned about using a temporary 
StringValue for this - in that case with small strings the pointers may point 
to an object that no longer exists. This was not a problem before as the actual 
data was on heap, the value itself didn't matter

It seems safer to me apply the small string optimization inside the function - 
as the result can't be longer, if this is a short string, the result should be 
also a short string


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.inline.h@128
PS4, Line 128: inline StringValue StringValue::Trim() const {
same as line 117


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/tuple.cc
File be/src/runtime/tuple.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/tuple.cc@66
PS4, Line 66:
Please add comment about why we can stop here.

A more detailed comment could be added to tuple.h, and relevant code could 
refer to that.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/tuple.cc@91
PS4, Line 91: return
An idea for this optimization:
The planner could use the avg size stat when ordering the columns in the 
planner. Moving the large ones first would mean that the backand would rarely 
miss out on smallifying what it can.

AFAIK computeMemLayout() only sorts columns based on size of the slot itself 
and doesn't care about the string itself


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/tuple.cc@107
PS4, Line 107: SmallifyStrings
I am a bit concerned about changing the source in a copy. Would it be hard to 
move smallifying inside DeepCopyVarlenData?


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/tuple-row-compare.cc@246
PS4, Line 246:   LOG(WARNING) << "Start CodegenLexicalCompare";
looks temporary logging


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/tuple-row-compare.cc@350
PS4, Line 350:     LOG(WARNING) << "Finished CodegenLexicalCompare";
looks temporary logging



--
To view, visit http://gerrit.cloudera.org:8080/20496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3
Gerrit-Change-Number: 20496
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Comment-Date: Fri, 22 Sep 2023 13:23:04 +0000
Gerrit-HasComments: Yes

Reply via email to