Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20496 )
Change subject: IMPALA-12373: Small String Optimization for StringValue ...................................................................... Patch Set 4: (9 comments) great work! I couldn't process the whole patch yet http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG@86 PS4, Line 86: * My biggest concerns are not about testing small strings, but about testing the old solution - in many cases we use relatively short strings in tests and will not employ the on-heap version. This is not a problem now as all things are tested with the old solution, but new features may get untested with big strings and fail/crash if they meet one in real life. for example functonal.alltypes's has string of size of 1 and 8. I am also worried about complex types - they also seem to be tested mainly with short strings. http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.h File be/src/runtime/string-value.h: http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.h@73 PS4, Line 73: bool IsSmall() const { return string_impl_.IsSmall(); } : : bool Smallify() { return string_impl_.Smallify(); } Add comments? http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.inline.h File be/src/runtime/string-value.inline.h: http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.inline.h@117 PS4, Line 117: Substring did you check the call sites? I am concerned about using a temporary StringValue for this - in that case with small strings the pointers may point to an object that no longer exists. This was not a problem before as the actual data was on heap, the value itself didn't matter It seems safer to me apply the small string optimization inside the function - as the result can't be longer, if this is a short string, the result should be also a short string http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.inline.h@128 PS4, Line 128: inline StringValue StringValue::Trim() const { same as line 117 http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/tuple.cc File be/src/runtime/tuple.cc: http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/tuple.cc@66 PS4, Line 66: Please add comment about why we can stop here. A more detailed comment could be added to tuple.h, and relevant code could refer to that. http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/tuple.cc@91 PS4, Line 91: return An idea for this optimization: The planner could use the avg size stat when ordering the columns in the planner. Moving the large ones first would mean that the backand would rarely miss out on smallifying what it can. AFAIK computeMemLayout() only sorts columns based on size of the slot itself and doesn't care about the string itself http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/tuple.cc@107 PS4, Line 107: SmallifyStrings I am a bit concerned about changing the source in a copy. Would it be hard to move smallifying inside DeepCopyVarlenData? http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/tuple-row-compare.cc File be/src/util/tuple-row-compare.cc: http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/tuple-row-compare.cc@246 PS4, Line 246: LOG(WARNING) << "Start CodegenLexicalCompare"; looks temporary logging http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/tuple-row-compare.cc@350 PS4, Line 350: LOG(WARNING) << "Finished CodegenLexicalCompare"; looks temporary logging -- To view, visit http://gerrit.cloudera.org:8080/20496 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3 Gerrit-Change-Number: 20496 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Comment-Date: Fri, 22 Sep 2023 13:23:04 +0000 Gerrit-HasComments: Yes
