Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 )
Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString ...................................................................... Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG@49 PS2, Line 49: 4.91 4.91 0.128X 0.128X 0.128X : from_unixtime(0,'yyyy-MM-dd HH:mm:ss') Wonder why this became slow during the last patch. http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc File be/src/runtime/datetime-simple-date-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@47 PS1, Line 47: DEFAULT_ISO_DATE_TIME_FORMATTER_CTX I am not sure what ISO means here exactly, but in the other constants it leads to a "T" between the date and time part, while in your new format it is just related to the subseconds part. http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@81 PS1, Line 81: Is this different than DEFAULT_DATE_TIME_CTX[0]? http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@86 PS1, Line 86: DEFAULT_ISO_DATE_TIME_FORMATTER_CTX Is this different than DEFAULT_DATE_TIME_CTX[9]? http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@341 PS2, Line 341: tmp_buff I think that we can easily avoid using the tmp_buff here. As num_val is a non-negative int32, we could simply use an on-stack char array of size 10, and copy from there. There is also an alternative to snprintf that we use in int->string casts, FastInt32ToBufferLeft() from gutil: https://github.com/apache/impala/blob/master/be/src/gutil/strings/numbers.h#L171 We use it here: https://github.com/apache/impala/blob/master/be/src/exprs/cast-functions-ir.cc#L156 We could use FastInt32ToBufferLeft() to print to an array on stack, and then calculate the number of '0's to prepend from the result, and then copy it to dst. http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@344 PS2, Line 344: for (int i = written_length; i < tok.len; i++) { : *(dst + pos) = '0'; : pos++; : } If I understand this correctly, then we are writing to the end of the buffer here, while we used to write to the beginning of the tmp_str. http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@351 PS2, Line 351: DCHECK_LE(pos, max_length) << "Maximum buffer length exceeded!"; I am a bit concerned that at the time we hit this DCHECK, we have already written outside the buffer in the AppendToBuffer functions. It would be safer and probably not too slow to pass max_length as an argument to AppendToBuffer(s) and skip the copy there if it has overflown. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Comment-Date: Thu, 28 Oct 2021 12:45:15 +0000 Gerrit-HasComments: Yes