Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG@49
PS2, Line 49:      4.91     4.91     0.128X     0.128X     0.128X
            : from_unixtime(0,'yyyy-MM-dd HH:mm:ss')
Wonder why this became slow during the last patch.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@47
PS1, Line 47: DEFAULT_ISO_DATE_TIME_FORMATTER_CTX
I am not sure what ISO means here exactly, but in the other constants it leads 
to a "T" between the date and time part, while in your new format it is just 
related to the subseconds part.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@81
PS1, Line 81:
Is this different than DEFAULT_DATE_TIME_CTX[0]?


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@86
PS1, Line 86: DEFAULT_ISO_DATE_TIME_FORMATTER_CTX
Is this different than DEFAULT_DATE_TIME_CTX[9]?


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@341
PS2, Line 341: tmp_buff
I think that we can easily avoid using the tmp_buff here.
As num_val is a non-negative int32, we could simply use an on-stack char array 
of size 10, and copy from there.

There is also an alternative to snprintf that we use in int->string casts, 
FastInt32ToBufferLeft() from gutil:
https://github.com/apache/impala/blob/master/be/src/gutil/strings/numbers.h#L171
We use it here: 
https://github.com/apache/impala/blob/master/be/src/exprs/cast-functions-ir.cc#L156

We could use FastInt32ToBufferLeft() to print to an array on stack, and then 
calculate the number of '0's to prepend from the result, and then copy it to 
dst.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@344
PS2, Line 344:         for (int i = written_length; i < tok.len; i++) {
             :           *(dst + pos) = '0';
             :           pos++;
             :         }
If I understand this correctly, then we are writing to the end of the buffer 
here, while we used to write to the beginning of the tmp_str.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@351
PS2, Line 351:     DCHECK_LE(pos, max_length) << "Maximum buffer length 
exceeded!";
I am a bit concerned that at the time we hit this DCHECK, we have already 
written outside the buffer in the AppendToBuffer functions. It would be safer 
and probably not too slow to pass max_length as an argument to 
AppendToBuffer(s) and skip the copy there if it has overflown.



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Comment-Date: Thu, 28 Oct 2021 12:45:15 +0000
Gerrit-HasComments: Yes

Reply via email to