[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Csaba Ringhofer has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. IMPALA-10984: Improve TimestampValue to String casting TimestampValue::ToString was implemented by concatenating boost::gregorian::to_iso_extended_string and boost::posix_time::to_simple_string using stringstream. This involves multiple string allocations, copying, and might hit lock within tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that touches this function can be inefficient if the expression is being evaluated for millions of rows. This patch adds method TimestampValue::ToStringVal and reimplements TimestampValue::ToString by supplying default DateTimeFormatContext if no pattern was specified. "-MM-dd HH:mm:ss" will be picked as the default format if the time_ component does not have fractional seconds. Otherwise, "-MM-dd HH:mm:ss.S" will be picked as the default format. The chosen DateTimeFormatContext then is passed to TimestampParser::Format along with date_ and time_ to be formatted into the string representation. Int to string parsing method is replaced with FastInt32ToBufferLeft in TimestampParser::Format. We ran a set of expression benchmarks in a machine with Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz. This patch gives > 10X performance improvement for CAST timestamp to string and FROM_UNIXTIME without a date-time pattern. Following are the detailed results before and after the patch. Before the patch: FromUnixCodegen: Function 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile (relative) (relative) (relative) --- literal 36.7 37 37.3 1X 1X 1X cast(now() as string) 2.31 2.31 2.330.0628X 0.0623X0.0626X cast(now() as string format 'Y .S') 16.9 17.5 17.5 0.459X 0.472X 0.471X from_unixtime(0,'-MM-dd HH:mm:ss') 6.3 6.3 6.37 0.171X 0.17X 0.171X from_unixtime(0,'-MM-dd') 11.8 11.8 12 0.32X 0.32X 0.322X from_unixtime(0) 2.36 2.4 2.40.0644X 0.0648X0.0644X After the patch: FromUnixCodegen: Function 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile (relative) (relative) (relative) --- literal 37.7 38.1 38.4 1X 1X 1X cast(now() as string) 29.9 30.1 30.2 0.794X 0.79X 0.787X cast(now() as string format 'Y .S') 61.1 61.3 61.6 1.62X 1.61X 1.61X from_unixtime(0,'-MM-dd HH:mm:ss') 33.6 33.8 34.2 0.892X 0.887X 0.892X from_unixtime(0,'-MM-dd') 50.5 50.6 50.9 1.34X 1.33X 1.33X from_unixtime(0) 34 34.2 34.5 0.902X 0.896X 0.898X The literal expression used as the baseline in this benchmark is "cast('2012-01-01 09:10:11.123456789' as timestamp)". This patch also updates numbers in expr-benchmark for BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear its MemPool in between benchmark iteration so that it does not run out of memory. Testing: - Pass core tests. Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Reviewed-on: http://gerrit.cloudera.org:8080/17980 Reviewed-by: Impala Public Jenkins Tested-by: Csaba Ringhofer Reviewed-by: Csaba Ringhofer --- M be/src/benchmarks/expr-benchmark.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/cast-functions-ir.cc M be/src/exprs/literal.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/exprs/timestamp-functions.cc M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/datetime-simple-date-format-parser.cc M be/src/runtime/datetime-simple-date-format-parser.h M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h M be/src/util/min-max-filter.cc 20 files changed, 261 insertions(+), 159 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved Csaba Ringhofer: Looks good to me, approved;
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 10: Code-Review+2 Submitting as the GVO failed twice due to flaky tests. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 09 Nov 2021 16:13:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 09 Nov 2021 16:13:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Csaba Ringhofer has removed a vote on this change. Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Removed Verified-1 by Impala Public Jenkins -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Csaba Ringhofer has removed a vote on this change. Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Removed -Verified by Impala Public Jenkins -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 10: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7610/ -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 09 Nov 2021 13:40:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 09 Nov 2021 07:19:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7610/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 09 Nov 2021 07:19:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 9: > Patch Set 9: > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7607/ This time, we were hit by different flaky test at test_spilling.py. The failed test was introduced by IMPALA-9725. The test require for the hash join to spill its partitions, but it does not happen in https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4903 Following recent hash table memory improvement (IMPALA-7635), we might need to lower the buffer_pool_limit for that test so that it spill consistently. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 09 Nov 2021 03:59:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 9: Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7607/ -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 09 Nov 2021 02:50:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7607/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 08 Nov 2021 20:35:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 9: GVO over patch set 9 hit flaky test previously reported at IMPALA-10886. That flaky test is not related with this patch. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 08 Nov 2021 18:13:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 9: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7601/ -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 08 Nov 2021 16:58:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 08 Nov 2021 10:43:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7601/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 08 Nov 2021 10:43:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 08 Nov 2021 10:43:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9725/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 05 Nov 2021 01:29:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 8: The GVO failed because I forgot handling for ROUND_YEAR and ISO8601_WEEK_NUMBERING_YEAR. They should be handled similarly as YEAR token. Patch set 8 fix that. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 05 Nov 2021 01:08:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Hello Qifan Chen, Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17980 to look at the new patch set (#8). Change subject: IMPALA-10984: Improve TimestampValue to String casting .. IMPALA-10984: Improve TimestampValue to String casting TimestampValue::ToString was implemented by concatenating boost::gregorian::to_iso_extended_string and boost::posix_time::to_simple_string using stringstream. This involves multiple string allocations, copying, and might hit lock within tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that touches this function can be inefficient if the expression is being evaluated for millions of rows. This patch adds method TimestampValue::ToStringVal and reimplements TimestampValue::ToString by supplying default DateTimeFormatContext if no pattern was specified. "-MM-dd HH:mm:ss" will be picked as the default format if the time_ component does not have fractional seconds. Otherwise, "-MM-dd HH:mm:ss.S" will be picked as the default format. The chosen DateTimeFormatContext then is passed to TimestampParser::Format along with date_ and time_ to be formatted into the string representation. Int to string parsing method is replaced with FastInt32ToBufferLeft in TimestampParser::Format. We ran a set of expression benchmarks in a machine with Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz. This patch gives > 10X performance improvement for CAST timestamp to string and FROM_UNIXTIME without a date-time pattern. Following are the detailed results before and after the patch. Before the patch: FromUnixCodegen: Function 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile (relative) (relative) (relative) --- literal 36.7 37 37.3 1X 1X 1X cast(now() as string) 2.31 2.31 2.330.0628X 0.0623X0.0626X cast(now() as string format 'Y .S') 16.9 17.5 17.5 0.459X 0.472X 0.471X from_unixtime(0,'-MM-dd HH:mm:ss') 6.3 6.3 6.37 0.171X 0.17X 0.171X from_unixtime(0,'-MM-dd') 11.8 11.8 12 0.32X 0.32X 0.322X from_unixtime(0) 2.36 2.4 2.40.0644X 0.0648X0.0644X After the patch: FromUnixCodegen: Function 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile (relative) (relative) (relative) --- literal 37.7 38.1 38.4 1X 1X 1X cast(now() as string) 29.9 30.1 30.2 0.794X 0.79X 0.787X cast(now() as string format 'Y .S') 61.1 61.3 61.6 1.62X 1.61X 1.61X from_unixtime(0,'-MM-dd HH:mm:ss') 33.6 33.8 34.2 0.892X 0.887X 0.892X from_unixtime(0,'-MM-dd') 50.5 50.6 50.9 1.34X 1.33X 1.33X from_unixtime(0) 34 34.2 34.5 0.902X 0.896X 0.898X The literal expression used as the baseline in this benchmark is "cast('2012-01-01 09:10:11.123456789' as timestamp)". This patch also updates numbers in expr-benchmark for BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear its MemPool in between benchmark iteration so that it does not run out of memory. Testing: - Pass core tests. Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 --- M be/src/benchmarks/expr-benchmark.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/cast-functions-ir.cc M be/src/exprs/literal.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/exprs/timestamp-functions.cc M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/datetime-simple-date-format-parser.cc M be/src/runtime/datetime-simple-date-format-parser.h M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h M be/src/util/min-max-filter.cc 20 files changed, 261 insertions(+), 159 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/8 -- To view, visit http://gerrit.cloudera.org:8080/17980 To
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 7: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7595/ -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 04 Nov 2021 23:21:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 04 Nov 2021 17:08:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7595/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 04 Nov 2021 17:08:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 02 Nov 2021 23:12:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9708/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 02 Nov 2021 17:28:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@85 PS3, Line 85: dst.resize(written); > nit: please add { } to the if Done http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@213 PS3, Line 213: void TimestampValue::ToString(string& dst) const { > I disagree with dropping this function - I think that in non perf critical Right, it looks better to keep the TimestampValue::ToString() variant. Putting it back and revert changes due to previous function signature change. http://gerrit.cloudera.org:8080/#/c/17980/5/be/src/runtime/timestamp-value.inline.h File be/src/runtime/timestamp-value.inline.h: http://gerrit.cloudera.org:8080/#/c/17980/5/be/src/runtime/timestamp-value.inline.h@223 PS5, Line 223: if (UNLIKELY(written < 0)) { > nit: usually we add { + } if we break the line at ifs. Done -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 02 Nov 2021 17:11:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Hello Qifan Chen, Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17980 to look at the new patch set (#6). Change subject: IMPALA-10984: Improve TimestampValue to String casting .. IMPALA-10984: Improve TimestampValue to String casting TimestampValue::ToString was implemented by concatenating boost::gregorian::to_iso_extended_string and boost::posix_time::to_simple_string using stringstream. This involves multiple string allocations, copying, and might hit lock within tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that touches this function can be inefficient if the expression is being evaluated for millions of rows. This patch adds method TimestampValue::ToStringVal and reimplements TimestampValue::ToString by supplying default DateTimeFormatContext if no pattern was specified. "-MM-dd HH:mm:ss" will be picked as the default format if the time_ component does not have fractional seconds. Otherwise, "-MM-dd HH:mm:ss.S" will be picked as the default format. The chosen DateTimeFormatContext then is passed to TimestampParser::Format along with date_ and time_ to be formatted into the string representation. Int to string parsing method is replaced with FastInt32ToBufferLeft in TimestampParser::Format. We ran a set of expression benchmarks in a machine with Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz. This patch gives > 10X performance improvement for CAST timestamp to string and FROM_UNIXTIME without a date-time pattern. Following are the detailed results before and after the patch. Before the patch: FromUnixCodegen: Function 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile (relative) (relative) (relative) --- literal 36.7 37 37.3 1X 1X 1X cast(now() as string) 2.31 2.31 2.330.0628X 0.0623X0.0626X cast(now() as string format 'Y .S') 16.9 17.5 17.5 0.459X 0.472X 0.471X from_unixtime(0,'-MM-dd HH:mm:ss') 6.3 6.3 6.37 0.171X 0.17X 0.171X from_unixtime(0,'-MM-dd') 11.8 11.8 12 0.32X 0.32X 0.322X from_unixtime(0) 2.36 2.4 2.40.0644X 0.0648X0.0644X After the patch: FromUnixCodegen: Function 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile (relative) (relative) (relative) --- literal 37.7 38.1 38.4 1X 1X 1X cast(now() as string) 29.9 30.1 30.2 0.794X 0.79X 0.787X cast(now() as string format 'Y .S') 61.1 61.3 61.6 1.62X 1.61X 1.61X from_unixtime(0,'-MM-dd HH:mm:ss') 33.6 33.8 34.2 0.892X 0.887X 0.892X from_unixtime(0,'-MM-dd') 50.5 50.6 50.9 1.34X 1.33X 1.33X from_unixtime(0) 34 34.2 34.5 0.902X 0.896X 0.898X The literal expression used as the baseline in this benchmark is "cast('2012-01-01 09:10:11.123456789' as timestamp)". This patch also updates numbers in expr-benchmark for BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear its MemPool in between benchmark iteration so that it does not run out of memory. Testing: - Pass core tests. Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 --- M be/src/benchmarks/expr-benchmark.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/cast-functions-ir.cc M be/src/exprs/literal.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/exprs/timestamp-functions.cc M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/datetime-simple-date-format-parser.cc M be/src/runtime/datetime-simple-date-format-parser.h M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h M be/src/util/min-max-filter.cc 20 files changed, 260 insertions(+), 159 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/6 -- To view, visit http://gerrit.cloudera.org:8080/17980 To
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 5: Code-Review+1 Looks great! -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 02 Nov 2021 13:44:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 5: Code-Review+1 (3 comments) http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@85 PS3, Line 85: dst.resize(written); nit: please add { } to the if http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@213 PS3, Line 213: } > Unfortunately, there are couple call sites to this function. Especially the I disagree with dropping this function - I think that in non perf critical code ToString() is simpler to use, and removing it would just lead to create new convenience functions like you did in the tests. So I think it would be the best to have both string TimestampValue::ToString() and void TimestampValue::ToString(string&) http://gerrit.cloudera.org:8080/#/c/17980/5/be/src/runtime/timestamp-value.inline.h File be/src/runtime/timestamp-value.inline.h: http://gerrit.cloudera.org:8080/#/c/17980/5/be/src/runtime/timestamp-value.inline.h@223 PS5, Line 223: if (UNLIKELY(written < 0)) nit: usually we add { + } if we break the line at ifs. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 02 Nov 2021 12:21:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 5: (16 comments) http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@17 PS3, Line 17: adds method > nit reimplements Done http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@22 PS3, Line 22: format. The chosen DateTimeFormatContext then is passed to > nit is passed Done http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@32 PS3, Line 32: > nit. duplicated in before/after. Probably should be mentioned the para at l Done http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@33 PS3, Line 33: > nit. this column can be removed? Done http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@38 PS3, Line 38: 2.31 > nit. not aligned with the rest of the values in this column. Done http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/datetime-iso-sql-format-tokenizer.h File be/src/runtime/datetime-iso-sql-format-tokenizer.h: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/datetime-iso-sql-format-tokenizer.h@111 PS3, Line 111: Iterates throug > nit. fmt_out_max_len_? Removed. This is now written directly to DateTimeFormatContext.fmt_out_len. http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/datetime-simple-date-format-parser.cc File be/src/runtime/datetime-simple-date-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/datetime-simple-date-format-parser.cc@401 PS3, Line 401: } : } : return nullptr; : } : > inline? Done http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h@79 PS3, Line 79: /// max_length -- the maximum length of characters that 'dst' can hold. Only used for : /// assertion in debug build. > I need to update this comment in next patch set, since we're enforcing the Done http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@305 PS2, Line 305: CATOR: { > optional: Done. Benchmarked it with expression "cast(now() as string format 'Y .S')". Compared patch set 3 vs 4, the (10%ile, 50%ile, 90%ile) increased from (19.9, 20.1, 20.3) to (61.1, 61.3, 61.6). 3X increase. http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@351 PS2, Line 351: DCHECK(!d.is_special()); > After changing AppendToBuffer() now we can't this dcheck, even if we want t Done http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@83 PS3, Line 83: st.clear(); > UNLIKELY? Done http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@213 PS3, Line 213: } > Can we update any remaining callers to the new variant and eliminate this f Unfortunately, there are couple call sites to this function. Especially the output stream operator of TimestampValue. In patch set 4, I change the signature, asking the caller to supply a string output argument. Add a comment as well in the header file warning caller to reuse the output string. http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@222 PS3, Line 222: > So the space is bounded by the row batch size * max_length? Approx how much Yes, I think batch size * max_length is the approximation. There is also 8 bytes alignment and power2 round up thing in mem-pool code I have not fully understand yet. http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@224 PS3, Line 224: int64_t t_in_nano_sec = t.total_nanoseconds(); > Should use C++ cast Done http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@225 PS3, Line 225: > unlikely? Done http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@230 PS3, Line 230: int64_t days = total_in_nano_sec / NANOS_PER_DAY; : int64_t nano_secs_remaining = total_in_nano_sec % NANOS_PER_DAY; : return TimestampValue(date_ + boost::gregorian::date_duration(days), : boost::posix_time::time_duration(0, 0, 0, nano_secs_remaining)); : > nit This method probably can be inlined. Done. Moved to timestap-value.inline.h as well. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType:
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9702/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 01 Nov 2021 17:14:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@222 PS3, Line 222: StringVal result(ctx, max_length); > The allocation comes from exps_results_pool_. So the space is bounded by the row batch size * max_length? Approx how much?? -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 01 Nov 2021 17:07:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Hello Qifan Chen, Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17980 to look at the new patch set (#5). Change subject: IMPALA-10984: Improve TimestampValue to String casting .. IMPALA-10984: Improve TimestampValue to String casting TimestampValue::ToString was implemented by concatenating boost::gregorian::to_iso_extended_string and boost::posix_time::to_simple_string using stringstream. This involves multiple string allocations, copying, and might hit lock within tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that touches this function can be inefficient if the expression is being evaluated for millions of rows. This patch adds method TimestampValue::ToStringVal and reimplements TimestampValue::ToString by supplying default DateTimeFormatContext if no pattern was specified. "-MM-dd HH:mm:ss" will be picked as the default format if the time_ component does not have fractional seconds. Otherwise, "-MM-dd HH:mm:ss.S" will be picked as the default format. The chosen DateTimeFormatContext then is passed to TimestampParser::Format along with date_ and time_ to be formatted into the string representation. Int to string parsing method is replaced with FastInt32ToBufferLeft in TimestampParser::Format. We ran a set of expression benchmarks in a machine with Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz. This patch gives > 10X performance improvement for CAST timestamp to string and FROM_UNIXTIME without a date-time pattern. Following are the detailed results before and after the patch. Before the patch: FromUnixCodegen: Function 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile (relative) (relative) (relative) --- literal 36.7 37 37.3 1X 1X 1X cast(now() as string) 2.31 2.31 2.330.0628X 0.0623X0.0626X cast(now() as string format 'Y .S') 16.9 17.5 17.5 0.459X 0.472X 0.471X from_unixtime(0,'-MM-dd HH:mm:ss') 6.3 6.3 6.37 0.171X 0.17X 0.171X from_unixtime(0,'-MM-dd') 11.8 11.8 12 0.32X 0.32X 0.322X from_unixtime(0) 2.36 2.4 2.40.0644X 0.0648X0.0644X After the patch: FromUnixCodegen: Function 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile (relative) (relative) (relative) --- literal 37.7 38.1 38.4 1X 1X 1X cast(now() as string) 29.9 30.1 30.2 0.794X 0.79X 0.787X cast(now() as string format 'Y .S') 61.1 61.3 61.6 1.62X 1.61X 1.61X from_unixtime(0,'-MM-dd HH:mm:ss') 33.6 33.8 34.2 0.892X 0.887X 0.892X from_unixtime(0,'-MM-dd') 50.5 50.6 50.9 1.34X 1.33X 1.33X from_unixtime(0) 34 34.2 34.5 0.902X 0.896X 0.898X The literal expression used as the baseline in this benchmark is "cast('2012-01-01 09:10:11.123456789' as timestamp)". This patch also updates numbers in expr-benchmark for BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear its MemPool in between benchmark iteration so that it does not run out of memory. Testing: - Pass core tests. Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 --- M be/src/benchmarks/expr-benchmark.cc M be/src/exec/kudu-util-ir.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/cast-functions-ir.cc M be/src/exprs/literal.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/exprs/timestamp-functions.cc M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/datetime-simple-date-format-parser.cc M be/src/runtime/datetime-simple-date-format-parser.h M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h M be/src/service/client-request-state.cc M be/src/util/min-max-filter.cc 22 files changed, 316 insertions(+), 213 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Hello Qifan Chen, Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17980 to look at the new patch set (#4). Change subject: IMPALA-10984: Improve TimestampValue to String casting .. IMPALA-10984: Improve TimestampValue to String casting TimestampValue::ToString was implemented by concatenating boost::gregorian::to_iso_extended_string and boost::posix_time::to_simple_string using stringstream. This involves multiple string allocations, copying, and might hit lock within tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that touches this function can be inefficient if the expression is being evaluated for millions of rows. This patch reimplement TimestampValue::ToString by supplying default DateTimeFormatContext if no pattern was specified. "-MM-dd HH:mm:ss" will be picked as the default format if the time_ component does not have fractional seconds. Otherwise, "-MM-dd HH:mm:ss.S" will be picked as the default format. The chosen DateTimeFormatContext then passed to TimestampParser::Format along with date_ and time_ to be formatted into the string representation. Int to string parsing method is replaced with FastInt32ToBufferLeft in TimestampParser::Format. We ran a set of expression benchmarks in a machine with Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz. This patch gives > 10X performance improvement for CAST timestamp to string and FROM_UNIXTIME without a date-time pattern. Following are the detailed results before and after the patch. Before the patch: FromUnixCodegen: Function 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile (relative) (relative) (relative) --- literal 36.7 37 37.3 1X 1X 1X cast(now() as string) 2.31 2.31 2.330.0628X 0.0623X0.0626X cast(now() as string format 'Y .S') 16.9 17.5 17.5 0.459X 0.472X 0.471X from_unixtime(0,'-MM-dd HH:mm:ss') 6.3 6.3 6.37 0.171X 0.17X 0.171X from_unixtime(0,'-MM-dd') 11.8 11.8 12 0.32X 0.32X 0.322X from_unixtime(0) 2.36 2.4 2.40.0644X 0.0648X0.0644X After the patch: FromUnixCodegen: Function 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile (relative) (relative) (relative) --- literal 37.7 38.1 38.4 1X 1X 1X cast(now() as string) 29.9 30.1 30.2 0.794X 0.79X 0.787X cast(now() as string format 'Y .S') 61.1 61.3 61.6 1.62X 1.61X 1.61X from_unixtime(0,'-MM-dd HH:mm:ss') 33.6 33.8 34.2 0.892X 0.887X 0.892X from_unixtime(0,'-MM-dd') 50.5 50.6 50.9 1.34X 1.33X 1.33X from_unixtime(0) 34 34.2 34.5 0.902X 0.896X 0.898X The literal expression used as the baseline in this benchmark is "cast('2012-01-01 09:10:11.123456789' as timestamp)". This patch also updates numbers in expr-benchmark for BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear its MemPool in between benchmark iteration so that it does not run out of memory. Testing: - Pass core tests. Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 --- M be/src/benchmarks/expr-benchmark.cc M be/src/exec/kudu-util-ir.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/cast-functions-ir.cc M be/src/exprs/literal.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/exprs/timestamp-functions.cc M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/datetime-simple-date-format-parser.cc M be/src/runtime/datetime-simple-date-format-parser.h M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h M be/src/service/client-request-state.cc M be/src/util/min-max-filter.cc 22 files changed, 316 insertions(+), 213 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/4 -- To view, visit
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 3: (1 comment) Thanks for the quick answer of my questions. http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@86 PS3, Line 86: dst.resize(written) > It is possible that we allocate larger than we actually needs. Done -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 29 Oct 2021 20:01:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 3: (4 comments) Thank you everyone for the feedbacks. Will address them in the next patch set. I'd like to answer some of the questions first. http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h@128 PS3, Line 128: int max_length > I wonder if we need this argument, since the source is 'str' which has a le max_length here is precautionary hint passed from TimestampParser::Format. When writing, we should not hit case where pos becomes > max_length. If that does happens, DCHECK error should be raised at TimestampParser::Format. http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@271 PS2, Line 271: > optional: we could optimize the case when day/month/year are all needed by This is interesting. We should look at it in the next JIRA. http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@86 PS3, Line 86: dst.resize(written) > I wonder why resize once dst is populated. It is possible that we allocate larger than we actually needs. For example, "cast(now() as string format 'HH24:mm:ss')" will allocate 10 bytes at first. But the output result in only 8 bytes, like "17:10:57". So the second resize here is done to update the dst.length(). http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@222 PS3, Line 222: StringVal result(ctx, max_length); > When is this memory released? Seems this will still allocate via ctx for ev The allocation comes from exps_results_pool_. In between RowBatch processing, exec nodes will call QueryMaintenance() method in some frequencies to return all allocated memory so far back to MemPool though expr_results_pool_->Clear(). https://github.com/apache/impala/blob/b42c649/be/src/exec/exec-node.cc#L483 This, however, does not free the memory. MemPool will retain the memory chunks so that it can be reused in the next RowBatch processing. After the exec node completes, it will call Close() and free the memory chunks back by calling expr_results_pool_->FreeAll(). https://github.com/apache/impala/blob/b42c649/be/src/exec/exec-node.cc#L320 Contrast to Clear(), FreeAll() does free the memory. IMPALA-5844 explain more about this detail. There is also IMPALA-2399 that discuss about plan to remove QueryMaintenance(). -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 29 Oct 2021 19:32:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@213 PS3, Line 213: string TimestampValue::ToString() const { Can we update any remaining callers to the new variant and eliminate this function? http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@222 PS3, Line 222: StringVal result(ctx, max_length); When is this memory released? Seems this will still allocate via ctx for every value and not release it? http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@224 PS3, Line 224: TimestampParser::Format(dt_ctx, date_, time_, max_length, (char*)result.ptr); Should use C++ cast -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 29 Oct 2021 17:08:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 3: (12 comments) Just have some very minor comments. Looks great to me and the improvement is impressive! http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@17 PS3, Line 17: reimplement nit reimplements http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@22 PS3, Line 22: passed to TimestampParser::Format along with date_ and time_ to be nit is passed http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@32 PS3, Line 32: Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz nit. duplicated in before/after. Probably should be mentioned the para at line 26. http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@33 PS3, Line 33: iters/ms nit. this column can be removed? http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@38 PS3, Line 38: 5.94 nit. not aligned with the rest of the values in this column. http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/datetime-iso-sql-format-tokenizer.h File be/src/runtime/datetime-iso-sql-format-tokenizer.h: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/datetime-iso-sql-format-tokenizer.h@111 PS3, Line 111: fmt_out_max_len nit. fmt_out_max_len_? http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/datetime-simple-date-format-parser.cc File be/src/runtime/datetime-simple-date-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/datetime-simple-date-format-parser.cc@401 PS3, Line 401: const DateTimeFormatContext* SimpleDateFormatTokenizer::GetDefaultTimestampFormatContext( : const time_duration& time) { : return time.fractional_seconds() > 0 ? _DATE_TIME_CTX[9] : : _SHORT_DATE_TIME_CTX; : } inline? http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h@128 PS3, Line 128: int max_length I wonder if we need this argument, since the source is 'str' which has a length attribute. http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@83 PS3, Line 83: (written < 0) UNLIKELY? http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@86 PS3, Line 86: dst.resize(written) I wonder why resize once dst is populated. http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@225 PS3, Line 225: (written < 0) unlikely? http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@230 PS3, Line 230: StringVal TimestampValue::ToStringVal(FunctionContext* ctx) const { : const DateTimeFormatContext* dt_ctx = : SimpleDateFormatTokenizer::GetDefaultTimestampFormatContext(time_); : return ToStringVal(ctx, *dt_ctx); : } nit This method probably can be inlined. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 29 Oct 2021 17:00:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 3: Code-Review+1 (3 comments) lgtm I have some additional ideas on how to optimize common cases of timestamp formatting, but this patch is ok without them, the comments are more like a brain dump. http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@271 PS2, Line 271: optional: we could optimize the case when day/month/year are all needed by using boost::gregorian's year_month_day() Here is an example: https://www.boost.org/doc/libs/1_61_0/doc/html/date_time/examples.html#date_time.examples.dates_as_strings My guess is that it is not too common to use only a subset of year/month/date, so we could probably always compute year_month_day if we need any date component. http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@305 PS2, Line 305: CATOR: { optional: this could be replaced with a division with a precomputed int in DateTimeFormatContext - the current logic should make Format() much slower if we use low fractional precision, e.g. millisecs At least a benchmark could be added to see how bad it is. A similar thing could be done for AdjustYearToLength() to avoid calling std::pow per row. http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@351 PS2, Line 351: DCHECK(!d.is_special()); > Done. After changing AppendToBuffer() now we can't this dcheck, even if we want to write too many characters. My idea is to copy only a limited amount, but increase the pos with the original length to hit the dcheck is something goes wrong. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 29 Oct 2021 07:11:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9691/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 28 Oct 2021 19:46:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG@49 PS2, Line 49: 3.6 33.6 33.8 0.888X 0.884X 0.88X : from_unixtime(0,'-MM-dd') > Some unoptimal change in patch set 2. Might be the reinterpret_cast thing f Patch set 3 shows even faster result after we use FastInt32ToBufferLeft and local char array. http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc File be/src/runtime/datetime-simple-date-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@47 PS1, Line 47: > Right, this shouldn't be named ISO at all. I think I will change it to DEFA This is removed in patch set 3. http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@81 PS1, Line 81: , > The difference is in the cast direction. All previous tokenizer have PARSE I get what you're saying now. The CastDirection does not differentiate the resulting DateTimeFormatContext. So we can actually reuse the existing DateTimeFormatContext, even if they're tokenized with PARSE direction. In this case, this should be the same as DEFAULT_SHORT_DATE_TIME_CTX. http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@86 PS1, Line 86: case 5: // hh:mm > Difference is in the cast direction as well. This one has FORMAT direction. This is removed in patch set 3. We reuse DEFAULT_DATE_TIME_CTX[9] instead. http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h@79 PS3, Line 79: /// max_length -- the maximum length of characters that 'dst' can hold. Only used for : /// assertion in debug build. I need to update this comment in next patch set, since we're enforcing the max_length in patch set 3. http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@341 PS2, Line 341: > Good point. Having on-stack char array should also drop the necessity of tm Done http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@351 PS2, Line 351: DCHECK(!d.is_special()); > Will do. Done. http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/timestamp-value.cc@208 PS1, Line 208: } else { > This still copy on return in patch set 2, but only for usages outside of ex It looks like returning string by value is actually OK, since std::string has move constructor. https://www.delftstack.com/howto/cpp/return-string-from-function-cpp/ -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 28 Oct 2021 19:35:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Hello Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17980 to look at the new patch set (#3). Change subject: IMPALA-10984: Improve TimestampValue to String casting .. IMPALA-10984: Improve TimestampValue to String casting TimestampValue::ToString was implemented by concatenating boost::gregorian::to_iso_extended_string and boost::posix_time::to_simple_string using stringstream. This involves multiple string allocations, copying, and might hit lock within tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that touches this function can be inefficient if the expression is being evaluated for millions of rows. This patch reimplement TimestampValue::ToString by supplying default DateTimeFormatContext if no pattern was specified. "-MM-dd HH:mm:ss" will be picked as the default format if the time_ component does not have fractional seconds. Otherwise, "-MM-dd HH:mm:ss.S" will be picked as the default format. The chosen DateTimeFormatContext then passed to TimestampParser::Format along with date_ and time_ to be formatted into the string representation. Int to string parsing method is replaced with FastInt32ToBufferLeft in TimestampParser::Format. This patch gives > 10X performance improvement for CAST timestamp to string and FROM_UNIXTIME without a date-time pattern, as shown by the following benchmark (modified from expr-benchmark), before and after the patch. Before the patch: Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz FromUnixCodegen: Function iters/ms 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile (relative) (relative) (relative) - literal 37.4 38.6 39.1 1X 1X 1X cast(now() as string)2.2 2.28 2.31 0.0589X0.0591X0.0591X from_unixtime(0,'-MM-dd HH:mm:ss') 5.94 6.18 6.23 0.159X 0.16X 0.159X from_unixtime(0,'-MM-dd') 11.5 11.8 11.9 0.308X 0.305X 0.304X from_unixtime(0) 2.26 2.35 2.39 0.0606X 0.061X0.0612X After the patch: Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz FromUnixCodegen: Function iters/ms 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile (relative) (relative) (relative) - literal 37.9 38 38.4 1X 1X 1X cast(now() as string) 29.3 29.3 29.4 0.773X 0.77X 0.767X from_unixtime(0,'-MM-dd HH:mm:ss') 33.6 33.6 33.8 0.888X 0.884X 0.88X from_unixtime(0,'-MM-dd') 49.9 49.9 50.3 1.32X 1.31X 1.31X from_unixtime(0) 33.1 33.2 33.5 0.875X 0.872X 0.873X The literal expression used as the baseline in this benchmark is "cast('2012-01-01 09:10:11.123456789' as timestamp)". This patch also updates numbers in expr-benchmark for BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear its MemPool in between benchmark iteration so that it does not run out of memory. Testing: - Pass core tests. Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 --- M be/src/benchmarks/expr-benchmark.cc M be/src/exprs/cast-functions-ir.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.h M be/src/runtime/datetime-simple-date-format-parser.cc M be/src/runtime/datetime-simple-date-format-parser.h M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h 13 files changed, 209 insertions(+), 130 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/3 -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public