[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-09 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..

IMPALA-10984: Improve TimestampValue to String casting

TimestampValue::ToString was implemented by concatenating
boost::gregorian::to_iso_extended_string and
boost::posix_time::to_simple_string using stringstream. This involves
multiple string allocations, copying, and might hit lock within
tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that
touches this function can be inefficient if the expression is being
evaluated for millions of rows.

This patch adds method TimestampValue::ToStringVal and reimplements
TimestampValue::ToString by supplying default DateTimeFormatContext if
no pattern was specified. "-MM-dd HH:mm:ss" will be picked as the
default format if the time_ component does not have fractional seconds.
Otherwise, "-MM-dd HH:mm:ss.S" will be picked as the default
format. The chosen DateTimeFormatContext then is passed to
TimestampParser::Format along with date_ and time_ to be formatted into
the string representation. Int to string parsing method is replaced with
FastInt32ToBufferLeft in TimestampParser::Format.

We ran a set of expression benchmarks in a machine with Intel(R)
Core(TM) i7-4790 CPU @ 3.60GHz. This patch gives > 10X performance
improvement for CAST timestamp to string and FROM_UNIXTIME without a
date-time pattern. Following are the detailed results before and after
the patch.

Before the patch:
FromUnixCodegen:   Function   10%ile   50%ile   90%ile 10%ile   
  50%ile 90%ile
   (relative) 
(relative) (relative)
---
literal 36.7   37 37.3 1X   
  1X 1X
  cast(now() as string) 2.31 2.31 2.330.0628X   
 0.0623X0.0626X
cast(now() as string format 'Y .S') 16.9 17.5 17.5 0.459X   
  0.472X 0.471X
 from_unixtime(0,'-MM-dd HH:mm:ss')  6.3  6.3 6.37 0.171X   
   0.17X 0.171X
  from_unixtime(0,'-MM-dd') 11.8 11.8   12  0.32X   
   0.32X 0.322X
   from_unixtime(0) 2.36  2.4  2.40.0644X   
 0.0648X0.0644X

After the patch:
FromUnixCodegen:   Function   10%ile   50%ile   90%ile 10%ile   
  50%ile 90%ile
   (relative) 
(relative) (relative)
---
literal 37.7 38.1 38.4 1X   
  1X 1X
  cast(now() as string) 29.9 30.1 30.2 0.794X   
   0.79X 0.787X
cast(now() as string format 'Y .S') 61.1 61.3 61.6  1.62X   
   1.61X  1.61X
 from_unixtime(0,'-MM-dd HH:mm:ss') 33.6 33.8 34.2 0.892X   
  0.887X 0.892X
  from_unixtime(0,'-MM-dd') 50.5 50.6 50.9  1.34X   
   1.33X  1.33X
   from_unixtime(0)   34 34.2 34.5 0.902X   
  0.896X 0.898X

The literal expression used as the baseline in this benchmark is
"cast('2012-01-01 09:10:11.123456789' as timestamp)".

This patch also updates numbers in expr-benchmark for
BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear
its MemPool in between benchmark iteration so that it does not run out
of memory.

Testing:
- Pass core tests.

Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Reviewed-on: http://gerrit.cloudera.org:8080/17980
Reviewed-by: Impala Public Jenkins 
Tested-by: Csaba Ringhofer 
Reviewed-by: Csaba Ringhofer 
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/util/min-max-filter.cc
20 files changed, 261 insertions(+), 159 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved
  Csaba Ringhofer: Looks good to me, approved; 

[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-09 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 10: Code-Review+2

Submitting as the GVO failed twice due to flaky tests.


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 09 Nov 2021 16:13:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-09 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 09 Nov 2021 16:13:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-09 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has removed a vote on this change.

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Removed Verified-1 by Impala Public Jenkins 
--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-09 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has removed a vote on this change.

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Removed -Verified by Impala Public Jenkins 
--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 10: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7610/


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 09 Nov 2021 13:40:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 09 Nov 2021 07:19:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7610/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 09 Nov 2021 07:19:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-08 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 9:

> Patch Set 9:
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7607/

This time, we were hit by different flaky test at test_spilling.py.
The failed test was introduced by IMPALA-9725. The test require for the hash 
join to spill its partitions, but it does not happen in 
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4903

Following recent hash table memory improvement (IMPALA-7635), we might need to 
lower the buffer_pool_limit for that test so that it spill consistently.


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 09 Nov 2021 03:59:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 9:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7607/


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 09 Nov 2021 02:50:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7607/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 08 Nov 2021 20:35:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-08 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 9:

GVO over patch set 9 hit flaky test previously reported at IMPALA-10886.
That flaky test is not related with this patch.


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 08 Nov 2021 18:13:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 9: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7601/


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 08 Nov 2021 16:58:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 08 Nov 2021 10:43:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7601/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 08 Nov 2021 10:43:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-08 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 08 Nov 2021 10:43:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9725/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 05 Nov 2021 01:29:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-04 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 8:

The GVO failed because I forgot handling for ROUND_YEAR and 
ISO8601_WEEK_NUMBERING_YEAR. They should be handled similarly as YEAR token.
Patch set 8 fix that.


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 05 Nov 2021 01:08:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-04 Thread Riza Suminto (Code Review)
Hello Qifan Chen, Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17980

to look at the new patch set (#8).

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..

IMPALA-10984: Improve TimestampValue to String casting

TimestampValue::ToString was implemented by concatenating
boost::gregorian::to_iso_extended_string and
boost::posix_time::to_simple_string using stringstream. This involves
multiple string allocations, copying, and might hit lock within
tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that
touches this function can be inefficient if the expression is being
evaluated for millions of rows.

This patch adds method TimestampValue::ToStringVal and reimplements
TimestampValue::ToString by supplying default DateTimeFormatContext if
no pattern was specified. "-MM-dd HH:mm:ss" will be picked as the
default format if the time_ component does not have fractional seconds.
Otherwise, "-MM-dd HH:mm:ss.S" will be picked as the default
format. The chosen DateTimeFormatContext then is passed to
TimestampParser::Format along with date_ and time_ to be formatted into
the string representation. Int to string parsing method is replaced with
FastInt32ToBufferLeft in TimestampParser::Format.

We ran a set of expression benchmarks in a machine with Intel(R)
Core(TM) i7-4790 CPU @ 3.60GHz. This patch gives > 10X performance
improvement for CAST timestamp to string and FROM_UNIXTIME without a
date-time pattern. Following are the detailed results before and after
the patch.

Before the patch:
FromUnixCodegen:   Function   10%ile   50%ile   90%ile 10%ile   
  50%ile 90%ile
   (relative) 
(relative) (relative)
---
literal 36.7   37 37.3 1X   
  1X 1X
  cast(now() as string) 2.31 2.31 2.330.0628X   
 0.0623X0.0626X
cast(now() as string format 'Y .S') 16.9 17.5 17.5 0.459X   
  0.472X 0.471X
 from_unixtime(0,'-MM-dd HH:mm:ss')  6.3  6.3 6.37 0.171X   
   0.17X 0.171X
  from_unixtime(0,'-MM-dd') 11.8 11.8   12  0.32X   
   0.32X 0.322X
   from_unixtime(0) 2.36  2.4  2.40.0644X   
 0.0648X0.0644X

After the patch:
FromUnixCodegen:   Function   10%ile   50%ile   90%ile 10%ile   
  50%ile 90%ile
   (relative) 
(relative) (relative)
---
literal 37.7 38.1 38.4 1X   
  1X 1X
  cast(now() as string) 29.9 30.1 30.2 0.794X   
   0.79X 0.787X
cast(now() as string format 'Y .S') 61.1 61.3 61.6  1.62X   
   1.61X  1.61X
 from_unixtime(0,'-MM-dd HH:mm:ss') 33.6 33.8 34.2 0.892X   
  0.887X 0.892X
  from_unixtime(0,'-MM-dd') 50.5 50.6 50.9  1.34X   
   1.33X  1.33X
   from_unixtime(0)   34 34.2 34.5 0.902X   
  0.896X 0.898X

The literal expression used as the baseline in this benchmark is
"cast('2012-01-01 09:10:11.123456789' as timestamp)".

This patch also updates numbers in expr-benchmark for
BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear
its MemPool in between benchmark iteration so that it does not run out
of memory.

Testing:
- Pass core tests.

Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/util/min-max-filter.cc
20 files changed, 261 insertions(+), 159 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/8
--
To view, visit http://gerrit.cloudera.org:8080/17980
To 

[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7595/


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 04 Nov 2021 23:21:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 04 Nov 2021 17:08:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7595/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 04 Nov 2021 17:08:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-02 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 02 Nov 2021 23:12:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9708/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 02 Nov 2021 17:28:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-02 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@85
PS3, Line 85: dst.resize(written);
> nit: please add { } to the if
Done


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@213
PS3, Line 213: void TimestampValue::ToString(string& dst) const {
> I disagree with dropping this function - I think that in non perf critical
Right, it looks better to keep the TimestampValue::ToString() variant.
Putting it back and revert changes due to previous function signature change.


http://gerrit.cloudera.org:8080/#/c/17980/5/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/17980/5/be/src/runtime/timestamp-value.inline.h@223
PS5, Line 223:   if (UNLIKELY(written < 0)) {
> nit: usually we add { + } if we break the line at ifs.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 02 Nov 2021 17:11:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-02 Thread Riza Suminto (Code Review)
Hello Qifan Chen, Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17980

to look at the new patch set (#6).

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..

IMPALA-10984: Improve TimestampValue to String casting

TimestampValue::ToString was implemented by concatenating
boost::gregorian::to_iso_extended_string and
boost::posix_time::to_simple_string using stringstream. This involves
multiple string allocations, copying, and might hit lock within
tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that
touches this function can be inefficient if the expression is being
evaluated for millions of rows.

This patch adds method TimestampValue::ToStringVal and reimplements
TimestampValue::ToString by supplying default DateTimeFormatContext if
no pattern was specified. "-MM-dd HH:mm:ss" will be picked as the
default format if the time_ component does not have fractional seconds.
Otherwise, "-MM-dd HH:mm:ss.S" will be picked as the default
format. The chosen DateTimeFormatContext then is passed to
TimestampParser::Format along with date_ and time_ to be formatted into
the string representation. Int to string parsing method is replaced with
FastInt32ToBufferLeft in TimestampParser::Format.

We ran a set of expression benchmarks in a machine with Intel(R)
Core(TM) i7-4790 CPU @ 3.60GHz. This patch gives > 10X performance
improvement for CAST timestamp to string and FROM_UNIXTIME without a
date-time pattern. Following are the detailed results before and after
the patch.

Before the patch:
FromUnixCodegen:   Function   10%ile   50%ile   90%ile 10%ile   
  50%ile 90%ile
   (relative) 
(relative) (relative)
---
literal 36.7   37 37.3 1X   
  1X 1X
  cast(now() as string) 2.31 2.31 2.330.0628X   
 0.0623X0.0626X
cast(now() as string format 'Y .S') 16.9 17.5 17.5 0.459X   
  0.472X 0.471X
 from_unixtime(0,'-MM-dd HH:mm:ss')  6.3  6.3 6.37 0.171X   
   0.17X 0.171X
  from_unixtime(0,'-MM-dd') 11.8 11.8   12  0.32X   
   0.32X 0.322X
   from_unixtime(0) 2.36  2.4  2.40.0644X   
 0.0648X0.0644X

After the patch:
FromUnixCodegen:   Function   10%ile   50%ile   90%ile 10%ile   
  50%ile 90%ile
   (relative) 
(relative) (relative)
---
literal 37.7 38.1 38.4 1X   
  1X 1X
  cast(now() as string) 29.9 30.1 30.2 0.794X   
   0.79X 0.787X
cast(now() as string format 'Y .S') 61.1 61.3 61.6  1.62X   
   1.61X  1.61X
 from_unixtime(0,'-MM-dd HH:mm:ss') 33.6 33.8 34.2 0.892X   
  0.887X 0.892X
  from_unixtime(0,'-MM-dd') 50.5 50.6 50.9  1.34X   
   1.33X  1.33X
   from_unixtime(0)   34 34.2 34.5 0.902X   
  0.896X 0.898X

The literal expression used as the baseline in this benchmark is
"cast('2012-01-01 09:10:11.123456789' as timestamp)".

This patch also updates numbers in expr-benchmark for
BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear
its MemPool in between benchmark iteration so that it does not run out
of memory.

Testing:
- Pass core tests.

Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/util/min-max-filter.cc
20 files changed, 260 insertions(+), 159 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/6
--
To view, visit http://gerrit.cloudera.org:8080/17980
To 

[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-02 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 5: Code-Review+1

Looks great!


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 02 Nov 2021 13:44:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-02 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 5: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@85
PS3, Line 85: dst.resize(written);
nit: please add { } to the if


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@213
PS3, Line 213:   }
> Unfortunately, there are couple call sites to this function. Especially the
I disagree with dropping this function - I think that in non perf critical code 
ToString() is simpler to use, and removing it would just lead to create new 
convenience functions like you did in the tests. So I think it would be the 
best to have both string TimestampValue::ToString() and void 
TimestampValue::ToString(string&)


http://gerrit.cloudera.org:8080/#/c/17980/5/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/17980/5/be/src/runtime/timestamp-value.inline.h@223
PS5, Line 223:   if (UNLIKELY(written < 0))
nit: usually we add { + } if we break the line at ifs.



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 02 Nov 2021 12:21:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-01 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 5:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@17
PS3, Line 17: adds method
> nit reimplements
Done


http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@22
PS3, Line 22: format. The chosen DateTimeFormatContext then is passed to
> nit is passed
Done


http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@32
PS3, Line 32:
> nit. duplicated in before/after. Probably should be mentioned the para at l
Done


http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@33
PS3, Line 33:
> nit. this column can be removed?
Done


http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@38
PS3, Line 38:   2.31
> nit. not aligned with the rest of the values in this column.
Done


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/datetime-iso-sql-format-tokenizer.h@111
PS3, Line 111: Iterates throug
> nit. fmt_out_max_len_?
Removed.
This is now written directly to DateTimeFormatContext.fmt_out_len.


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/datetime-simple-date-format-parser.cc@401
PS3, Line 401: }
 :   }
 :   return nullptr;
 : }
 :
> inline?
Done


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h@79
PS3, Line 79:   /// max_length -- the maximum length of characters that 'dst' 
can hold. Only used for
:   ///   assertion in debug build.
> I need to update this comment in next patch set, since we're enforcing the 
Done


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@305
PS2, Line 305: CATOR: {
> optional:
Done. Benchmarked it with expression "cast(now() as string format 'Y .S')".

Compared patch set 3 vs 4, the (10%ile, 50%ile, 90%ile) increased from (19.9, 
20.1, 20.3) to (61.1, 61.3, 61.6). 3X increase.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@351
PS2, Line 351:   DCHECK(!d.is_special());
> After changing AppendToBuffer() now we can't this dcheck, even if we want t
Done


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@83
PS3, Line 83: st.clear();
> UNLIKELY?
Done


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@213
PS3, Line 213:   }
> Can we update any remaining callers to the new variant and eliminate this f
Unfortunately, there are couple call sites to this function. Especially the 
output stream operator of TimestampValue.

In patch set 4, I change the signature, asking the caller to supply a string 
output argument. Add a comment as well in the header file warning caller to 
reuse the output string.


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@222
PS3, Line 222:
> So the space is bounded by the row batch size * max_length? Approx how much
Yes, I think batch size * max_length is the approximation. There is also 8 
bytes alignment and power2 round up thing in mem-pool code I have not fully 
understand yet.


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@224
PS3, Line 224:   int64_t t_in_nano_sec = t.total_nanoseconds();
> Should use C++ cast
Done


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@225
PS3, Line 225:
> unlikely?
Done


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@230
PS3, Line 230: int64_t days = total_in_nano_sec / NANOS_PER_DAY;
 : int64_t nano_secs_remaining = total_in_nano_sec % 
NANOS_PER_DAY;
 : return TimestampValue(date_ + 
boost::gregorian::date_duration(days),
 : boost::posix_time::time_duration(0, 0, 0, 
nano_secs_remaining));
 :
> nit This method probably can be inlined.
Done. Moved to timestap-value.inline.h as well.



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: 

[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9702/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 01 Nov 2021 17:14:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-01 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@222
PS3, Line 222:   StringVal result(ctx, max_length);
> The allocation comes from exps_results_pool_.
So the space is bounded by the row batch size * max_length? Approx how much??



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 01 Nov 2021 17:07:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-01 Thread Riza Suminto (Code Review)
Hello Qifan Chen, Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17980

to look at the new patch set (#5).

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..

IMPALA-10984: Improve TimestampValue to String casting

TimestampValue::ToString was implemented by concatenating
boost::gregorian::to_iso_extended_string and
boost::posix_time::to_simple_string using stringstream. This involves
multiple string allocations, copying, and might hit lock within
tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that
touches this function can be inefficient if the expression is being
evaluated for millions of rows.

This patch adds method TimestampValue::ToStringVal and reimplements
TimestampValue::ToString by supplying default DateTimeFormatContext if
no pattern was specified. "-MM-dd HH:mm:ss" will be picked as the
default format if the time_ component does not have fractional seconds.
Otherwise, "-MM-dd HH:mm:ss.S" will be picked as the default
format. The chosen DateTimeFormatContext then is passed to
TimestampParser::Format along with date_ and time_ to be formatted into
the string representation. Int to string parsing method is replaced with
FastInt32ToBufferLeft in TimestampParser::Format.

We ran a set of expression benchmarks in a machine with Intel(R)
Core(TM) i7-4790 CPU @ 3.60GHz. This patch gives > 10X performance
improvement for CAST timestamp to string and FROM_UNIXTIME without a
date-time pattern. Following are the detailed results before and after
the patch.

Before the patch:
FromUnixCodegen:   Function   10%ile   50%ile   90%ile 10%ile   
  50%ile 90%ile
   (relative) 
(relative) (relative)
---
literal 36.7   37 37.3 1X   
  1X 1X
  cast(now() as string) 2.31 2.31 2.330.0628X   
 0.0623X0.0626X
cast(now() as string format 'Y .S') 16.9 17.5 17.5 0.459X   
  0.472X 0.471X
 from_unixtime(0,'-MM-dd HH:mm:ss')  6.3  6.3 6.37 0.171X   
   0.17X 0.171X
  from_unixtime(0,'-MM-dd') 11.8 11.8   12  0.32X   
   0.32X 0.322X
   from_unixtime(0) 2.36  2.4  2.40.0644X   
 0.0648X0.0644X

After the patch:
FromUnixCodegen:   Function   10%ile   50%ile   90%ile 10%ile   
  50%ile 90%ile
   (relative) 
(relative) (relative)
---
literal 37.7 38.1 38.4 1X   
  1X 1X
  cast(now() as string) 29.9 30.1 30.2 0.794X   
   0.79X 0.787X
cast(now() as string format 'Y .S') 61.1 61.3 61.6  1.62X   
   1.61X  1.61X
 from_unixtime(0,'-MM-dd HH:mm:ss') 33.6 33.8 34.2 0.892X   
  0.887X 0.892X
  from_unixtime(0,'-MM-dd') 50.5 50.6 50.9  1.34X   
   1.33X  1.33X
   from_unixtime(0)   34 34.2 34.5 0.902X   
  0.896X 0.898X

The literal expression used as the baseline in this benchmark is
"cast('2012-01-01 09:10:11.123456789' as timestamp)".

This patch also updates numbers in expr-benchmark for
BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear
its MemPool in between benchmark iteration so that it does not run out
of memory.

Testing:
- Pass core tests.

Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exec/kudu-util-ir.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/client-request-state.cc
M be/src/util/min-max-filter.cc
22 files changed, 316 insertions(+), 213 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF 

[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-11-01 Thread Riza Suminto (Code Review)
Hello Qifan Chen, Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17980

to look at the new patch set (#4).

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..

IMPALA-10984: Improve TimestampValue to String casting

TimestampValue::ToString was implemented by concatenating
boost::gregorian::to_iso_extended_string and
boost::posix_time::to_simple_string using stringstream. This involves
multiple string allocations, copying, and might hit lock within
tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that
touches this function can be inefficient if the expression is being
evaluated for millions of rows.

This patch reimplement TimestampValue::ToString by supplying default
DateTimeFormatContext if no pattern was specified. "-MM-dd HH:mm:ss"
will be picked as the default format if the time_ component does not
have fractional seconds. Otherwise, "-MM-dd HH:mm:ss.S" will
be picked as the default format. The chosen DateTimeFormatContext then
passed to TimestampParser::Format along with date_ and time_ to be
formatted into the string representation. Int to string parsing method
is replaced with FastInt32ToBufferLeft in TimestampParser::Format.

We ran a set of expression benchmarks in a machine with Intel(R)
Core(TM) i7-4790 CPU @ 3.60GHz. This patch gives > 10X performance
improvement for CAST timestamp to string and FROM_UNIXTIME without a
date-time pattern. Following are the detailed results before and after
the patch.

Before the patch:
FromUnixCodegen:   Function   10%ile   50%ile   90%ile 10%ile   
  50%ile 90%ile
   (relative) 
(relative) (relative)
---
literal 36.7   37 37.3 1X   
  1X 1X
  cast(now() as string) 2.31 2.31 2.330.0628X   
 0.0623X0.0626X
cast(now() as string format 'Y .S') 16.9 17.5 17.5 0.459X   
  0.472X 0.471X
 from_unixtime(0,'-MM-dd HH:mm:ss')  6.3  6.3 6.37 0.171X   
   0.17X 0.171X
  from_unixtime(0,'-MM-dd') 11.8 11.8   12  0.32X   
   0.32X 0.322X
   from_unixtime(0) 2.36  2.4  2.40.0644X   
 0.0648X0.0644X

After the patch:
FromUnixCodegen:   Function   10%ile   50%ile   90%ile 10%ile   
  50%ile 90%ile
   (relative) 
(relative) (relative)
---
literal 37.7 38.1 38.4 1X   
  1X 1X
  cast(now() as string) 29.9 30.1 30.2 0.794X   
   0.79X 0.787X
cast(now() as string format 'Y .S') 61.1 61.3 61.6  1.62X   
   1.61X  1.61X
 from_unixtime(0,'-MM-dd HH:mm:ss') 33.6 33.8 34.2 0.892X   
  0.887X 0.892X
  from_unixtime(0,'-MM-dd') 50.5 50.6 50.9  1.34X   
   1.33X  1.33X
   from_unixtime(0)   34 34.2 34.5 0.902X   
  0.896X 0.898X

The literal expression used as the baseline in this benchmark is
"cast('2012-01-01 09:10:11.123456789' as timestamp)".

This patch also updates numbers in expr-benchmark for
BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear
its MemPool in between benchmark iteration so that it does not run out
of memory.

Testing:
- Pass core tests.

Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exec/kudu-util-ir.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/client-request-state.cc
M be/src/util/min-max-filter.cc
22 files changed, 316 insertions(+), 213 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/4
--
To view, visit 

[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-10-29 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 3:

(1 comment)

Thanks for the quick answer of my questions.

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@86
PS3, Line 86: dst.resize(written)
> It is possible that we allocate larger than we actually needs.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 29 Oct 2021 20:01:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-10-29 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 3:

(4 comments)

Thank you everyone for the feedbacks. Will address them in the next patch set.

I'd like to answer some of the questions first.

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h@128
PS3, Line 128: int max_length
> I wonder if we need this argument, since the source is 'str' which has a le
max_length here is precautionary hint passed from TimestampParser::Format.
When writing, we should not hit case where pos becomes > max_length. If that 
does happens, DCHECK error should be raised at TimestampParser::Format.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@271
PS2, Line 271:
> optional: we could optimize the case when day/month/year are all needed by
This is interesting. We should look at it in the next JIRA.


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@86
PS3, Line 86: dst.resize(written)
> I wonder why resize once dst is populated.
It is possible that we allocate larger than we actually needs.
For example, "cast(now() as string format 'HH24:mm:ss')" will allocate 10 bytes 
at first. But the output result in only 8 bytes, like "17:10:57".
So the second resize here is done to update the dst.length().


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@222
PS3, Line 222:   StringVal result(ctx, max_length);
> When is this memory released? Seems this will still allocate via ctx for ev
The allocation comes from exps_results_pool_.

In between RowBatch processing, exec nodes will call QueryMaintenance() method 
in some frequencies to return all allocated memory so far back to MemPool 
though expr_results_pool_->Clear().
https://github.com/apache/impala/blob/b42c649/be/src/exec/exec-node.cc#L483
This, however, does not free the memory. MemPool will retain the memory chunks 
so that it can be reused in the next RowBatch processing.

After the exec node completes, it will call Close() and free the memory chunks 
back by calling expr_results_pool_->FreeAll().
https://github.com/apache/impala/blob/b42c649/be/src/exec/exec-node.cc#L320
Contrast to Clear(), FreeAll() does free the memory.

IMPALA-5844 explain more about this detail. There is also IMPALA-2399 that 
discuss about plan to remove QueryMaintenance().



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 29 Oct 2021 19:32:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-10-29 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@213
PS3, Line 213: string TimestampValue::ToString() const {
Can we update any remaining callers to the new variant and eliminate this 
function?


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@222
PS3, Line 222:   StringVal result(ctx, max_length);
When is this memory released? Seems this will still allocate via ctx for every 
value and not release it?


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@224
PS3, Line 224:   TimestampParser::Format(dt_ctx, date_, time_, max_length, 
(char*)result.ptr);
Should use C++ cast



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 29 Oct 2021 17:08:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-10-29 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 3:

(12 comments)

Just have some very minor comments. Looks great to me and the improvement is 
impressive!

http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@17
PS3, Line 17: reimplement
nit reimplements


http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@22
PS3, Line 22: passed to TimestampParser::Format along with date_ and time_ to be
nit is passed


http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@32
PS3, Line 32: Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
nit. duplicated in before/after. Probably should be mentioned the para at line 
26.


http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@33
PS3, Line 33: iters/ms
nit. this column can be removed?


http://gerrit.cloudera.org:8080/#/c/17980/3//COMMIT_MSG@38
PS3, Line 38:   5.94
nit. not aligned with the rest of the values in this column.


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/datetime-iso-sql-format-tokenizer.h@111
PS3, Line 111: fmt_out_max_len
nit. fmt_out_max_len_?


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/datetime-simple-date-format-parser.cc@401
PS3, Line 401: const DateTimeFormatContext* 
SimpleDateFormatTokenizer::GetDefaultTimestampFormatContext(
 : const time_duration& time) {
 :   return time.fractional_seconds() > 0 ? 
_DATE_TIME_CTX[9] :
 :  
_SHORT_DATE_TIME_CTX;
 : }
inline?


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h@128
PS3, Line 128: int max_length
I wonder if we need this argument, since the source is 'str' which has a length 
attribute.


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@83
PS3, Line 83: (written < 0)
UNLIKELY?


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@86
PS3, Line 86: dst.resize(written)
I wonder why resize once dst is populated.


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@225
PS3, Line 225: (written < 0)
unlikely?


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-value.cc@230
PS3, Line 230: StringVal TimestampValue::ToStringVal(FunctionContext* ctx) 
const {
 :   const DateTimeFormatContext* dt_ctx =
 :   
SimpleDateFormatTokenizer::GetDefaultTimestampFormatContext(time_);
 :   return ToStringVal(ctx, *dt_ctx);
 : }
nit This method probably can be inlined.



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 29 Oct 2021 17:00:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-10-29 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 3: Code-Review+1

(3 comments)

lgtm
I have some additional ideas on how to optimize common cases of timestamp 
formatting, but this patch is ok without them, the comments are more like a 
brain dump.

http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@271
PS2, Line 271:
optional: we could optimize the case when day/month/year are all needed by 
using boost::gregorian's year_month_day()

Here is an example:
https://www.boost.org/doc/libs/1_61_0/doc/html/date_time/examples.html#date_time.examples.dates_as_strings

My guess is that it is not too common to use only a subset of year/month/date, 
so we could probably always compute year_month_day if we need any date 
component.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@305
PS2, Line 305: CATOR: {
optional:
this could be replaced with a division with a precomputed int in 
DateTimeFormatContext - the current logic should make Format() much slower if 
we use low fractional precision, e.g. millisecs

At least a benchmark could be added to see how bad it is.

A similar thing could be done for AdjustYearToLength() to avoid calling 
std::pow per row.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@351
PS2, Line 351:   DCHECK(!d.is_special());
> Done.
After changing AppendToBuffer() now we can't this dcheck, even if we want to 
write too many characters. My idea is to copy only a limited amount, but 
increase the pos with the original length to hit the dcheck is something goes 
wrong.



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 29 Oct 2021 07:11:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9691/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 28 Oct 2021 19:46:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-10-28 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG@49
PS2, Line 49: 3.6 33.6 33.8 0.888X 0.884X  0.88X
:   from_unixtime(0,'-MM-dd')
> Some unoptimal change in patch set 2. Might be the reinterpret_cast thing f
Patch set 3 shows even faster result after we use FastInt32ToBufferLeft and 
local char array.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@47
PS1, Line 47:
> Right, this shouldn't be named ISO at all. I think I will change it to DEFA
This is removed in patch set 3.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@81
PS1, Line 81: ,
> The difference is in the cast direction. All previous tokenizer have PARSE
I get what you're saying now. The CastDirection does not differentiate the 
resulting DateTimeFormatContext. So we can actually reuse the existing 
DateTimeFormatContext, even if they're tokenized with PARSE direction. In this 
case, this should be the same as DEFAULT_SHORT_DATE_TIME_CTX.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@86
PS1, Line 86: case 5:   // hh:mm
> Difference is in the cast direction as well. This one has FORMAT direction.
This is removed in patch set 3. We reuse DEFAULT_DATE_TIME_CTX[9] instead.


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h@79
PS3, Line 79:   /// max_length -- the maximum length of characters that 'dst' 
can hold. Only used for
:   ///   assertion in debug build.
I need to update this comment in next patch set, since we're enforcing the 
max_length in patch set 3.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@341
PS2, Line 341:
> Good point. Having on-stack char array should also drop the necessity of tm
Done


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@351
PS2, Line 351:   DCHECK(!d.is_special());
> Will do.
Done.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/timestamp-value.cc@208
PS1, Line 208:   } else {
> This still copy on return in patch set 2, but only for usages outside of ex
It looks like returning string by value is actually OK, since std::string has 
move constructor.
https://www.delftstack.com/howto/cpp/return-string-from-function-cpp/



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 28 Oct 2021 19:35:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-10-28 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17980

to look at the new patch set (#3).

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..

IMPALA-10984: Improve TimestampValue to String casting

TimestampValue::ToString was implemented by concatenating
boost::gregorian::to_iso_extended_string and
boost::posix_time::to_simple_string using stringstream. This involves
multiple string allocations, copying, and might hit lock within
tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that
touches this function can be inefficient if the expression is being
evaluated for millions of rows.

This patch reimplement TimestampValue::ToString by supplying default
DateTimeFormatContext if no pattern was specified. "-MM-dd HH:mm:ss"
will be picked as the default format if the time_ component does not
have fractional seconds. Otherwise, "-MM-dd HH:mm:ss.S" will
be picked as the default format. The chosen DateTimeFormatContext then
passed to TimestampParser::Format along with date_ and time_ to be
formatted into the string representation. Int to string parsing method
is replaced with FastInt32ToBufferLeft in TimestampParser::Format.

This patch gives > 10X performance improvement for CAST timestamp to
string and FROM_UNIXTIME without a date-time pattern, as shown by the
following benchmark (modified from expr-benchmark), before and after the
patch.

Before the patch:
Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen:   Function  iters/ms   10%ile   50%ile   90%ile 
10%ile 50%ile 90%ile
 
(relative) (relative) (relative)
-
literal   37.4 38.6 39.1
 1X 1X 1X
  cast(now() as string)2.2 2.28 2.31
0.0589X0.0591X0.0591X
from_unixtime(0,'-MM-dd HH:mm:ss')   5.94 6.18 6.23 
0.159X  0.16X 0.159X
  from_unixtime(0,'-MM-dd')   11.5 11.8 11.9 
0.308X 0.305X 0.304X
   from_unixtime(0)   2.26 2.35 2.39
0.0606X 0.061X0.0612X

After the patch:
Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen:   Function  iters/ms   10%ile   50%ile   90%ile 
10%ile 50%ile 90%ile
 
(relative) (relative) (relative)
-
literal   37.9   38 38.4
 1X 1X 1X
  cast(now() as string)   29.3 29.3 29.4 
0.773X  0.77X 0.767X
from_unixtime(0,'-MM-dd HH:mm:ss')   33.6 33.6 33.8 
0.888X 0.884X  0.88X
  from_unixtime(0,'-MM-dd')   49.9 49.9 50.3  
1.32X  1.31X  1.31X
   from_unixtime(0)   33.1 33.2 33.5 
0.875X 0.872X 0.873X

The literal expression used as the baseline in this benchmark is
"cast('2012-01-01 09:10:11.123456789' as timestamp)".

This patch also updates numbers in expr-benchmark for
BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear
its MemPool in between benchmark iteration so that it does not run out
of memory.

Testing:
- Pass core tests.

Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
13 files changed, 209 insertions(+), 130 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/3
--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public