[Impala-ASF-CR] IMPALA-10984: Improve performance of TimestampValue::ToString

2021-10-28 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@47
PS1, Line 47: DEFAULT_ISO_DATE_TIME_FORMATTER_CTX
> I am not sure what ISO means here exactly, but in the other constants it le
Right, this shouldn't be named ISO at all. I think I will change it to 
DEFAULT_SHORT_DATE_TIME_FRACTIONAL_FORMATTER_CTX.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@81
PS1, Line 81:
> Is this different than DEFAULT_DATE_TIME_CTX[0]?
The difference is in the cast direction. All previous tokenizer have PARSE 
direction. This one has FORMAT direction.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@86
PS1, Line 86: DEFAULT_ISO_DATE_TIME_FORMATTER_CTX
> Is this different than DEFAULT_DATE_TIME_CTX[9]?
Difference is in the cast direction as well. This one has FORMAT direction.



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 28 Oct 2021 15:43:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10984: Improve performance of TimestampValue::ToString

2021-10-28 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString
..


Patch Set 2:

(4 comments)

Thank you, Csaba, for the comments! Will address them in next patch set.

http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG@49
PS2, Line 49:  4.91 4.91 0.128X 0.128X 0.128X
: from_unixtime(0,'-MM-dd HH:mm:ss')
> Wonder why this became slow during the last patch.
Some unoptimal change in patch set 2. Might be the reinterpret_cast thing from 
uint8_t* to char* in TimestampValue::ToStringVal. Will take a look at it during 
next iteration.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@341
PS2, Line 341: tmp_buff
> I think that we can easily avoid using the tmp_buff here.
Good point. Having on-stack char array should also drop the necessity of 
tmp_string_ in FunctionContextImpl. Will try it in next patch set.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@344
PS2, Line 344: for (int i = written_length; i < tok.len; i++) {
 :   *(dst + pos) = '0';
 :   pos++;
 : }
> If I understand this correctly, then we are writing to the end of the buffe
That is correct.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@351
PS2, Line 351: DCHECK_LE(pos, max_length) << "Maximum buffer length 
exceeded!";
> I am a bit concerned that at the time we hit this DCHECK, we have already w
Will do.



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 28 Oct 2021 15:38:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10984: Improve performance of TimestampValue::ToString

2021-10-28 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG@49
PS2, Line 49:  4.91 4.91 0.128X 0.128X 0.128X
: from_unixtime(0,'-MM-dd HH:mm:ss')
Wonder why this became slow during the last patch.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@47
PS1, Line 47: DEFAULT_ISO_DATE_TIME_FORMATTER_CTX
I am not sure what ISO means here exactly, but in the other constants it leads 
to a "T" between the date and time part, while in your new format it is just 
related to the subseconds part.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@81
PS1, Line 81:
Is this different than DEFAULT_DATE_TIME_CTX[0]?


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@86
PS1, Line 86: DEFAULT_ISO_DATE_TIME_FORMATTER_CTX
Is this different than DEFAULT_DATE_TIME_CTX[9]?


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@341
PS2, Line 341: tmp_buff
I think that we can easily avoid using the tmp_buff here.
As num_val is a non-negative int32, we could simply use an on-stack char array 
of size 10, and copy from there.

There is also an alternative to snprintf that we use in int->string casts, 
FastInt32ToBufferLeft() from gutil:
https://github.com/apache/impala/blob/master/be/src/gutil/strings/numbers.h#L171
We use it here: 
https://github.com/apache/impala/blob/master/be/src/exprs/cast-functions-ir.cc#L156

We could use FastInt32ToBufferLeft() to print to an array on stack, and then 
calculate the number of '0's to prepend from the result, and then copy it to 
dst.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@344
PS2, Line 344: for (int i = written_length; i < tok.len; i++) {
 :   *(dst + pos) = '0';
 :   pos++;
 : }
If I understand this correctly, then we are writing to the end of the buffer 
here, while we used to write to the beginning of the tmp_str.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@351
PS2, Line 351: DCHECK_LE(pos, max_length) << "Maximum buffer length 
exceeded!";
I am a bit concerned that at the time we hit this DCHECK, we have already 
written outside the buffer in the AppendToBuffer functions. It would be safer 
and probably not too slow to pass max_length as an argument to 
AppendToBuffer(s) and skip the copy there if it has overflown.



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 28 Oct 2021 12:45:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10984: Improve performance of TimestampValue::ToString

2021-10-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9685/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 28 Oct 2021 04:12:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve performance of TimestampValue::ToString

2021-10-27 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/exprs/cast-functions-ir.cc@195
PS1, Line 195:   StringVal sv =
> Can you make formatted_timestamp and sv member variables so the memory can
Refactored to not return string.

It looks like StringVal is only have len (int) and ptr (uint8_t*) in its struct 
member. The char buffer is not a member of StringVal, but a memory chunk 
managed by FunctionContext's MemPool. So I think it is OK to keep sv as it is, 
even if it is copied on return.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/exprs/cast-functions-ir.cc@200
PS1, Line 200:
> Try to avoid this copy on return by passing in an output arg.
Removed AnyValUtil::FromString.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/exprs/timestamp-functions-ir.cc@87
PS1, Line 87: const StringVal& fmt) {
> Try to eliminate copy on return.
Removed AnyValUtil::FromString.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/timestamp-value.cc@208
PS1, Line 208:   } else {
> Try to eliminate copy on return.
This still copy on return in patch set 2, but only for usages outside of 
expression evaluation, such as logging.
I'm not sure if our toolchain compiler has Return Value Optimization (RVO).
Please let me know if I should return string reference instead.



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 28 Oct 2021 04:06:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10984: Improve performance of TimestampValue::ToString

2021-10-27 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17980

to look at the new patch set (#2).

Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString
..

IMPALA-10984: Improve performance of TimestampValue::ToString

TimestampValue::ToString was implemented by concatenating
boost::gregorian::to_iso_extended_string and
boost::posix_time::to_simple_string using stringstream. This involves
multiple string allocations, copying, and might hit lock within
tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that
touches this function can be inefficient if the expression is being
evaluated for millions of rows.

This patch reimplement TimestampValue::ToString by supplying default
DateTimeFormatContext if no pattern was specified. "-MM-dd HH:mm:ss"
will be picked as the default format if the time_ component does not
have fractional seconds. Otherwise, "-MM-dd HH:mm:ss.S" will
be picked as the default format. The chosen DateTimeFormatContext then
passed to TimestampParser::Format along with date_ and time_ to be
formatted into the string representation. A string field member,
tmp_string_, is added in FunctionContextImpl as a reusable buffer that
in each function evaluation to avoid repetitive temporary allocation.

This patch gives > 2X performance improvement for CAST timestamp to
string and FROM_UNIXTIME without a date-time pattern, as shown by the
following benchmark (modified from expr-benchmark), before and after the
patch.

Before the patch:
Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen:   Function  iters/ms   10%ile   50%ile   90%ile 
10%ile 50%ile 90%ile
 
(relative) (relative) (relative)
-
literal   37.4 38.6 39.1
 1X 1X 1X
  cast(now() as string)2.2 2.28 2.31
0.0589X0.0591X0.0591X
from_unixtime(0,'-MM-dd HH:mm:ss')   5.94 6.18 6.23 
0.159X  0.16X 0.159X
  from_unixtime(0,'-MM-dd')   11.5 11.8 11.9 
0.308X 0.305X 0.304X
   from_unixtime(0)   2.26 2.35 2.39
0.0606X 0.061X0.0612X

After the patch:
Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen:   Function  iters/ms   10%ile   50%ile   90%ile 
10%ile 50%ile 90%ile
 
(relative) (relative) (relative)
-
literal   38.1 38.2 38.4
 1X 1X 1X
  cast(now() as string)   4.88 4.91 4.91 
0.128X 0.128X 0.128X
from_unixtime(0,'-MM-dd HH:mm:ss')   6.93 6.98 6.98 
0.182X 0.183X 0.182X
  from_unixtime(0,'-MM-dd')   12.2 12.2 12.3 
0.321X  0.32X 0.321X
   from_unixtime(0)   6.93 6.98 6.98 
0.182X 0.183X 0.182X

The literal expression used as the baseline in this benchmark is
"cast('2012-01-01 09:10:11.123456789' as timestamp)".

This patch also updates numbers in expr-benchmark for
BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear
its MemPool in between benchmark iteration so that it does not run out
of memory.

Testing:
- Pass core tests.

Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/udf/udf-internal.h
14 files changed, 243 insertions(+), 120 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/2
--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Sumi

[Impala-ASF-CR] IMPALA-10984: Improve performance of TimestampValue::ToString

2021-10-26 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/exprs/cast-functions-ir.cc@195
PS1, Line 195:   string formatted_timestamp =
Can you make formatted_timestamp and sv member variables so the memory can be 
reused?


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/exprs/cast-functions-ir.cc@200
PS1, Line 200:   return sv;
Try to avoid this copy on return by passing in an output arg.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/exprs/timestamp-functions-ir.cc@87
PS1, Line 87:   return AnyValUtil::FromString(context, formatted_timestamp);
Try to eliminate copy on return.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/timestamp-value.cc@208
PS1, Line 208:   return Format(
Try to eliminate copy on return.



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 27 Oct 2021 04:32:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10984: Improve performance of TimestampValue::ToString

2021-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9671/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 27 Oct 2021 00:22:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve performance of TimestampValue::ToString

2021-10-26 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17980


Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString
..

IMPALA-10984: Improve performance of TimestampValue::ToString

TimestampValue::ToString was implemented by concatenating
boost::gregorian::to_iso_extended_string and
boost::posix_time::to_simple_string using stringstream. This involves
multiple string allocations, copying, and might hit lock within
tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that
touches this function can be inefficient if the expression is being
evaluated for millions of rows.

This patch reimplement TimestampValue::ToString by supplying default
DateTimeFormatContext if no pattern was specified. "-MM-dd HH:mm:ss"
will be picked as the default format if the time_ component does not
have fractional seconds. Otherwise, "-MM-dd HH:mm:ss.S" will
be picked as the default format. The chosen DateTimeFormatContext then
passed to TimestampParser::Format along with date_ and time_ to be
formatted into the string representation.

This patch gives > 2X performance improvement for CAST timestamp to
string and FROM_UNIXTIME without a date-time pattern, as shown by the
following benchmark (modified from expr-benchmark), before and after the
patch.

Before the patch:
Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen:   Function  iters/ms   10%ile   50%ile   90%ile 
10%ile 50%ile 90%ile
 
(relative) (relative) (relative)
-
literal   37.4 38.6 39.1
 1X 1X 1X
  cast(now() as string)2.2 2.28 2.31
0.0589X0.0591X0.0591X
from_unixtime(0,'-MM-dd HH:mm:ss')   5.94 6.18 6.23 
0.159X  0.16X 0.159X
  from_unixtime(0,'-MM-dd')   11.5 11.8 11.9 
0.308X 0.305X 0.304X
   from_unixtime(0)   2.26 2.35 2.39
0.0606X 0.061X0.0612X

After the patch:
Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen:   Function  iters/ms   10%ile   50%ile   90%ile 
10%ile 50%ile 90%ile
 
(relative) (relative) (relative)
-
literal   37.2 37.5 38.1
 1X 1X 1X
  cast(now() as string)   5.65 5.65 5.67 
0.152X  0.15X 0.149X
from_unixtime(0,'-MM-dd HH:mm:ss')6.3  6.3 6.37 
0.169X 0.168X 0.167X
  from_unixtime(0,'-MM-dd')   11.7 11.9   12 
0.315X 0.318X 0.314X
   from_unixtime(0)   6.23 6.23  6.3 
0.167X 0.166X 0.165X

The literal expression used as the baseline in this benchmark is
"cast('2012-01-01 09:10:11.123456789' as timestamp)".

This patch also updates numbers in expr-benchmark for
BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear
its MemPool in between benchmark iteration so that it does not run out
of memory.

Testing:
- Pass core tests.

Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/timestamp-value.cc
6 files changed, 119 insertions(+), 86 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/1
--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto