Hello Attila Jeges, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/11183
to look at the new patch set (#11).
Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue
conversions
......................................................................
IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Impala used to convert from sub-second unix time to TimestampValue
(which is split do date_ and time_ similarly to boost::posix_time::ptime)
by first splitting the input into seconds and sub-seconds part,
converting the seconds part with boost::posix_time::from_time_t(), and
then adding the sub-seconds part to this timestamp.
Different tricks are used to speed up different functions:
- UTC functions that expect a single integer as input can
split it into date_ and time_ directly.
- Non-UTC functions need seconds for timezone conversion,
because CCTZ expects time points as seconds. These
were optimized by adding the subsecond part to time_
instead of adding it to a ptime. This can be done safely
because the sub-second part is between [0, 1 sec), so
it cannot overflow into a different day or timezone.
The main motivation is IMPALA-5050: "Add support to read TIMESTAMP_MILLIS
and TIMESTAMP_MICROS to the parquet scanner" - reading these types
will run micro/milli->TimestampValue conversion for every row.
Other changes:
- TimestampValue::UtcFromUnixTimeMillis was added - currently this
is only used in tests but it will be useful for IMPALA-5050
- Some functions were moved from .h to .inline.h.
- FromUnixTimeMicros was changed to do the utc->local conversion
depending on flag use_local_tz_for_unix_timestamp_conversions
to be consistent with other similar functions. This function was
only used in tests until now but it will be useful for IMPALA-5050.
- When a result mismatch is detected in convert-timestamp-benchmark.cc
it now prints non-equal values.
- Benchmarks were added for micro + nano conversions.
Note that only single threaded benchmarks were added because I
do not expect any difference in thw multi threaded case.
- DCHECKs were added to TimeStampValue::Validate to
ensure that time_ is between [0, 24 hour).
Testing:
- timestamp-test.cc was extended to give better coverage
for sub-second conversions. Edge cases were already
covered pretty well.
Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
9 files changed, 369 insertions(+), 106 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/11
--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>