Hello Quanlong Huang, Attila Jeges, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/15222
to look at the new patch set (#2).
Change subject: WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion
......................................................................
WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion
Orc scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos is affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect Orc. By default there is no conversion, but if the
flag is 1, then timestamps are interpreted as UTC and converted to
local time.
This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of making To/From
timestamp functions less confusing.
Changes so far:
- Fixed the bug by passing UTC as timezone in the Orc scanner.
- Changed the interface of these TimestampValue functions to expect
a timezone pointer, interpret null as UTC and skip conversion. It
would be also possible to pass the actual UTC timezone and check
for this in the functions, but I guess that it is easier to optimize
the inlined functions this way.
- Changed RuntimeState and the Parquet scanner to skip timezone
conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
timezone is UTC. This allows users to avoid the performance penalty
of this flag by setting query option timezone to UTC in their
session (IMPALA-7557). CCTZ is not good at this, actually
conversions are slower with fixed offset timezones (including UTC)
than with timezones that have DST/historical rule changes.
Further planned changes:
- Remove the UTC versions of the functions, as they shouldn't be
faster than calling the converting functions with null after
inlining.
- Move out the checking of use_local_tz_for_unix_timestamp_conversions
to RuntimeState. The plan is to create a function like
Timezone* local_timezone_for_unix_time_conversions() and return
null or the local timezone based on the flag. I think that this
would make the intention of the callsites of From/To functions
much clearer.
- Add/update comments.
Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
18 files changed, 111 insertions(+), 113 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/15222/2
--
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>