Csaba Ringhofer has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15222


Change subject: WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion
......................................................................

WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion

Orc scanner uses TimestampValue::FromUnixTimeNanos() to convert
sec + nano representation to Impala's TimestampValue (day + nano).
FromUnixTimeNanos is affected by flag
use_local_tz_for_unix_timestamp_conversions, while that global option
should not affect Orc. By default there is no conversion, but if the
flag is 1, then timestamps are interpreted as UTC and converted to
local time.

This could be solved by creating a UTC version of FromUnixTimeNanos,
but I decided to change the interface in the hope of meking To/From
timestamp functions less confusing.

Changes so far:
- Fixed the bug by passing UTC as timezone in the Orc scanner.
- Changed the interface of these TimestampValue functions to expect
  a timezone pointer, interpret null as UTC and skip conversion. It
  would be also possible to pass the actual UTC timezone and check
  for this in the functions, but I guess that it is easier to opimize
  the inlined functions this way.
- Changed RuntimeState and the Parquet scanner to skip timezone
  conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the
  timezone is UTC. This allows users to avoid the performance penalty
  of this flag by setting query option timezone to UTC in their
  session (IMPALA-7557). CCTZ doesn't is not good at this, actually
  conversions are slower with fixed offset timezones (including UTC)
  than timezones with DST/historical rule changes.

Further planned changes:
- Remove the UTC versions of the functions, as they shouldn't be
  faster than calling the converting functions with null after
  inlineing.
- Move out the checking of use_local_tz_for_unix_timestamp_conversions
  to RuntimeState. The plan is to create a function like
  Timezone* local_timezone_for_unix_time_conversions() and return
  null or the local timezone based on the flag. I think that this
  would make the intention of the callsites of From/To functions
  much clearer.
- Add/update comments.

Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
18 files changed, 111 insertions(+), 113 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/15222/1
--
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <[email protected]>

Reply via email to