Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15222
Change subject: WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion ...................................................................... WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion Orc scanner uses TimestampValue::FromUnixTimeNanos() to convert sec + nano representation to Impala's TimestampValue (day + nano). FromUnixTimeNanos is affected by flag use_local_tz_for_unix_timestamp_conversions, while that global option should not affect Orc. By default there is no conversion, but if the flag is 1, then timestamps are interpreted as UTC and converted to local time. This could be solved by creating a UTC version of FromUnixTimeNanos, but I decided to change the interface in the hope of meking To/From timestamp functions less confusing. Changes so far: - Fixed the bug by passing UTC as timezone in the Orc scanner. - Changed the interface of these TimestampValue functions to expect a timezone pointer, interpret null as UTC and skip conversion. It would be also possible to pass the actual UTC timezone and check for this in the functions, but I guess that it is easier to opimize the inlined functions this way. - Changed RuntimeState and the Parquet scanner to skip timezone conversion if convert_legacy_hive_parquet_utc_timestamps=1 but the timezone is UTC. This allows users to avoid the performance penalty of this flag by setting query option timezone to UTC in their session (IMPALA-7557). CCTZ doesn't is not good at this, actually conversions are slower with fixed offset timezones (including UTC) than timezones with DST/historical rule changes. Further planned changes: - Remove the UTC versions of the functions, as they shouldn't be faster than calling the converting functions with null after inlineing. - Move out the checking of use_local_tz_for_unix_timestamp_conversions to RuntimeState. The plan is to create a function like Timezone* local_timezone_for_unix_time_conversions() and return null or the local timezone based on the flag. I think that this would make the intention of the callsites of From/To functions much clearer. - Add/update comments. Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e --- M be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/common/global-types.h M be/src/exec/data-source-scan-node.cc M be/src/exec/data-source-scan-node.h M be/src/exec/orc-column-readers.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/parquet-common.h M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/decimal-operators.h M be/src/exprs/expr-test.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/runtime/raw-value-test.cc M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h 18 files changed, 111 insertions(+), 113 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/15222/1 -- To view, visit http://gerrit.cloudera.org:8080/15222 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e Gerrit-Change-Number: 15222 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer <[email protected]>
