Attila Jeges has uploaded a new patch set (#8).
( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard-coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.
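
The glibc issues listed above can be illustrated with a minimal sketch.
It assumes a POSIX/glibc host; the helper name LocalHourUnderTz is
hypothetical and is not part of the patch. The zone used by
localtime_r() is process-wide state taken from the TZ environment
variable, so the same instant converts differently depending on how the
host is configured:

```cpp
#include <stdlib.h>  // setenv (POSIX)
#include <time.h>    // time_t, struct tm, tzset, localtime_r (POSIX)

// Hypothetical helper: returns the local hour-of-day of instant `t` after
// switching the process-wide time zone to `tz`. It has to mutate global
// state (the TZ environment variable) because glibc keeps a single zone
// per process, which is why this style of conversion is host-dependent.
int LocalHourUnderTz(const char* tz, time_t t) {
  setenv("TZ", tz, /*overwrite=*/1);
  tzset();  // re-read TZ into glibc's global time-zone state
  struct tm out = {};
  localtime_r(&t, &out);
  return out.tm_hour;
}
```

The TZ values "UTC0" and "XYZ-3" are POSIX-style strings (the latter
denotes a made-up zone three hours east of UTC), so the sketch does not
depend on the contents of /usr/share/zoneinfo. With them, the Unix epoch
maps to hour 0 and hour 3 respectively.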

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.
- Introduces a new startup flag (--hdfs_zone_info_dir) to impalad to
  specify an HDFS/S3/ADLS location that contains the shared compiled
  IANA time-zone database. If the startup flag is set, impalad will
  use the specified time-zone database. Otherwise, impalad will use
  the default /usr/share/zoneinfo time-zone database.
- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.
- Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad
  to specify an HDFS/S3/ADLS path to a shared config file that
  contains definitions for non-standard time-zone abbreviations.

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/common/global-types.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
A be/src/exprs/timezone_db-test.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M be/src/util/time-test.cc
M be/src/util/time.cc
M be/src/util/time.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindCctz.cmake
M common/thrift/ImpalaInternalService.thrift
M common/thrift/metrics.json
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/create-load-data.sh
M testdata/data/timezoneverification.csv
A testdata/tzdb/abbrev.conf
A testdata/tzdb/zoneinfo/AmerICA/ArgeNTINA/MendOZA
A testdata/tzdb/zoneinfo/AmerICA/CancUN
A testdata/tzdb/zoneinfo/UTC
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
A tests/custom_cluster/test_custom_tzdb.py
53 files changed, 2,603 insertions(+), 1,095 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/8
--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 8
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
