Attila Jeges has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/9986 )
Change subject: IMPALA-3307: Add support for IANA time-zone db ...................................................................... IMPALA-3307: Add support for IANA time-zone db Impala currently uses two different libraries for timestamp manipulations: boost and glibc. Issues with boost: - Time-zone database is currently hard coded in timezone_db.cc. Impala admins cannot update it without upgrading Impala. - Time-zone database is flat, therefore can’t track year-to-year changes. - Time-zone database is not updated on a regular basis. Issues with glibc: - Uses /usr/share/zoneinfo/ database which could be out of sync on some of the nodes in the Impala cluster. - Uses the host system’s local time-zone. Different nodes in the Impala cluster might use a different local time-zone. - Conversion functions take a global lock, which causes severe performance degradation. In addition to the issues above, the fact that /usr/share/zoneinfo/ and the hard-coded boost time-zone database are both in use is a source of inconsistency in itself. This patch makes the following changes: - Instead of boost and glibc, impalad uses Google's CCTZ to implement time-zone conversions. - Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to specify an HDFS/S3/ADLS path to a zip archive that contains the shared compiled IANA time-zone database. If the startup flag is set, impalad will use the specified time-zone database. Otherwise, impalad will use the default /usr/share/zoneinfo time-zone database. - Introduces a new startup flag (--hdfs_zone_alias_conf) to impalad to specify an HDFS/S3/ADLS path to a shared config file that contains definitions for non-standard time-zone aliases. - impalad reads the entire time-zone database into an in-memory map on startup for fast lookups. - The name of the coordinator node’s local time-zone is saved to the query context when preparing query execution. This time-zone is used whenever the current time-zone is referred afterwards in an execution node. - Adds a new ZipUtil class to extract files from a zip archive. The implementation is not vulnerable to Zip Slip. Cherry-picks: not for 2.x. Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 --- M CMakeLists.txt M be/CMakeLists.txt M be/generated-sources/gen-cpp/CMakeLists.txt M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/common/global-types.h M be/src/common/init.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/data-source-scan-node.h M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exprs/CMakeLists.txt M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/cast-functions-ir.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/decimal-operators.h M be/src/exprs/expr-test.cc M be/src/exprs/literal.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/exprs/timestamp-functions.cc A be/src/exprs/timezone_db-test.cc M be/src/exprs/timezone_db.cc M be/src/exprs/timezone_db.h M be/src/runtime/raw-value-test.cc M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h M be/src/service/frontend.cc M be/src/service/impala-server.cc M be/src/service/impalad-main.cc M be/src/util/CMakeLists.txt M be/src/util/filesystem-util-test.cc M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h M be/src/util/hdfs-util-test.cc M be/src/util/hdfs-util.cc M be/src/util/hdfs-util.h M be/src/util/time-test.cc M be/src/util/time.cc M be/src/util/time.h A be/src/util/zip-util-test.cc A be/src/util/zip-util.cc A be/src/util/zip-util.h M bin/bootstrap_toolchain.py M bin/impala-config.sh M bin/rat_exclude_files.txt A cmake_modules/FindCctz.cmake M common/thrift/CMakeLists.txt M common/thrift/ImpalaInternalService.thrift A common/thrift/Zip.thrift M common/thrift/metrics.json A fe/src/main/java/org/apache/impala/util/ZipUtil.java M fe/src/test/java/org/apache/impala/testutil/TestUtils.java M testdata/bin/create-load-data.sh M testdata/data/timezoneverification.csv A testdata/tzdb/2017c-corrupt.zip A testdata/tzdb/2017c.zip A testdata/tzdb/alias.conf A testdata/tzdb_tiny/America/New_York A testdata/tzdb_tiny/Etc/GMT+4 A testdata/tzdb_tiny/US/Eastern A testdata/tzdb_tiny/UTC A testdata/tzdb_tiny/Zulu A testdata/tzdb_tiny/posix/UTC A testdata/tzdb_tiny/posixrules M testdata/workloads/functional-query/queries/QueryTest/exprs.test M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py A tests/custom_cluster/test_shared_tzdb.py D tests/query_test/test_timezones.py 72 files changed, 3,089 insertions(+), 1,167 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/17 -- To view, visit http://gerrit.cloudera.org:8080/9986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 Gerrit-Change-Number: 9986 Gerrit-PatchSet: 17 Gerrit-Owner: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>