Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )
Change subject: IMPALA-3307: Add support for IANA time-zone db ...................................................................... Patch Set 7: (15 comments) This is going to be a big improvement. Did a pass, mainly had comments about clarifying internal interfaces. http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exec/data-source-scan-node.h File be/src/exec/data-source-scan-node.h: http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exec/data-source-scan-node.h@101 PS7, Line 101: /// local time-zone for materializing 'TYPE_TIMESTAMP' slots. Can local_tz be NULL? Maybe make it const& if not. http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exec/parquet-column-readers.cc@603 PS7, Line 603: if (dst_ts->HasDateAndTime()) dst_ts->UtcToLocal(parent_->state_->local_time_zone()); Would it make sense to cache the timezone locally in the ScalarColumnReader? That would save at least 2 pointer indirections per value, which could be meaningful in this part of the code. http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/decimal-operators.h File be/src/exprs/decimal-operators.h: http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/decimal-operators.h@172 PS7, Line 172: const T& decimal_value, int scale, bool round, const Timezone* local_tz); Is it nullable? http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/expr-test.cc@161 PS7, Line 161: const Timezone *new_tz_; nit: Timezone* new_tz_ http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/timezone_db.cc File be/src/exprs/timezone_db.cc: http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/timezone_db.cc@48 PS7, Line 48: "HDFS/S3A/ADLS path to load IANA time-zone database from."); nn] http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/runtime-state.h@317 PS7, Line 317: /// Query-global timezone used as local timezone when executing the query. Can this be NULL? Would be good to document. We should maybe return a const& above if it can't be NULL so that it's self-documenting. http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/runtime-state.cc@131 PS7, Line 131: LIKELY LIKELY won't make a measurable difference outside of perf-critical code, I find it adds noise in cases like this. The codebase isn't very consistent about it but I'm trying to stop the pattern from spreading :) http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/runtime-state.cc@134 PS7, Line 134: LOG(ERROR) << "Failed to find local timezone " << query_ctx().local_time_zone I think this should be a WARNING-level log. We should reserve ERROR for really severe errors, whereas this might flood logs. I think we should also add a warning to the query warnings so that it's surfaced to the user, not just the admin. http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/timestamp-value.h@98 PS7, Line 98: static TimestampValue FromUnixTime(time_t unix_time, const Timezone* local_tz) { Here and below it isn't clear if local_tz is allowed to be NULL. If it can be NULL, can we extend comments to explain what happens if that case. If it can't, we could make it self-documenting by making it a const& instead of a const*. http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/timestamp-value.cc@100 PS7, Line 100: CheckIfDateOutOfRange Maybe IsDateOutOfRange(). With "Check" it isn't clear whether returning true means that it's out of range or if it means that the timestamp passed the check. http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/util/time.h File be/src/util/time.h: http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/util/time.h@122 PS7, Line 122: Timezone Maybe const& if it's not allowed to be NULL. http://gerrit.cloudera.org:8080/#/c/9986/7/cmake_modules/FindCctz.cmake File cmake_modules/FindCctz.cmake: http://gerrit.cloudera.org:8080/#/c/9986/7/cmake_modules/FindCctz.cmake@27 PS7, Line 27: $ENV{IMPALA_HOME}/thirdparty/cctz-$ENV{IMPALA_CCTZ_VERSION}/src) We can get rid of the thirdparty/ stuff. That's just left over from when Impala stored vendored versions of these dependencies in thirdpart/ http://gerrit.cloudera.org:8080/#/c/9986/7/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/9986/7/common/thrift/ImpalaInternalService.thrift@400 PS7, Line 400: // String containing name of the local time zone. Maybe mention whether it's been validated or not. E.g. can it be an arbitrary string, or is it guaranteed to be a valid timezone on the coordinator (but not necessarily on the executor, since in theory the executor could have a different timezone db). http://gerrit.cloudera.org:8080/#/c/9986/7/testdata/tzdb/abbrev.conf File testdata/tzdb/abbrev.conf: PS7: For all new files, you either need to add an Apache license header (preferable) or add it to bin/rat_exclude_files.txt (if the file format doesn't allow comments). The precommit tests will check this, or you can run bin/check-rat-report.py (see the comment in that file). http://gerrit.cloudera.org:8080/#/c/9986/7/tests/custom_cluster/test_custom_tzdb.py File tests/custom_cluster/test_custom_tzdb.py: http://gerrit.cloudera.org:8080/#/c/9986/7/tests/custom_cluster/test_custom_tzdb.py@25 PS7, Line 25: class TestCustomTimzoneDatabase(CustomClusterTestSuite): nit:Timezone -- To view, visit http://gerrit.cloudera.org:8080/9986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 Gerrit-Change-Number: 9986 Gerrit-PatchSet: 7 Gerrit-Owner: Attila Jeges <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 16 May 2018 16:20:47 +0000 Gerrit-HasComments: Yes
