Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 7:

(15 comments)

This is going to be a big improvement. Did a pass, mainly had comments about 
clarifying internal interfaces.

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exec/data-source-scan-node.h
File be/src/exec/data-source-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exec/data-source-scan-node.h@101
PS7, Line 101:   /// local time-zone for materializing 'TYPE_TIMESTAMP' slots.
Can local_tz be NULL? Maybe make it const& if not.


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exec/parquet-column-readers.cc@603
PS7, Line 603:   if (dst_ts->HasDateAndTime()) 
dst_ts->UtcToLocal(parent_->state_->local_time_zone());
Would it make sense to cache the timezone locally in the ScalarColumnReader? 
That would save at least 2 pointer indirections per value, which could be 
meaningful in this part of the code.


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/decimal-operators.h
File be/src/exprs/decimal-operators.h:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/decimal-operators.h@172
PS7, Line 172:       const T& decimal_value, int scale, bool round, const 
Timezone* local_tz);
Is it nullable?


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/expr-test.cc@161
PS7, Line 161:   const Timezone *new_tz_;
nit: Timezone* new_tz_


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/exprs/timezone_db.cc@48
PS7, Line 48:     "HDFS/S3A/ADLS path to load IANA time-zone database from.");
nn]


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/runtime-state.h@317
PS7, Line 317:   /// Query-global timezone used as local timezone when 
executing the query.
Can this be NULL? Would be good to document. We should maybe return a const& 
above if it can't be NULL so that it's self-documenting.


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/runtime-state.cc@131
PS7, Line 131: LIKELY
LIKELY won't make a measurable difference outside of perf-critical code, I find 
it adds noise in cases like this. The codebase isn't very consistent about it 
but I'm trying to stop the pattern from spreading :)


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/runtime-state.cc@134
PS7, Line 134:     LOG(ERROR) << "Failed to find local timezone " << 
query_ctx().local_time_zone
I think this should be a WARNING-level log. We should reserve ERROR for really 
severe errors, whereas this might flood logs.

I think we should also add a warning to the query warnings so that it's 
surfaced to the user, not just the admin.


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/timestamp-value.h@98
PS7, Line 98:   static TimestampValue FromUnixTime(time_t unix_time, const 
Timezone* local_tz) {
Here and below it isn't clear if local_tz is allowed to be NULL. If it can be 
NULL, can we extend comments to explain what happens if that case. If it can't, 
we could make it self-documenting by making it a const& instead of a const*.


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/runtime/timestamp-value.cc@100
PS7, Line 100: CheckIfDateOutOfRange
Maybe IsDateOutOfRange(). With "Check" it isn't clear whether returning true 
means that it's out of range or if it means that the timestamp passed the check.


http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/util/time.h
File be/src/util/time.h:

http://gerrit.cloudera.org:8080/#/c/9986/7/be/src/util/time.h@122
PS7, Line 122: Timezone
Maybe const& if it's not allowed to be NULL.


http://gerrit.cloudera.org:8080/#/c/9986/7/cmake_modules/FindCctz.cmake
File cmake_modules/FindCctz.cmake:

http://gerrit.cloudera.org:8080/#/c/9986/7/cmake_modules/FindCctz.cmake@27
PS7, Line 27:   $ENV{IMPALA_HOME}/thirdparty/cctz-$ENV{IMPALA_CCTZ_VERSION}/src)
We can get rid of the thirdparty/ stuff. That's just left over from when Impala 
stored vendored versions of these dependencies in thirdpart/


http://gerrit.cloudera.org:8080/#/c/9986/7/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/9986/7/common/thrift/ImpalaInternalService.thrift@400
PS7, Line 400:   // String containing name of the local time zone.
Maybe mention whether it's been validated or not. E.g. can it be an arbitrary 
string, or is it guaranteed to be a valid timezone on the coordinator (but not 
necessarily on the executor, since in theory the executor could have a 
different timezone db).


http://gerrit.cloudera.org:8080/#/c/9986/7/testdata/tzdb/abbrev.conf
File testdata/tzdb/abbrev.conf:

PS7:
For all new files, you either need to add an Apache license header (preferable) 
or add it to bin/rat_exclude_files.txt (if the file format doesn't allow 
comments).

The precommit tests will check this, or you can run bin/check-rat-report.py 
(see the comment in that file).


http://gerrit.cloudera.org:8080/#/c/9986/7/tests/custom_cluster/test_custom_tzdb.py
File tests/custom_cluster/test_custom_tzdb.py:

http://gerrit.cloudera.org:8080/#/c/9986/7/tests/custom_cluster/test_custom_tzdb.py@25
PS7, Line 25: class TestCustomTimzoneDatabase(CustomClusterTestSuite):
nit:Timezone



--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Jeges <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 16 May 2018 16:20:47 +0000
Gerrit-HasComments: Yes

Reply via email to