Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 2:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@172
PS2, Line 172: TestData
This looks like a fairly general class to me that could move to 
util/benchmark.h or a similar file. We can create a follow up ticket if you 
don't want to deal with this now.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@444
PS2, Line 444:     time_t utc =
We could replace this with boost_utc_to_unix_time. This conversion should be 
fast, as we want to measure the speed of localtime_r, not the "glue" code.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@530
PS2, Line 530: //
             : // Test UnixTimeToUtcPtime (boost is expected to be faster than 
CCTZ)
             : //
             :
             : // boost
             : boost::pos
I think that this a bit misleading, as boost_unix_time_to_utc_ptime never uses 
its slow branch with this test data. We could measure gmtime_r separately, and 
add some comments that tells which functions were used in Impala before/after 
the change.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/decimal-operators.h
File be/src/exprs/decimal-operators.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/decimal-operators.h@168
PS2, Line 168:   /// local time in 'local_tz' time-zone). Rounds instead of 
truncating if 'round' is true.
nit: long line


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timestamp-functions.cc@71
PS2, Line 71:   const boost::gregorian::date& d = ts_value.date();
            :   const boost::posix_time::time_duration& t = ts_value.time();
            :   const cctz::civil_second from_cs(d.year(), d.month(), d.day(), 
t.hours(), t.minutes(),
            :       t.seconds());
            :
            :   auto from_tp = cctz::convert(from_cs, 
TimezoneDatabase::GetUtcTimezone());
Does cctz offer a function to create timepoint from unix time_t? If yes, then 
it should be faster to convert ts_val to a time_t, and convert that to from_tp.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc@43
PS2, Line 43: DEFINE_string(hdfs_zoneinfo_dir, "",
            :     "HDFS/S3A/ADLS path to load IANA time-zone database from.");
            : DEFINE_string(hdfs_zoneabbrev_config, "",
            :     "HDFS/S3A/ADLS path to config file defining non-standard 
time-zone abbreviations.");
It will be a bit tricky, but these should be tested somehow. Playing with these 
configurations could go to a custom cluster test. These tests can be probably 
created at a later stage of the review, after the architecture have benn 
accepted.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc@419
PS2, Line 419: Status TimezoneDatabase::LoadZoneAbbreviations(istream &is,
At least basic testing should be added to check that fix and non-fix 
abbreviations are parsed correctly, but it would be the best to also create 
tests for error messages.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.cc@135
PS2, Line 135:   auto from_tp = FromUnixSeconds(unix_time);
             :   auto to_cs = cctz::convert(from_tp, *local_tz
This would be a big change, but I would think about "auto" types - do they make 
the code more readable, or not? I think that if type name is long and the 
reader more-or-less knows what kind of type to expect, then "auto" is very 
nice. But if the reader does not know context/library well, then "auto" makes 
it harder to look-up the classes. If you want to keep them, then I would prefer 
longer variable names, like spelling out civil_seconds.


http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.inline.h@98
PS2, Line 98:   if (UNLIKELY(!HasDateAndTime())) return false;
            :   cctz::civil_second cs(date_.year(), date_.month(), date_.day(), 
time_.hours(),
            :       time_.minutes(), time_.seconds());
            :   auto tp = cctz::convert(cs,
            :       FLAGS_use_local_tz_for_unix_timestamp_conversions ? 
*local_tz :
            :       TimezoneDatabase::GetUtcTimezone());
If FLAGS_use_local_tz_for_unix_timestamp_conversions is false, then we could 
call UtcToUnixTime.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time-test.cc
File be/src/util/time-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time-test.cc@60
PS1, Line 60:   EXPECT_EQ("1677-09-21 00:12:43.146",
            :       ToUtcStringFromUnixMillis(INT64_MIN / NANOS_PER_MICRO / 
MICROS_PER_MILLI));
            :   EXPECT_EQ("1677-09-21 00:12:43.145225",
            :       ToUtcStringFromUnixMicros(INT64_MIN / NANOS_PER_MICRO));
> Actually, the old expected values were incorrect due to a bug in time.cc. E
Thanks for the explanation!


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time.h
File be/src/util/time.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time.h@30
PS1, Line 30:
What do you think add about adding a typedef for cctz::timezone? I think that 
functions that just pass the cctz::time_zone* through do not have to know that 
we are using cctz. This would make it easier if we would like to wrap or 
replace it in the future.


http://gerrit.cloudera.org:8080/#/c/9986/1/fe/src/test/java/org/apache/impala/testutil/TestUtils.java
File fe/src/test/java/org/apache/impala/testutil/TestUtils.java:

http://gerrit.cloudera.org:8080/#/c/9986/1/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@267
PS1, Line 267:     queryCtx.setLocal_time_zone("PST8PDT");
> This commit adds 'local_time_zone' field to the query context (ImpalaIntern
Thanks for the explanation!



--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 18 Apr 2018 22:08:40 +0000
Gerrit-HasComments: Yes

Reply via email to