Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )
Change subject: IMPALA-3307: Add support for IANA time-zone db ...................................................................... Patch Set 2: (12 comments) http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc File be/src/benchmarks/convert-timestamp-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@172 PS2, Line 172: TestData This looks like a fairly general class to me that could move to util/benchmark.h or a similar file. We can create a follow up ticket if you don't want to deal with this now. http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@444 PS2, Line 444: time_t utc = We could replace this with boost_utc_to_unix_time. This conversion should be fast, as we want to measure the speed of localtime_r, not the "glue" code. http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/benchmarks/convert-timestamp-benchmark.cc@530 PS2, Line 530: // : // Test UnixTimeToUtcPtime (boost is expected to be faster than CCTZ) : // : : // boost : boost::pos I think that this a bit misleading, as boost_unix_time_to_utc_ptime never uses its slow branch with this test data. We could measure gmtime_r separately, and add some comments that tells which functions were used in Impala before/after the change. http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/decimal-operators.h File be/src/exprs/decimal-operators.h: http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/decimal-operators.h@168 PS2, Line 168: /// local time in 'local_tz' time-zone). Rounds instead of truncating if 'round' is true. nit: long line http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timestamp-functions.cc@71 PS2, Line 71: const boost::gregorian::date& d = ts_value.date(); : const boost::posix_time::time_duration& t = ts_value.time(); : const cctz::civil_second from_cs(d.year(), d.month(), d.day(), t.hours(), t.minutes(), : t.seconds()); : : auto from_tp = cctz::convert(from_cs, TimezoneDatabase::GetUtcTimezone()); Does cctz offer a function to create timepoint from unix time_t? If yes, then it should be faster to convert ts_val to a time_t, and convert that to from_tp. http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc File be/src/exprs/timezone_db.cc: http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc@43 PS2, Line 43: DEFINE_string(hdfs_zoneinfo_dir, "", : "HDFS/S3A/ADLS path to load IANA time-zone database from."); : DEFINE_string(hdfs_zoneabbrev_config, "", : "HDFS/S3A/ADLS path to config file defining non-standard time-zone abbreviations."); It will be a bit tricky, but these should be tested somehow. Playing with these configurations could go to a custom cluster test. These tests can be probably created at a later stage of the review, after the architecture have benn accepted. http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/exprs/timezone_db.cc@419 PS2, Line 419: Status TimezoneDatabase::LoadZoneAbbreviations(istream &is, At least basic testing should be added to check that fix and non-fix abbreviations are parsed correctly, but it would be the best to also create tests for error messages. http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.cc@135 PS2, Line 135: auto from_tp = FromUnixSeconds(unix_time); : auto to_cs = cctz::convert(from_tp, *local_tz This would be a big change, but I would think about "auto" types - do they make the code more readable, or not? I think that if type name is long and the reader more-or-less knows what kind of type to expect, then "auto" is very nice. But if the reader does not know context/library well, then "auto" makes it harder to look-up the classes. If you want to keep them, then I would prefer longer variable names, like spelling out civil_seconds. http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.inline.h File be/src/runtime/timestamp-value.inline.h: http://gerrit.cloudera.org:8080/#/c/9986/2/be/src/runtime/timestamp-value.inline.h@98 PS2, Line 98: if (UNLIKELY(!HasDateAndTime())) return false; : cctz::civil_second cs(date_.year(), date_.month(), date_.day(), time_.hours(), : time_.minutes(), time_.seconds()); : auto tp = cctz::convert(cs, : FLAGS_use_local_tz_for_unix_timestamp_conversions ? *local_tz : : TimezoneDatabase::GetUtcTimezone()); If FLAGS_use_local_tz_for_unix_timestamp_conversions is false, then we could call UtcToUnixTime. http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time-test.cc File be/src/util/time-test.cc: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time-test.cc@60 PS1, Line 60: EXPECT_EQ("1677-09-21 00:12:43.146", : ToUtcStringFromUnixMillis(INT64_MIN / NANOS_PER_MICRO / MICROS_PER_MILLI)); : EXPECT_EQ("1677-09-21 00:12:43.145225", : ToUtcStringFromUnixMicros(INT64_MIN / NANOS_PER_MICRO)); > Actually, the old expected values were incorrect due to a bug in time.cc. E Thanks for the explanation! http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time.h File be/src/util/time.h: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time.h@30 PS1, Line 30: What do you think add about adding a typedef for cctz::timezone? I think that functions that just pass the cctz::time_zone* through do not have to know that we are using cctz. This would make it easier if we would like to wrap or replace it in the future. http://gerrit.cloudera.org:8080/#/c/9986/1/fe/src/test/java/org/apache/impala/testutil/TestUtils.java File fe/src/test/java/org/apache/impala/testutil/TestUtils.java: http://gerrit.cloudera.org:8080/#/c/9986/1/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@267 PS1, Line 267: queryCtx.setLocal_time_zone("PST8PDT"); > This commit adds 'local_time_zone' field to the query context (ImpalaIntern Thanks for the explanation! -- To view, visit http://gerrit.cloudera.org:8080/9986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 Gerrit-Change-Number: 9986 Gerrit-PatchSet: 2 Gerrit-Owner: Attila Jeges <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 18 Apr 2018 22:08:40 +0000 Gerrit-HasComments: Yes
