Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )
Change subject: IMPALA-3307: Add support for IANA time-zone db ...................................................................... Patch Set 1: (31 comments) http://gerrit.cloudera.org:8080/#/c/9986/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9986/1//COMMIT_MSG@34 PS1, Line 34: statup > typo: startup Done http://gerrit.cloudera.org:8080/#/c/9986/1//COMMIT_MSG@34 PS1, Line 34: --hdfs_zoneinfo_dir > Why call it hdfs_zone_dir when it can also refer to S3 and ADLS? The name reflects that this flag should be used to specify a location in a "shared" filesystem that can be accessed through the HDFS API. I think, naming the flag "--zoneinfo_dir" might be misleading as it should not be used to specify a directory in a local filesystem. http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc File be/src/benchmarks/convert-timestamp-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@1 PS1, Line 1: #include <chrono> > Missing Apache header Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@38 PS1, Line 38: UtcToUnixTime: Function iters/ms 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile > Do you think we should force the 90col limit on the following comment as we Probably it's better to leave it like this. Some of the other benchmark programs have longer lines as well (e.g. ./be/src/benchmarks/int-hash-benchmark.cc) http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@134 PS1, Line 134: val > I don't know what RAND_MAX is here, but I think that it can be 32K, which w Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@136 PS1, Line 136: ss << to_simple_string(start); > to_simple_string() return a string already, no need to use a stringstream f Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@166 PS1, Line 166: d > I am a bit concerned about writing to the same buffer from every thread - m Fixed it. It required some major refactoring too. http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@709 PS1, Line 709: m1 = measure_multithreaded_elapsed_time(glibc_test_utc_to_unix, num_of_threads,BATCH_SIZE, > nit: long line Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@711 PS1, Line 711: m2 = measure_multithreaded_elapsed_time(cctz_test_utc_to_unix, num_of_threads, BATCH_SIZE, > nit: long line Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@725 PS1, Line 725: m1 = measure_multithreaded_elapsed_time(boost_test_from_utc, num_of_threads, BATCH_SIZE, > nit: long line Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@727 PS1, Line 727: m2 = measure_multithreaded_elapsed_time(cctz_test_from_utc, num_of_threads, BATCH_SIZE, > nit: long line Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@743 PS1, Line 743: m2 = measure_multithreaded_elapsed_time(cctz_test_utc_to_local, num_of_threads, BATCH_SIZE, > nit: long line Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/decimal-operators.h File be/src/exprs/decimal-operators.h: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/decimal-operators.h@168 PS1, Line 168: /// instead of truncating if 'round' is true. > Should we mention the new parameter in the comment? Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions-ir.cc@523 PS1, Line 523: / > nit: missing spaces Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@90 PS1, Line 90: t.fractional_seconds() > You have explained in person that 't' is used instead of 'to_cs'for sub-sec Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@105 PS1, Line 105: if (timezone == nullptr) { > This could be UNLIKELY. Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@142 PS1, Line 142: t.fractional_seconds() > Same as in line 90. Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h File be/src/exprs/timezone_db.h: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@28 PS1, Line 28: /// Functions to load and access the time-zone database. > Please add some comments about thread-safety (e.g. "Initialize() should be Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc File be/src/exprs/timezone_db.cc: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@160 PS1, Line 160: bool IsSymbolicLink(const string& path, string* real_path) { > Maybe this could be moved to class FileSystemUtil. Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@412 PS1, Line 412: char buffer[64*1024]; > I am a bit concerned about this - is it ok to keep buffers of this size on Switched to using vector<char> instead. http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@438 PS1, Line 438: // Load custom time-zone abbreviations from 'is' and add them to 'tz_name_map_'. > In most of Impala, the comments for private functions are in the .h file an Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@439 PS1, Line 439: void TimezoneDatabase::LoadZoneAbbreviations(istream &is, > Are you sure that a corrupt abbreviation file is not an error? Maybe duplic Agree, done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@440 PS1, Line 440: const char *path /* = nullptr */ > I did not find any caller that fills this argument. Please check if it can 'path' should be used when calling LoadZoneAbbreviations() in L434. Fixed it. http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@460 PS1, Line 460: Skippng > typo: Skippng is used consistently instead of Skipping - is this intentiona It was a copy/paste typo. http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@484 PS1, Line 484: if (tz_name_map_.find(abbrev) != tz_name_map_.end()) { > This could be checked before processing value and merged with the abbreviat Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/runtime-state.h@321 PS1, Line 321: - > nit: typo Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/runtime-state.h@321 PS1, Line 321: global local > I see the intent but "global local timezone" sounds strange :) Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/timestamp-value.cc@172 PS1, Line 172: auto from_tp = FromUnixSeconds(unix_time); : auto to_cs = cctz::convert(from_tp, TimezoneDatabase::GetUtcTimezone()); : // boost::gregorian::date() throws boost::gregorian::bad_year if year is not in the : // 1400..9999 range. Need to check validity before creating the date object. : if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs)))) { : return ptime(not_a_date_time); : } else { : return ptime( : boost::gregorian::date(to_cs.year(), to_cs.month(), to_cs.day()), : boost::posix_time::time_duration(to_cs.hour(), to_cs.minute(), to_cs.second())); > This could be replaced by calling TimestampValue::UnixTimeToLocalPtime(unix Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/timestamp-value.inline.h File be/src/runtime/timestamp-value.inline.h: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/timestamp-value.inline.h@55 PS1, Line 55: inline bool TimestampValue::UtcToUnixTime(time_t* unix_time) const { > Two opposing ideas: Done http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time-test.cc File be/src/util/time-test.cc: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time-test.cc@60 PS1, Line 60: EXPECT_EQ("1677-09-21 00:12:43.146", : ToUtcStringFromUnixMillis(INT64_MIN / NANOS_PER_MICRO / MICROS_PER_MILLI)); : EXPECT_EQ("1677-09-21 00:12:43.145225", : ToUtcStringFromUnixMicros(INT64_MIN / NANOS_PER_MICRO)); > Why have these times changed? Are these results "more correct" than the old Actually, the old expected values were incorrect due to a bug in time.cc. Explanation: 1. In L58, INT64_MIN/NANOS_PER_SEC == -9223372036. ToUtcStringFromUnix(-9223372036) == "1677-09-21 00:12:44" 2. In L60, INT64_MIN/NANOS_PER_MICRO/MICROS_PER_MILLI == -9223372036854. -9223372036854 millisec == -9223372036*1000 millisec - 854 millisec. Therefore ToUtcStringFromUnixMillis(-9223372036854) must correspond to a timestamp that is 854 millisec before "1677-09-21 00:12:44", that is "1677-09-21 00:12:43.146". 3. In L62, INT64_MIN/NANOS_PER_MICRO == -9223372036854775. -9223372036854775 microsec == -9223372036*1000000 microsec - 854775 microsec. Therefore ToUtcStringFromUnixMicros(-9223372036854775) must convert to a timestamp that is 854775 microsec before "1677-09-21 00:12:44", that is "1677-09-21 00:12:43.145225". http://gerrit.cloudera.org:8080/#/c/9986/1/fe/src/test/java/org/apache/impala/testutil/TestUtils.java File fe/src/test/java/org/apache/impala/testutil/TestUtils.java: http://gerrit.cloudera.org:8080/#/c/9986/1/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@267 PS1, Line 267: queryCtx.setLocal_time_zone("PST8PDT"); > Can you explain to what was changed here? Some tests ran differently depend This commit adds 'local_time_zone' field to the query context (ImpalaInternalService.thrift). The BE expects this field to be set to a non-null value for each query. We set the value here to a hard-coded value to make sure that it is non-null when the FE tests are running. The actual value doesn't matter as far as the FE tests are concerned. -- To view, visit http://gerrit.cloudera.org:8080/9986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 Gerrit-Change-Number: 9986 Gerrit-PatchSet: 1 Gerrit-Owner: Attila Jeges <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 18 Apr 2018 09:12:44 +0000 Gerrit-HasComments: Yes
