Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 1:

(31 comments)

http://gerrit.cloudera.org:8080/#/c/9986/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9986/1//COMMIT_MSG@34
PS1, Line 34: statup
> typo: startup
Done


http://gerrit.cloudera.org:8080/#/c/9986/1//COMMIT_MSG@34
PS1, Line 34: --hdfs_zoneinfo_dir
> Why call it hdfs_zone_dir when it can also refer to S3 and ADLS?
The name reflects that this flag should be used to specify a location in a 
"shared" filesystem that can be accessed through the HDFS API.

I think, naming the flag "--zoneinfo_dir" might be misleading as it should not 
be used to specify a directory in a local filesystem.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@1
PS1, Line 1: #include <chrono>
> Missing Apache header
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@38
PS1, Line 38: UtcToUnixTime:             Function  iters/ms   10%ile   50%ile   
90%ile     10%ile     50%ile     90%ile
> Do you think we should force the 90col limit on the following comment as we
Probably it's better to leave it like this. Some of the other benchmark 
programs have longer lines as well (e.g. 
./be/src/benchmarks/int-hash-benchmark.cc)


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@134
PS1, Line 134: val
> I don't know what RAND_MAX is here, but I think that it can be 32K, which w
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@136
PS1, Line 136:     ss << to_simple_string(start);
> to_simple_string() return a string already, no need to use a stringstream f
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@166
PS1, Line 166:  d
> I am a bit concerned about writing to the same buffer from every thread - m
Fixed it. It required some major refactoring too.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@709
PS1, Line 709:     m1 = 
measure_multithreaded_elapsed_time(glibc_test_utc_to_unix, 
num_of_threads,BATCH_SIZE,
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@711
PS1, Line 711:     m2 = 
measure_multithreaded_elapsed_time(cctz_test_utc_to_unix, num_of_threads, 
BATCH_SIZE,
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@725
PS1, Line 725:     m1 = measure_multithreaded_elapsed_time(boost_test_from_utc, 
num_of_threads, BATCH_SIZE,
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@727
PS1, Line 727:     m2 = measure_multithreaded_elapsed_time(cctz_test_from_utc, 
num_of_threads, BATCH_SIZE,
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/benchmarks/convert-timestamp-benchmark.cc@743
PS1, Line 743:     m2 = 
measure_multithreaded_elapsed_time(cctz_test_utc_to_local, num_of_threads, 
BATCH_SIZE,
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/decimal-operators.h
File be/src/exprs/decimal-operators.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/decimal-operators.h@168
PS1, Line 168:   /// instead of truncating if 'round' is true.
> Should we mention the new parameter in the comment?
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions-ir.cc@523
PS1, Line 523: /
> nit: missing spaces
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@90
PS1, Line 90: t.fractional_seconds()
> You have explained in person that 't' is used instead of 'to_cs'for sub-sec
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@105
PS1, Line 105:   if (timezone == nullptr) {
> This could be UNLIKELY.
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@142
PS1, Line 142: t.fractional_seconds()
> Same as in line 90.
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@28
PS1, Line 28: /// Functions to load and access the time-zone database.
> Please add some comments about thread-safety (e.g. "Initialize() should be
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@160
PS1, Line 160: bool IsSymbolicLink(const string& path, string* real_path) {
> Maybe this could be moved to class FileSystemUtil.
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@412
PS1, Line 412:   char buffer[64*1024];
> I am a bit concerned about this - is it ok to keep buffers of this size on
Switched to using vector<char> instead.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@438
PS1, Line 438: // Load custom time-zone abbreviations from 'is' and add them to 
'tz_name_map_'.
> In most of Impala, the comments for private functions are in the .h file an
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@439
PS1, Line 439: void TimezoneDatabase::LoadZoneAbbreviations(istream &is,
> Are you sure that a corrupt abbreviation file is not an error? Maybe duplic
Agree, done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@440
PS1, Line 440: const char *path /* = nullptr */
> I did not find any caller that fills this argument. Please check if it can
'path' should be used when calling LoadZoneAbbreviations() in L434. Fixed it.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@460
PS1, Line 460: Skippng
> typo: Skippng is used consistently instead of Skipping - is this intentiona
It was a copy/paste typo.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@484
PS1, Line 484:       if (tz_name_map_.find(abbrev) != tz_name_map_.end()) {
> This could be checked before processing value and merged with the abbreviat
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/runtime-state.h@321
PS1, Line 321: -
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/runtime-state.h@321
PS1, Line 321: global local
> I see the intent but "global local timezone" sounds strange :)
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/timestamp-value.cc@172
PS1, Line 172:   auto from_tp = FromUnixSeconds(unix_time);
             :   auto to_cs = cctz::convert(from_tp, 
TimezoneDatabase::GetUtcTimezone());
             :   // boost::gregorian::date() throws boost::gregorian::bad_year 
if year is not in the
             :   // 1400..9999 range. Need to check validity before creating 
the date object.
             :   if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs)))) {
             :     return ptime(not_a_date_time);
             :   } else {
             :     return ptime(
             :         boost::gregorian::date(to_cs.year(), to_cs.month(), 
to_cs.day()),
             :         boost::posix_time::time_duration(to_cs.hour(), 
to_cs.minute(), to_cs.second()));
> This could be replaced by calling TimestampValue::UnixTimeToLocalPtime(unix
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/runtime/timestamp-value.inline.h@55
PS1, Line 55: inline bool TimestampValue::UtcToUnixTime(time_t* unix_time) 
const {
> Two opposing ideas:
Done


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time-test.cc
File be/src/util/time-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/util/time-test.cc@60
PS1, Line 60:   EXPECT_EQ("1677-09-21 00:12:43.146",
            :       ToUtcStringFromUnixMillis(INT64_MIN / NANOS_PER_MICRO / 
MICROS_PER_MILLI));
            :   EXPECT_EQ("1677-09-21 00:12:43.145225",
            :       ToUtcStringFromUnixMicros(INT64_MIN / NANOS_PER_MICRO));
> Why have these times changed? Are these results "more correct" than the old
Actually, the old expected values were incorrect due to a bug in time.cc. 
Explanation:

1. In L58, INT64_MIN/NANOS_PER_SEC == -9223372036.
ToUtcStringFromUnix(-9223372036) == "1677-09-21 00:12:44"

2. In L60, INT64_MIN/NANOS_PER_MICRO/MICROS_PER_MILLI == -9223372036854.
-9223372036854 millisec == -9223372036*1000 millisec - 854 millisec.

Therefore ToUtcStringFromUnixMillis(-9223372036854) must correspond to a 
timestamp that is 854 millisec before "1677-09-21 00:12:44", that is 
"1677-09-21 00:12:43.146".

3. In L62, INT64_MIN/NANOS_PER_MICRO == -9223372036854775.
-9223372036854775 microsec == -9223372036*1000000 microsec - 854775 microsec.

Therefore ToUtcStringFromUnixMicros(-9223372036854775) must convert to a 
timestamp that is 854775 microsec before "1677-09-21 00:12:44", that is 
"1677-09-21 00:12:43.145225".


http://gerrit.cloudera.org:8080/#/c/9986/1/fe/src/test/java/org/apache/impala/testutil/TestUtils.java
File fe/src/test/java/org/apache/impala/testutil/TestUtils.java:

http://gerrit.cloudera.org:8080/#/c/9986/1/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@267
PS1, Line 267:     queryCtx.setLocal_time_zone("PST8PDT");
> Can you explain to what was changed here? Some tests ran differently depend
This commit adds 'local_time_zone' field to the query context 
(ImpalaInternalService.thrift). The BE expects this field to be set to a 
non-null value for each query.

We set the value here to a hard-coded value to make sure that it is non-null 
when the FE tests are running. The actual value doesn't matter as far as the FE 
tests are concerned.



--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 18 Apr 2018 09:12:44 +0000
Gerrit-HasComments: Yes

Reply via email to