Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 )
Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions ...................................................................... Patch Set 12: (15 comments) http://gerrit.cloudera.org:8080/#/c/11183/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11183/11//COMMIT_MSG@10 PS11, Line 10: (which is split do date_ and time_ similarly to > nit: please wrap at 70 characters, here and below. Done http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc File be/src/benchmarks/convert-timestamp-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@578 PS11, Line 578: boost::posix_time::nanoseconds(t > It probably wouldn't make a difference but maybe you should call boost::pos Done http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@631 PS11, Line 631: cctz_optimized_unix_time_to_utc_ptime(unix_time_whole); > The original implementation had: That constant was not accessible from here and I thought that the results would be the same, so why not use NANOS_PER_SEC. It turned that I was wrong and the result is slightly different in some cases (+- 1 nanosec in analytic average tests). http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@635 PS11, Line 635: : TimestampValue from_subsecond_unix_time_new( : const double& unix_time) { : const double ONE_BILLIONTH = 0.000000001; : int64_t unix_time_whole = unix_time; : int64_t nanos = (unix_time - unix_time_whole) / ONE_BILLIONTH; : > The implementation of TimestampValue::FromSubsecondUnixTime() was changed i Done http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@923 PS11, Line 923: : // Benchmark FromSubsecondUnixTime before and after IMPALA-7417. > Are the rounding rules still different? No, it should be the same in patch set 12. http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@932 PS11, Line 932: > This benchmark and the previous bm_utc_from_unix_time_nanos benchmark measu I added the test originally to check if the rounding has added any regression. As the rounding was replaced with truncation this is no longer relevant. I would still keep the test because the results are different (5.57 vs 3.97 improvement) and the the benchmark turned out to be useful for catching correctness issues. So I would prefer to keep the test but I can remove it if you want to keep this file more compact. http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@235 PS11, Line 235: The fraction part of 'nanos' can e > You mean "'nanos' can express sub-nanoseconds"? I hope that it is clearer with the new comment. http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@240 PS11, Line 240: ue:: > Maybe use MILLIS_PER_SEC instead? My impression is that 1000 is more readable than MILLIS_PER_SEC, while NANOS_PER_SEC is more readable than 1000*1000*1000 (and 1000000000 is by far the worst), so this inconsistency is intentional. http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@282 PS11, Line 282: p > nit: extra space Done http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@280 PS11, Line 280: / Checks that all sub-second From*UnixTime gives the same result and that the result : // is the same as 'expected'. : // If 'expected' is nullptr then the result expected to be invalid (out of range). : void TestFromSubSecondFunctions(int64_t seconds, int millis, const cha > Why do we need this extra test? And how is this 'opposite sign'? Please cla I have extended the test to check the same timestamp with different seconds / millis, so that > 1000 millis are also tested. http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@284 PS11, Line 284: Ti > nit: missing space after 'if' Done http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@848 PS11, Line 848: > How about adding some extra tests for the micros/nanos unix time conversion Done http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h@189 PS11, Line 189: !time.is_negative() > Instead this, you could use !time.is_negative() Done http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h@190 PS11, Line 190: time.total_nanoseconds() < NANOS_PER_DAY > maybe use (time.hours() < 24) ? hours() is a confusing function in my opinion - I think that it should be called total_hours(), because it can be >= 24. (seconds() and minutes() are < 60). The description at https://www.boost.org/doc/libs/1_55_0/doc/html/date_time/posix_time.html doesn't help too much to clear the confusion. I did a quick benchmark and it would have the same speed, (apparently gcc can replace A/ConstB < ConstC with A < ConstC*ConstB for integers), but I would prefer to keep it as it is. http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h@328 PS11, Line 328: unix_time > unix_time_ticks Done -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 12 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Tue, 28 Aug 2018 10:02:18 +0000 Gerrit-HasComments: Yes
