Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12247 )
Change subject: IMPALA-5051: Add INT64 timestamp write support in Parquet ...................................................................... Patch Set 9: (17 comments) http://gerrit.cloudera.org:8080/#/c/12247/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12247/9//COMMIT_MSG@17 PS9, Line 17: INT96_NANOS I think the NANOS suffix is somewhat misleading as it suggests that this is just ns precision with a larger range. Since we also want to discourage use of this I suggest to name it INT96_DEPRECATED (or INT96_IMPALA since this is where it came from). http://gerrit.cloudera.org:8080/#/c/12247/9//COMMIT_MSG@42 PS9, Line 42: net typo http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/exec/parquet/parquet-metadata-utils.h File be/src/exec/parquet/parquet-metadata-utils.h: http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/exec/parquet/parquet-metadata-utils.h@61 PS9, Line 61: that the type nit: that the input type.. http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/exec/parquet/parquet-metadata-utils.cc File be/src/exec/parquet/parquet-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/exec/parquet/parquet-metadata-utils.cc@150 PS9, Line 150: parquet::TimestampType timestamp_type; nit: declare these before using them? http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-test.cc@713 PS9, Line 713: int64_t tm_min_micros, tm_min_millis; I'd declare that when you use it http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-test.cc@721 PS9, Line 721: // Add 250ns and check the value is rounded down Maybe prefix each of these comments with a comment or assert of the current value of min_time. Otherwise one has to count how many times we added 250ns all the time. http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-test.cc@737 PS9, Line 737: EXPECT_TRUE(min_date.FloorUtcToUnixTimeMicros(&tm_min_micros)); This test does not correspond to the comment above, does it? I.e., it is rounding towards -? http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-test.cc@740 PS9, Line 740: 250us The switch between us and ns confused me for a bit, not sure if there's a good way to make it stand out more. Also, some of these blocks add to the current value of min_date, while other blocks reset it. Maybe you can make more clear by not writing to it and using new variables instead (here and in the other blocks) http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.h@239 PS9, Line 239: 2262 The famous Impala Year-2262 problem being conceived and I get to be part of it. :) http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.h@375 PS9, Line 375: 1970.01.01 nit: 1970-01-01 http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.inline.h File be/src/runtime/timestamp-value.inline.h: http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.inline.h@98 PS9, Line 98: 24 * 60 * 60 While you're here, this could probably be a constant, too, given it's used in other places. http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.inline.h@133 PS9, Line 133: Nit: I think some of these newlines could go. http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.inline.h@156 PS9, Line 156: static_cast<kudu::int128_t>(unixtime_seconds) * NANOS_PER_SEC nit: indent 4 spaces http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.inline.h@160 PS9, Line 160: || nanos128 > std::numeric_limits<int64_t>::max()) return false; nit: If it doesn't fit on one line, it should have braces {} http://gerrit.cloudera.org:8080/#/c/12247/9/tests/query_test/test_insert_parquet.py File tests/query_test/test_insert_parquet.py: http://gerrit.cloudera.org:8080/#/c/12247/9/tests/query_test/test_insert_parquet.py@848 PS9, Line 848: ColumnStats('ts', -1001001, 1001001, 0) Can you add a test for the case that values don't fit into 64bit nanos? http://gerrit.cloudera.org:8080/#/c/12247/9/tests/util/get_parquet_metadata.py File tests/util/get_parquet_metadata.py: http://gerrit.cloudera.org:8080/#/c/12247/9/tests/util/get_parquet_metadata.py@180 PS9, Line 180: normpath Why is "normpath" needed here? http://gerrit.cloudera.org:8080/#/c/12247/9/tests/util/get_parquet_metadata.py@180 PS9, Line 180: table_dir = tmp_dir + "/" + os.path.basename(os.path.normpath(hdfs_path)) use os.path.join -- To view, visit http://gerrit.cloudera.org:8080/12247 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib41ad532ec902ed5a9a1528513726eac1c11441f Gerrit-Change-Number: 12247 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Zoltan Ivanfi <[email protected]> Gerrit-Comment-Date: Wed, 20 Mar 2019 17:40:25 +0000 Gerrit-HasComments: Yes
