Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12247 )
Change subject: IMPALA-5051: Add INT64 timestamp write support in Parquet ...................................................................... Patch Set 11: (19 comments) http://gerrit.cloudera.org:8080/#/c/12247/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12247/9//COMMIT_MSG@17 PS9, Line 17: INT96_NANOS > I think the NANOS suffix is somewhat misleading as it suggests that this is I am open to a new name, but I want to consult with Hive / Parquet people first. This is a development query option at the moment, so I think that it is not a big issue to change the names in the future. About discouraging people from using INT96_NANOS: currently this is the only timestamp representation that can be read by other Hadoop components. I would wait for a Parquet-MR release with logical type support + some cross-component testing / benchmarking before advising anyone to use the new types. http://gerrit.cloudera.org:8080/#/c/12247/9//COMMIT_MSG@42 PS9, Line 42: not > typo Done http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/exec/parquet/parquet-metadata-utils.h File be/src/exec/parquet/parquet-metadata-utils.h: http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/exec/parquet/parquet-metadata-utils.h@61 PS9, Line 61: that the inpu > nit: that the input type.. Done http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/exec/parquet/parquet-metadata-utils.cc File be/src/exec/parquet/parquet-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/exec/parquet/parquet-metadata-utils.cc@150 PS9, Line 150: parquet::TimeUnit time_unit; > nit: declare these before using them? Done http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-test.cc@713 PS9, Line 713: // Test padding on double digits > I'd declare that when you use it Done http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-test.cc@721 PS9, Line 721: ASSERT_TRUE(ParseFormatTokens(&dt_ctx)) << "TC: " << i; > Maybe prefix each of these comments with a comment or assert of the current I have rewritten min_date assignments to use explicit timestamps. http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-test.cc@737 PS9, Line 737: TimestampValue min_date = TimestampValue::Parse("1400-01-01"); > This test does not correspond to the comment above, does it? I.e., it is ro I think that "floor" in the name should make it clear that it rounds toward -. Note that ToUnixTime() also rounds towards -. http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-test.cc@740 PS9, Line 740: EQ(MI > The switch between us and ns confused me for a bit, not sure if there's a g I have rewritten the code to always reset min_date. I think that it is easier to understand now, but I can create new variables if you think that it would further improve readability. http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.h@239 PS9, Line 239: 2262 > The famous Impala Year-2262 problem being conceived and I get to be part of :D http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.h@375 PS9, Line 375: > nit: 1970-01-01 Done http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.inline.h File be/src/runtime/timestamp-value.inline.h: http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.inline.h@98 PS9, Line 98: SECONDS_PER_ > While you're here, this could probably be a constant, too, given it's used Done http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.inline.h@133 PS9, Line 133: return true; > Nit: I think some of these newlines could go. Done http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.inline.h@156 PS9, Line 156: > nit: indent 4 spaces Done http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.inline.h@160 PS9, Line 160: } > nit: If it doesn't fit on one line, it should have braces {} Done http://gerrit.cloudera.org:8080/#/c/12247/9/tests/query_test/test_insert_parquet.py File tests/query_test/test_insert_parquet.py: http://gerrit.cloudera.org:8080/#/c/12247/9/tests/query_test/test_insert_parquet.py@848 PS9, Line 848: insert_stmt = """insert into {0} values > Can you add a test for the case that values don't fit into 64bit nanos? Done http://gerrit.cloudera.org:8080/#/c/12247/10/tests/query_test/test_insert_parquet.py File tests/query_test/test_insert_parquet.py: http://gerrit.cloudera.org:8080/#/c/12247/10/tests/query_test/test_insert_parquet.py@824 PS10, Line 824: ("1969-12-31 23:59:59.999999999"), This was broken when rounding towards nearest was replaced with flooring. http://gerrit.cloudera.org:8080/#/c/12247/9/tests/util/get_parquet_metadata.py File tests/util/get_parquet_metadata.py: http://gerrit.cloudera.org:8080/#/c/12247/9/tests/util/get_parquet_metadata.py@180 PS9, Line 180: .path.no > Why is "normpath" needed here? It removes the trailing "/" http://gerrit.cloudera.org:8080/#/c/12247/9/tests/util/get_parquet_metadata.py@180 PS9, Line 180: table_dir = os.path.join(tmp_dir, os.path.basename(os.path.normpath(hdfs_path))) > use os.path.join Done http://gerrit.cloudera.org:8080/#/c/12247/10/tests/util/get_parquet_metadata.py File tests/util/get_parquet_metadata.py: http://gerrit.cloudera.org:8080/#/c/12247/10/tests/util/get_parquet_metadata.py@180 PS10, Line 180: > flake8: E225 missing whitespace around operator Done -- To view, visit http://gerrit.cloudera.org:8080/12247 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib41ad532ec902ed5a9a1528513726eac1c11441f Gerrit-Change-Number: 12247 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Zoltan Ivanfi <zi+ger...@cloudera.com> Gerrit-Comment-Date: Wed, 27 Mar 2019 14:58:30 +0000 Gerrit-HasComments: Yes