Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12247 )

Change subject: IMPALA-5051: Add INT64 timestamp write support in Parquet
......................................................................


Patch Set 11:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/12247/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12247/9//COMMIT_MSG@17
PS9, Line 17: INT96_NANOS
> I think the NANOS suffix is somewhat misleading as it suggests that this is
I am open to a new name, but I want to consult with Hive / Parquet people 
first. This is a development query option at the moment, so I think that it is 
not a big issue to change the names in the future.

About discouraging people from using INT96_NANOS: currently this is the only 
timestamp representation that can be read by other Hadoop components. I would 
wait for a Parquet-MR release with logical type support + some cross-component 
testing / benchmarking before advising anyone to use the new types.


http://gerrit.cloudera.org:8080/#/c/12247/9//COMMIT_MSG@42
PS9, Line 42: not
> typo
Done


http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/exec/parquet/parquet-metadata-utils.h
File be/src/exec/parquet/parquet-metadata-utils.h:

http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/exec/parquet/parquet-metadata-utils.h@61
PS9, Line 61: that the inpu
> nit: that the input type..
Done


http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/exec/parquet/parquet-metadata-utils.cc
File be/src/exec/parquet/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/exec/parquet/parquet-metadata-utils.cc@150
PS9, Line 150:   parquet::TimeUnit time_unit;
> nit: declare these before using them?
Done


http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-test.cc@713
PS9, Line 713:     // Test padding on double digits
> I'd declare that when you use it
Done


http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-test.cc@721
PS9, Line 721:     ASSERT_TRUE(ParseFormatTokens(&dt_ctx))  << "TC: " << i;
> Maybe prefix each of these comments with a comment or assert of the current
I have rewritten min_date assignments to use explicit timestamps.


http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-test.cc@737
PS9, Line 737:   TimestampValue min_date = TimestampValue::Parse("1400-01-01");
> This test does not correspond to the comment above, does it? I.e., it is ro
I think that "floor" in the name should make it clear that it rounds toward -. 
Note that ToUnixTime() also rounds towards -.


http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-test.cc@740
PS9, Line 740: EQ(MI
> The switch between us and ns confused me for a bit, not sure if there's a g
I have rewritten the code to always reset min_date. I think that it is easier 
to understand now, but I can create new variables if you think that it would 
further improve readability.


http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.h@239
PS9, Line 239: 2262
> The famous Impala Year-2262 problem being conceived and I get to be part of
:D


http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.h@375
PS9, Line 375:
> nit: 1970-01-01
Done


http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.inline.h@98
PS9, Line 98: SECONDS_PER_
> While you're here, this could probably be a constant, too, given it's used
Done


http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.inline.h@133
PS9, Line 133:   return true;
> Nit: I think some of these newlines could go.
Done


http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.inline.h@156
PS9, Line 156:
> nit: indent 4 spaces
Done


http://gerrit.cloudera.org:8080/#/c/12247/9/be/src/runtime/timestamp-value.inline.h@160
PS9, Line 160:   }
> nit: If it doesn't fit on one line, it should have braces {}
Done


http://gerrit.cloudera.org:8080/#/c/12247/9/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

http://gerrit.cloudera.org:8080/#/c/12247/9/tests/query_test/test_insert_parquet.py@848
PS9, Line 848:     insert_stmt = """insert into {0} values
> Can you add a test for the case that values don't fit into 64bit nanos?
Done


http://gerrit.cloudera.org:8080/#/c/12247/10/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

http://gerrit.cloudera.org:8080/#/c/12247/10/tests/query_test/test_insert_parquet.py@824
PS10, Line 824:         ("1969-12-31 23:59:59.999999999"),
This was broken when rounding towards nearest was replaced with flooring.


http://gerrit.cloudera.org:8080/#/c/12247/9/tests/util/get_parquet_metadata.py
File tests/util/get_parquet_metadata.py:

http://gerrit.cloudera.org:8080/#/c/12247/9/tests/util/get_parquet_metadata.py@180
PS9, Line 180: .path.no
> Why is "normpath" needed here?
It removes the trailing "/"


http://gerrit.cloudera.org:8080/#/c/12247/9/tests/util/get_parquet_metadata.py@180
PS9, Line 180:   table_dir = os.path.join(tmp_dir, 
os.path.basename(os.path.normpath(hdfs_path)))
> use os.path.join
Done


http://gerrit.cloudera.org:8080/#/c/12247/10/tests/util/get_parquet_metadata.py
File tests/util/get_parquet_metadata.py:

http://gerrit.cloudera.org:8080/#/c/12247/10/tests/util/get_parquet_metadata.py@180
PS10, Line 180:
> flake8: E225 missing whitespace around operator
Done



--
To view, visit http://gerrit.cloudera.org:8080/12247
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib41ad532ec902ed5a9a1528513726eac1c11441f
Gerrit-Change-Number: 12247
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+ger...@cloudera.com>
Gerrit-Comment-Date: Wed, 27 Mar 2019 14:58:30 +0000
Gerrit-HasComments: Yes

Reply via email to