Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11057 )

Change subject: IMPALA-5050: Add support to read TIMESTAMP_MILLIS and 
TIMESTAMP_MICROS from Parquet
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11057/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11057/15//COMMIT_MSG@10
PS15, Line 10: - parquet.thrift is updated to a newer version which contains 
the timestamp
> nit: too long lines
Done


http://gerrit.cloudera.org:8080/#/c/11057/15/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/11057/15/be/src/exec/hdfs-parquet-scanner.h@654
PS15, Line 654:
> Instead of having this function maybe it'd be better to extend ColumnStatsR
Done


http://gerrit.cloudera.org:8080/#/c/11057/16/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/11057/16/be/src/exec/hdfs-parquet-scanner.cc@518
PS16, Line 518: ip_row_group = true;
              :       break;
              :     }
              :
              :     if (stats_read) {
              :       TupleRow row;
              :       row.SetTuple(0, min_max_tuple_);
              :       i
> For readability, I'd prefer this logic to be hidden in ReadFromThrift().
I see what you mean, but I did not want add TimestampHelper to the simple 
ReadTimestampStatFromThrift() call. I have removed the duplicated code on 
MIN/MAX to make it more readable.


http://gerrit.cloudera.org:8080/#/c/11057/4/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/11057/4/be/src/exec/parquet-column-readers.cc@1686
PS4, Line 1686:   return parent->obj_pool_.Add(reader);
> Oops, this is still not done yet. I will return to this issue. This affects
I have moved parent->obj_pool_.Add(); to the constructor of 
ParquetColumnReader, so now it always has an owner.


http://gerrit.cloudera.org:8080/#/c/11057/4/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/11057/4/tests/query_test/test_scanners.py@759
PS4, Line 759:     self.client.execute(("""CREATE TABLE {0}.int64_timestamps (
> It is now tested by using create_table_from_parquet.
I did some back and forth here:
I thought for some reason that CREATE TABLE LIKE PARQUET used to convert these 
columns to BIGINT, and changing this to TIMESTAMP would be breaking change, and 
I created IMPALA-7723 about this.
Then I realized that CREATE TABLE LIKE PARQUET returned an error for the 
columns, so converting to TIMESTAMP cannot be a breaking change.



--
To view, visit http://gerrit.cloudera.org:8080/11057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c7c01fffa31b3d2ca3480adf6ff851137dadac3
Gerrit-Change-Number: 11057
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 19 Oct 2018 17:05:03 +0000
Gerrit-HasComments: Yes

Reply via email to