Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11057 )
Change subject: IMPALA-5050: Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS from Parquet ...................................................................... Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/11057/15//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11057/15//COMMIT_MSG@10 PS15, Line 10: - parquet.thrift is updated to a newer version which contains the timestamp > nit: too long lines Done http://gerrit.cloudera.org:8080/#/c/11057/15/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: http://gerrit.cloudera.org:8080/#/c/11057/15/be/src/exec/hdfs-parquet-scanner.h@654 PS15, Line 654: > Instead of having this function maybe it'd be better to extend ColumnStatsR Done http://gerrit.cloudera.org:8080/#/c/11057/16/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/11057/16/be/src/exec/hdfs-parquet-scanner.cc@518 PS16, Line 518: ip_row_group = true; : break; : } : : if (stats_read) { : TupleRow row; : row.SetTuple(0, min_max_tuple_); : i > For readability, I'd prefer this logic to be hidden in ReadFromThrift(). I see what you mean, but I did not want add TimestampHelper to the simple ReadTimestampStatFromThrift() call. I have removed the duplicated code on MIN/MAX to make it more readable. http://gerrit.cloudera.org:8080/#/c/11057/4/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/11057/4/be/src/exec/parquet-column-readers.cc@1686 PS4, Line 1686: return parent->obj_pool_.Add(reader); > Oops, this is still not done yet. I will return to this issue. This affects I have moved parent->obj_pool_.Add(); to the constructor of ParquetColumnReader, so now it always has an owner. http://gerrit.cloudera.org:8080/#/c/11057/4/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/11057/4/tests/query_test/test_scanners.py@759 PS4, Line 759: self.client.execute(("""CREATE TABLE {0}.int64_timestamps ( > It is now tested by using create_table_from_parquet. I did some back and forth here: I thought for some reason that CREATE TABLE LIKE PARQUET used to convert these columns to BIGINT, and changing this to TIMESTAMP would be breaking change, and I created IMPALA-7723 about this. Then I realized that CREATE TABLE LIKE PARQUET returned an error for the columns, so converting to TIMESTAMP cannot be a breaking change. -- To view, visit http://gerrit.cloudera.org:8080/11057 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c7c01fffa31b3d2ca3480adf6ff851137dadac3 Gerrit-Change-Number: 11057 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Fri, 19 Oct 2018 17:05:03 +0000 Gerrit-HasComments: Yes
