Alex Behm has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
......................................................................

Patch Set 6:

(10 comments)

I'm pretty happy with this change. I think we should consider adding additional test cases for interesting boundary conditions, e.g., when there is ambiguity in the tz -> UTC conversion, but not in this patch.

http://gerrit.cloudera.org:8080/#/c/5939/6/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java:

Line 113: "Invalid time zone in the the '%s' table property: %s",
double 'the'


http://gerrit.cloudera.org:8080/#/c/5939/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 665: // Attempt to set 'parquet.mr.int96.write.zone' table property. Positive case.
Let's move all the CREATE/ALTER tests into a separate TestParquetMrInt96WriteZone(). To me that organization seems more natural.

Line 1882: "\"/test-warehouse/alltypesagg_hive_13_1_parquet/" +
easier to read with single quotes

Line 1904: "\"/test-warehouse/alltypesagg_hive_13_1_parquet/" +
easier to read with single quotes


http://gerrit.cloudera.org:8080/#/c/5939/6/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
File tests/custom_cluster/test_hive_parquet_timestamp_conversion.py:

Line 27: '''Hive writes timestamps in Parquet files by first converting values from local time
Thank you! This comment is very informative and well written.

Line 105: parquet_fn = get_fs_path(
What does "fn" stand for? I'm thinking "file name", but this is not just a file name.

Line 123: i ON i.id = h.id AND i.day = h.day -- serves as a unique key
easier to read with the alias 'i' next to the table

Line 125: (h.timestamp_col IS NULL AND i.timestamp_col IS NOT NULL)
simplify the first two conditions with:
h.timestamp_col IS NULL != i.timestamp_col IS NULL

please apply the same changes to queries in:
test_parquet_timestamp_compatibility.py


http://gerrit.cloudera.org:8080/#/c/5939/6/tests/query_test/test_parquet_timestamp_compatibility.py
File tests/query_test/test_parquet_timestamp_compatibility.py:

Line 78: def test_garbage_parquet_mr_write_zone(self, vector, unique_database):
test_invalid_parquet_mr_write_zone

Line 118: # 'parquet.mr.int96.write.zone' table property to tz_name triggers a 'UTC' ->
extra space after "triggers"


--
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Zoltan Ivanfi <[email protected]>
Gerrit-HasComments: Yes
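
Note on the Line 125 suggestion: the sketch below illustrates, in Python rather than SQL, why comparing the two IS NULL results with != can stand in for a pair of mirrored conditions. It assumes the second condition in the query is the mirror image of the first (h NOT NULL, i NULL), which the quoted line does not show. The function names and sample value are hypothetical, and how Impala parses IS NULL next to != (operator precedence, need for parentheses) is not checked here.

```python
from itertools import product


def null_mismatch_verbose(h, i):
    """Two mirrored conditions: exactly one side is NULL (None here)."""
    return (h is None and i is not None) or (h is not None and i is None)


def null_mismatch_compact(h, i):
    """Single comparison: the two 'is NULL' results differ."""
    return (h is None) != (i is None)


# Hypothetical sample values: one NULL and one non-NULL timestamp string.
samples = [None, "2017-02-01 00:00:00"]

# Check every NULL / non-NULL combination of the two columns.
forms_agree = all(
    null_mismatch_verbose(h, i) == null_mismatch_compact(h, i)
    for h, i in product(samples, repeat=2)
)
```

Both forms flag a row only when exactly one of the two columns is NULL, so the compact form should not change which rows the query reports.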

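Note on the general comment about ambiguity in the tz -> UTC conversion: a minimal sketch of one such boundary condition, using only the Python standard library. The fixed PDT/PST offsets are an assumption standing in for a real time zone database lookup of America/Los_Angeles around the end of DST on 2016-11-06, and utc_candidates is a hypothetical helper, not part of Impala or its tests.

```python
from datetime import datetime, timedelta, timezone

# Assumption: hard-coded offsets in effect before and after the
# 2016-11-06 fall-back transition in America/Los_Angeles.
PDT = timezone(timedelta(hours=-7), "PDT")
PST = timezone(timedelta(hours=-8), "PST")


def utc_candidates(naive_local):
    """Return each UTC instant the naive local wall-clock time could denote."""
    return [
        naive_local.replace(tzinfo=tz).astimezone(timezone.utc)
        for tz in (PDT, PST)
    ]


# 01:30 local time occurs twice on the night the clocks fall back.
ambiguous_local = datetime(2016, 11, 6, 1, 30)
first_utc, second_utc = utc_candidates(ambiguous_local)
```

The two candidates are one hour apart, so a test for this case would need to pin down which of the two instants a conversion is expected to pick.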