Alex Behm has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
......................................................................


Patch Set 6:

(10 comments)

I'm pretty happy with this change.

I think we should consider adding test cases for interesting boundary
conditions, e.g., when the tz -> UTC conversion is ambiguous, but not in
this patch.

http://gerrit.cloudera.org:8080/#/c/5939/6/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java:

Line 113:           "Invalid time zone in the the '%s' table property: %s",
double 'the'


http://gerrit.cloudera.org:8080/#/c/5939/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 665:     // Attempt to set 'parquet.mr.int96.write.zone' table property. Positive case.
Let's move all the CREATE/ALTER tests into a separate
TestParquetMrInt96WriteZone().

To me that organization seems more natural.


Line 1882:         "\"/test-warehouse/alltypesagg_hive_13_1_parquet/" +
This would be easier to read with single quotes.


Line 1904:         "\"/test-warehouse/alltypesagg_hive_13_1_parquet/" +
This would be easier to read with single quotes.


http://gerrit.cloudera.org:8080/#/c/5939/6/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
File tests/custom_cluster/test_hive_parquet_timestamp_conversion.py:

Line 27:   '''Hive writes timestamps in Parquet files by first converting values from local time
Thank you! This comment is very informative and well written.


Line 105:     parquet_fn = get_fs_path(
What does "fn" stand for? I'm thinking "file name", but this is not just a file 
name.


Line 123:           i ON i.id = h.id AND i.day = h.day  -- serves as a unique key
This is easier to read with the alias 'i' placed next to the table name.


Line 125:           (h.timestamp_col IS NULL AND i.timestamp_col IS NOT NULL)
Simplify the first two conditions to:

h.timestamp_col IS NULL != i.timestamp_col IS NULL

Please apply the same change to the queries in
test_parquet_timestamp_compatibility.py.
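
A minimal sketch of why the simplification is safe, written in Python rather than SQL. It assumes the second of the two conditions is the mirror image of line 125 (h.timestamp_col IS NOT NULL AND i.timestamp_col IS NULL), which is not quoted above. SQL NULL is modeled as None, and the function names are hypothetical. Since IS NULL always yields TRUE or FALSE and never NULL, comparing the two results with != behaves like an ordinary boolean XOR.

```python
from itertools import product


def original_condition(h, i):
    """Two explicit branches: exactly one of the two values is NULL.

    The second branch is assumed to mirror the first; it is not
    quoted in the review comment.
    """
    return (h is None and i is not None) or (h is not None and i is None)


def simplified_condition(h, i):
    """Suggested form: the two IS NULL results differ."""
    return (h is None) != (i is None)


# Check every NULL / non-NULL combination of the two columns.
samples = [None, "2017-02-01 10:00:00"]
mismatches = [
    (h, i)
    for h, i in product(samples, repeat=2)
    if original_condition(h, i) != simplified_condition(h, i)
]
```

With both forms agreeing on all four combinations, `mismatches` stays empty. In the actual query it may still be worth parenthesizing each IS NULL test so the intent does not depend on operator precedence.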


http://gerrit.cloudera.org:8080/#/c/5939/6/tests/query_test/test_parquet_timestamp_compatibility.py
File tests/query_test/test_parquet_timestamp_compatibility.py:

Line 78:   def test_garbage_parquet_mr_write_zone(self, vector, unique_database):
Suggest renaming this to test_invalid_parquet_mr_write_zone.


Line 118:       # 'parquet.mr.int96.write.zone' table property to tz_name triggers  a 'UTC' ->
extra space after "triggers"


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Zoltan Ivanfi <[email protected]>
Gerrit-HasComments: Yes
