Alex Behm has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
......................................................................


Patch Set 5:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

Line 169:   private static void analyzeParquetMrWriteZone(Table table,
These cases need tests in AnalyzeDDL


http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

Line 183:     if 
(getTblProperties().containsKey(HdfsTable.TBL_PROP_PARQUET_MR_WRITE_ZONE)) {
These cases need tests in AnalyzeDDL


http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 54: import org.apache.impala.common.InternalException;
not needed


http://gerrit.cloudera.org:8080/#/c/5939/5/tests/custom_cluster/test_parquet_timestamp_compatibility.py
File tests/custom_cluster/test_parquet_timestamp_compatibility.py:

Line 25: class TestParquetTimestampCompatibility(CustomClusterTestSuite):
Test needs a comment. In particular, what is covered here and what is covered 
in integration tests not part of Impala tests.


Line 26:   TEST_DB_NAME = 'timezone_test_db'
Why not use the unique_database fixture? That's the best practice.


Line 39:     self.client = impalad.service.create_beeswax_client()
I think we already have a self.client from ImpalaTestSuite.


Line 49:         
"/test-warehouse/alltypesagg_hive_13_1_parquet/alltypesagg_hive_13_1.parquet"
We need the appropriate filesystem prefix here. This will break on local fs or 
S3 builds. Take a look at:

filesystem_utils.py get_fs_path()

and how that is used elsewhere


Line 56:     self.client.execute('USE ' + self.TEST_DB_NAME)
Why? Better to avoid changing the session state


Line 59:     self.client.execute('INVALIDATE METADATA')
why?


Line 61:   def _set_impala_table_timezone(self, timezone):
simplify to pass table name as param


Line 71:   @pytest.mark.execute_serially
How long does this test take? I'm thinking we should only have minimal tests 
for the impalad startup option in a custom cluster test, probably in the 
existing test for convert_legacy_hive_parquet_utc_timestamps. 

The other tests should go into a regular test (subclass of ImpalaTestSuite) and 
run in parallel.


Line 81:     statement = '''ALTER TABLE hive_table
Should be an analyzer test in AnalyzeDDL


Line 104:     select_from_impala_table = '''SELECT timestamp_col FROM 
impala_table WHERE id = 1'''
Can you think of a way to validate all values in the table in bulk e.g. using 
our existing timestamp conversion functions?
Having a few example values is still good, but we should also test that we are 
internally consistent with out conversion functions.


Line 113:     self._set_impala_table_timezone('EST')
Add some comments about the behavior here. We are getting different values for 
the same timezone because the underlying Parquet files were written by 
Hive/Impala.


Line 141:     statement = '''ALTER TABLE hive_table
analyzer test is more appropriate (also fix elsewhere)


Line 185:   def test_ddl(self, vector):
We need a test where Hive sets a garbage timezone and Impala throws an error 
when analyzing a query against that table.

We also need a test that shows what happens when a table property timezone is 
set and the convert_legacy_hive_parquet_utc_timestamps option is used. Probably 
good to consolidate with the existing custom cluster test.


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Zoltan Ivanfi <[email protected]>
Gerrit-HasComments: Yes

Reply via email to