Alex Behm has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet ......................................................................
Patch Set 5: (16 comments) http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: Line 169: private static void analyzeParquetMrWriteZone(Table table, These cases need tests in AnalyzeDDL http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: Line 183: if (getTblProperties().containsKey(HdfsTable.TBL_PROP_PARQUET_MR_WRITE_ZONE)) { These cases need tests in AnalyzeDDL http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 54: import org.apache.impala.common.InternalException; not needed http://gerrit.cloudera.org:8080/#/c/5939/5/tests/custom_cluster/test_parquet_timestamp_compatibility.py File tests/custom_cluster/test_parquet_timestamp_compatibility.py: Line 25: class TestParquetTimestampCompatibility(CustomClusterTestSuite): Test needs a comment. In particular, what is covered here and what is covered in integration tests not part of Impala tests. Line 26: TEST_DB_NAME = 'timezone_test_db' Why not use the unique_database fixture? That's the best practice. Line 39: self.client = impalad.service.create_beeswax_client() I think we already have a self.client from ImpalaTestSuite. Line 49: "/test-warehouse/alltypesagg_hive_13_1_parquet/alltypesagg_hive_13_1.parquet" We need the appropriate filesystem prefix here. This will break on local fs or S3 builds. Take a look at: filesystem_utils.py get_fs_path() and how that is used elsewhere Line 56: self.client.execute('USE ' + self.TEST_DB_NAME) Why? Better to avoid changing the session state Line 59: self.client.execute('INVALIDATE METADATA') why? Line 61: def _set_impala_table_timezone(self, timezone): simplify to pass table name as param Line 71: @pytest.mark.execute_serially How long does this test take? I'm thinking we should only have minimal tests for the impalad startup option in a custom cluster test, probably in the existing test for convert_legacy_hive_parquet_utc_timestamps. The other tests should go into a regular test (subclass of ImpalaTestSuite) and run in parallel. Line 81: statement = '''ALTER TABLE hive_table Should be an analyzer test in AnalyzeDDL Line 104: select_from_impala_table = '''SELECT timestamp_col FROM impala_table WHERE id = 1''' Can you think of a way to validate all values in the table in bulk e.g. using our existing timestamp conversion functions? Having a few example values is still good, but we should also test that we are internally consistent with out conversion functions. Line 113: self._set_impala_table_timezone('EST') Add some comments about the behavior here. We are getting different values for the same timezone because the underlying Parquet files were written by Hive/Impala. Line 141: statement = '''ALTER TABLE hive_table analyzer test is more appropriate (also fix elsewhere) Line 185: def test_ddl(self, vector): We need a test where Hive sets a garbage timezone and Impala throws an error when analyzing a query against that table. We also need a test that shows what happens when a table property timezone is set and the convert_legacy_hive_parquet_utc_timestamps option is used. Probably good to consolidate with the existing custom cluster test. -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Zoltan Ivanfi <[email protected]> Gerrit-HasComments: Yes
