Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet ......................................................................
Patch Set 5: (26 comments) Thanks for the review. http://gerrit.cloudera.org:8080/#/c/5939/5/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 305: // parquet-mr. > Comment that the FE guarantees that this is a valid timezone. Done http://gerrit.cloudera.org:8080/#/c/5939/5/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 246: DCHECK((is_timestamp_dependent_timezone_ && timezone_ == NULL) || > simplify condition: Done Line 549: /// Used to cache the timezone object corresponding to "parquet.mr.int96.write.zone" > the "parquet.mr.int96.write.zone" table property Done Line 550: /// table property to avoid repeated calls to 'TimezoneDatabase::FindTimezone()'. > no need to single-quote here Done http://gerrit.cloudera.org:8080/#/c/5939/5/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: Line 190: // See if they specified a zone id > Who is "they"? Suggest rephrasing Done Line 202: return time_zone_ptr(); > return NULL? Done http://gerrit.cloudera.org:8080/#/c/5939/5/common/thrift/BackendGflags.thrift File common/thrift/BackendGflags.thrift: Line 58: // If true, all newly created Parquet tables will have the parquet.mr.int96.write.zone > ... all newly created HDFS tables regardless of format ... Done http://gerrit.cloudera.org:8080/#/c/5939/5/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: Line 218: 9: optional string parquet_mr_write_zone; > remove trailing semicolon for consistency Done http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: Line 169: private static void analyzeParquetMrWriteZone(Table table, > These cases need tests in AnalyzeDDL Done. I also changed the exception message for consistency. http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java: Line 112: throw new AnalysisException("Invalid Time Zone: " + timezone); > Mention that the timezone is in the table property 'parquet.mr.int96.write. Done http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: Line 183: if (getTblProperties().containsKey(HdfsTable.TBL_PROP_PARQUET_MR_WRITE_ZONE)) { > These cases need tests in AnalyzeDDL Done Line 186: "Only HDFS tables can use '%s' table property.", > the '%s' table property Done. I also changed the exception message for consistency. http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 54: import org.apache.impala.common.InternalException; > not needed Done http://gerrit.cloudera.org:8080/#/c/5939/5/tests/custom_cluster/test_parquet_timestamp_compatibility.py File tests/custom_cluster/test_parquet_timestamp_compatibility.py: Line 25: class TestParquetTimestampCompatibility(CustomClusterTestSuite): > Test needs a comment. In particular, what is covered here and what is cover I've added a comment that describes tests in this file. What do you mean by "integration tests not part of Impala tests"? I don't think we have any integration tests upstream. Line 26: TEST_DB_NAME = 'timezone_test_db' > Why not use the unique_database fixture? That's the best practice. Switched to unique_database. Line 39: self.client = impalad.service.create_beeswax_client() > I think we already have a self.client from ImpalaTestSuite. Removed creating a client from here. Line 49: "/test-warehouse/alltypesagg_hive_13_1_parquet/alltypesagg_hive_13_1.parquet" > We need the appropriate filesystem prefix here. This will break on local fs Done Line 56: self.client.execute('USE ' + self.TEST_DB_NAME) > Why? Better to avoid changing the session state Removed it. Line 59: self.client.execute('INVALIDATE METADATA') > why? Removed it, it's not needed. Line 61: def _set_impala_table_timezone(self, timezone): > simplify to pass table name as param Done Line 71: @pytest.mark.execute_serially > How long does this test take? I'm thinking we should only have minimal test Moved tests that use startup options to ./tests/custom_cluster/test_hive_parquet_timestamp_conversion.py Line 81: statement = '''ALTER TABLE hive_table > Should be an analyzer test in AnalyzeDDL Removed it from here. Line 104: select_from_impala_table = '''SELECT timestamp_col FROM impala_table WHERE id = 1''' > Can you think of a way to validate all values in the table in bulk e.g. usi Done Line 113: self._set_impala_table_timezone('EST') > Add some comments about the behavior here. We are getting different values Done Line 141: statement = '''ALTER TABLE hive_table > analyzer test is more appropriate (also fix elsewhere) Done Line 185: def test_ddl(self, vector): > We need a test where Hive sets a garbage timezone and Impala throws an erro 1. Added 'test_garbage_parquet_mr_write_zone' to test how Impala handles invalid time zone values. It works for now, but please note that in the future Hive might not let users set 'parquet.mr.int96.write.zone' to an invalid time zone. 2. Added 'test_conversion_to_tbl_prop_timezone' to test_hive_parquet_timestamp_conversion.py to test the scenario when the table property is set and the convert_legacy_hive_parquet_utc_timestamps option is used. -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Zoltan Ivanfi <[email protected]> Gerrit-HasComments: Yes
