Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet ......................................................................
Patch Set 5: (15 comments) http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 304: // Time zone for adjusting timestamp values read from Parquet files written by > single line That would make the line 92 characters long. http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 246: DCHECK((is_timestamp_dependent_timezone_ && timezone_ == NULL) || > Even better, we can check this at the scan node level, e.g., in HdfsScanNod Simplified the code. The timezone check is done in the FE, the BE is guaranteed to have a valid timezone string. Line 653: } > As mentioned above, we should terminate the query early when in such a bad Added DCHECK(false) http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: Line 194: for (const string& tz_region: tz_region_list_) { > No need to use auto here. const string& is clearer Done http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/exprs/timezone_db.h File be/src/exprs/timezone_db.h: Line 40: /// If 'tz' is not found in the database, nullptr is returned. > null pointer -> nullptr Done Line 46: /// If 'tz' is not found in the database, nullptr is returned. > null pointer -> nullptr Done http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: Line 199: /// conversion was successfull and false otherwise. If conversion failed *this is set to > Comment on what the state of *this is when false is returned. Done Line 202: > Comment on what the state of *this is when false is returned. Done http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/service/fe-support.cc File be/src/service/fe-support.cc: Line 357: const char *tz = env->GetStringUTFChars(timezone, NULL); > const char* Done http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 122: "the parquet.mr.int96.write.zone table property to UTC for new tables. This changes " > consistent placing of space (at line end or at line beginning) Done http://gerrit.cloudera.org:8080/#/c/5939/3/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java: Line 67: analyzeJoin(analyzer); > If we do the validation here, then there is no need to check in each HdfsSc Done http://gerrit.cloudera.org:8080/#/c/5939/3/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: Line 184: if (getFileFormat() == THdfsFileFormat.KUDU) { > You can check getFileFormat() here. The only non-HDFS formats we have are H Done Line 189: String timezone = getTblProperties().get(HdfsTable.TBL_PROP_PARQUET_MR_WRITE_ZONE); > We should only do this for HDFS tables right? Done http://gerrit.cloudera.org:8080/#/c/5939/4/fe/src/main/java/org/apache/impala/service/FeSupport.java File fe/src/main/java/org/apache/impala/service/FeSupport.java: Line 87: // Returns true if timezone String is valid according to the BE timezone database, false > valid according to the BE timezone database Done http://gerrit.cloudera.org:8080/#/c/5939/4/tests/custom_cluster/test_parquet_timestamp_compatibility.py File tests/custom_cluster/test_parquet_timestamp_compatibility.py: Line 1: # Licensed to the Apache Software Foundation (ASF) under one > Use new Apache license header. Do we still have the old header in some exis My bad, I created this test from an old test script. -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Zoltan Ivanfi <[email protected]> Gerrit-HasComments: Yes
