Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet ......................................................................
Patch Set 2: (11 comments) http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 237: timezone_ = TimezoneDatabase::FindTimezone( > We should first check for TimezoneDatabase::IsTimestampDependentTimezone, o Done. Also added slot_desc_->type().type == TYPE_TIMESTAMP to the condition. Line 240: timezone_ = NULL; > This exception should not be swallowed. If we can not convert to the desire Added an error message. Even though the program execution should never get to L240, (we deal with invalid timezones in the frontend in the analyze phase), I feel it is important to be defensive here and display errors. Line 543: /// If not NULL, TIMESTAMP column readers require conversion to this time zone. > Although technically correct, this comment is a bit misleading. One may eas Done Line 593: if (LIKELY(dst->HasDateAndTime())) { > DCHECK(parent_->scan_node_->parquet_mr_time_zone() != "UTC") Done Line 597: dst->FromUtc(parent_->scan_node_->parquet_mr_time_zone()); > This is somewhat confusing, as it seems as if the first two cases did the s Done. I also refactored the if statements to make the logic in 'ConvertSlot()' easier to follow. Error handling was added as well. http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: Line 189: time_zone_ptr TimezoneDatabase::FindTimezone(const string& tz) { > This should be a private helper function with two callers: 'FindTimezone(const string& tz)' is also called from the Frontend to check if a timezone string is valid (see 'Java_org_apache_impala_service_FeSupport_NativeCheckIsValidTimeZone()' in fe-support.cc). In this case it is not guaranteed that the timezone is not timestamp dependent. http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: Line 109: // If a table has an invalid timezone, it should not have been loaded. > How does this comment apply here? Was it supposed to be before the catch bl Removed the comment as it doesn't make much sense here. Line 112: *this = ptime(not_a_date_time); > Again, swallowing this exception can be a problem as end-users may not noti Added code to display exception message. Added error-handling code to the caller function as well. http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: Line 201: /// Converts from UTC to local time in the given timezone. > Please document the difference that one can be timestamp-dependent while th Both versions of 'FromUtc()' will work both with timestamp-independent and timestamp-dependent timezones. On the other hand, there is a difference between the two versions of 'TimezoneDatabase::FindTimezone()' which is documented in timezone_db.h http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/service/fe-support.cc File be/src/service/fe-support.cc: Line 352: extern "C" > Please document the purpose of this call (allowing the Java code to check w Done http://gerrit.cloudera.org:8080/#/c/5939/2/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: Line 182: "'parquet.mr.int96.write.zone' is only supported for HDFS tables.")); > Why? Parquet files should be handled this way regardless of using HDFS or n I'm not 100% sure about this, but I think HdfsTable instances are used to represent tables stored in HDFS or S3 or the local file system. This condition was meant to exclude Kudu/HBase tables. Let's leave it like this for now and see what other reviewers think. -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Zoltan Ivanfi <[email protected]> Gerrit-HasComments: Yes
