Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
......................................................................


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 237:         timezone_ = TimezoneDatabase::FindTimezone(
> We should first check for TimezoneDatabase::IsTimestampDependentTimezone, o
Done. Also added slot_desc_->type().type == TYPE_TIMESTAMP to the condition.


Line 240:         timezone_ = NULL;
> This exception should not be swallowed. If we can not convert to the desire
Added an error message. 

Even though the program execution should never get to L240, (we deal with 
invalid timezones in the frontend in the analyze phase), I feel it is important 
to be defensive here and display errors.


Line 543:   /// If not NULL, TIMESTAMP column readers require conversion to 
this time zone.
> Although technically correct, this comment is a bit misleading. One may eas
Done


Line 593:   if (LIKELY(dst->HasDateAndTime())) {
> DCHECK(parent_->scan_node_->parquet_mr_time_zone() != "UTC")
Done


Line 597:       dst->FromUtc(parent_->scan_node_->parquet_mr_time_zone());
> This is somewhat confusing, as it seems as if the first two cases did the s
Done. I also refactored the if statements to make the logic in 'ConvertSlot()' 
easier to follow. Error handling was added as well.


http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

Line 189: time_zone_ptr TimezoneDatabase::FindTimezone(const string& tz) {
> This should be a private helper function with two callers:
'FindTimezone(const string& tz)' is also called from the Frontend to check if a 
timezone string is valid (see 
'Java_org_apache_impala_service_FeSupport_NativeCheckIsValidTimeZone()' in 
fe-support.cc). 

In this case it is not guaranteed that the timezone is not timestamp dependent.


http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

Line 109:     // If a table has an invalid timezone, it should not have been 
loaded.
> How does this comment apply here? Was it supposed to be before the catch bl
Removed the comment as it doesn't make much sense here.


Line 112:     *this = ptime(not_a_date_time);
> Again, swallowing this exception can be a problem as end-users may not noti
Added code to display exception message. Added error-handling code to the 
caller function as well.


http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 201:   /// Converts from UTC to local time in the given timezone.
> Please document the difference that one can be timestamp-dependent while th
Both versions of 'FromUtc()' will work both with timestamp-independent and 
timestamp-dependent timezones.

On the other hand, there is a difference between the two versions of 
'TimezoneDatabase::FindTimezone()' which is documented in timezone_db.h


http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

Line 352: extern "C"
> Please document the purpose of this call (allowing the Java code to check w
Done


http://gerrit.cloudera.org:8080/#/c/5939/2/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

Line 182:             "'parquet.mr.int96.write.zone' is only supported for HDFS 
tables."));
> Why? Parquet files should be handled this way regardless of using HDFS or n
I'm not 100% sure about this, but I think HdfsTable instances are used to 
represent tables stored in HDFS or S3 or the local file system. This condition 
was meant to exclude Kudu/HBase tables.

Let's leave it like this for now and see what other reviewers think.


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Zoltan Ivanfi <[email protected]>
Gerrit-HasComments: Yes

Reply via email to