Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet ......................................................................
Patch Set 8: (20 comments) http://gerrit.cloudera.org:8080/#/c/5939/7//COMMIT_MSG Commit Message: PS7, Line 22: New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE : LIKE <file> will set the table property to UTC if the global flag : --set_parque > I don't follow this given the next two paragraphs. This isn't true for the Correct. Changed the commit message. http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 305: // parquet-mr. If conversion should not occur, this is set to an empty string. Otherwise > from the code, it looks like this is set to the empty string if conversion Correct, added a comment to document the behavior. http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 234: FLAGS_convert_legacy_hive_parquet_utc_timestamps > given that parquet_mr_write_zone() will take precedence over the flag, I th Done PS7, Line 246: DCHECK > nit: DCHECK_EQ() Done PS7, Line 609: status = scanner_state->LogOrReturnError(msg); > since this code is very performance critical, it would be best to order thi Correct. Reordered the conditions in the if-elif-else statement PS7, Line 611: status = status; > these could all be UNLIKELY since we want to optimize for the non-error cas Done PS7, Line 625: LY(dst->HasDateAndTime() > i.e. here too and below. Done PS7, Line 639: if (!SetTimestampConversionError(parent_->scan_node_, parent_->state_, : parent_->parse_status_, src, parent_->scan_node_->parquet_mr_write_zone())) { : return false; : } : } : } else { : DCHECK(parent_->scan_node_->parquet_mr_write_zone().empty()); : DCH > seems worth factoring this block (which occurs 3 times) into a non-inlined Done. Added a new non-class member function and factored out the block into it. (Adding a new member function instead to ScalarColumnReader template class is probably not a good idea since it is only needed in the ScalarColumnReader<TimestampValue, true> instance) http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: Line 193: > could you add a comment explaining these two lookup techniques? Done http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/exprs/timezone_db.h File be/src/exprs/timezone_db.h: PS7, Line 45: hen IsTimestampDependen > how about drawing the connection with the next function directly: Done Line 49: > comment this. Done http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: Line 101: Status s("Failed to convert timestamp to local time."); > this might produce a ton of logging. instead, why not return the Status ob Done Line 109: bool TimestampValue::FromUtc(const std::string& timezone_str) { > how do we guarantee that this is true? did the caller already check that? Yes, it did. PS7, Line 111: _zone_ptr timezo > UNLIKELY (since perf critical) Done Line 123: ToPtime(&temp); > see the comment at line 77. i think we were avoiding this for perf reasons. I've added a micro-benchmark to compare the two. UtcToLocal() is about 2.5 times faster than FromUtc() when input data size is 1,000 and 1.5 times faster when input data size is 10,000. Rewriting FromUtc() not to use boost function calls is not that straightforward. The above version of UtcToLocal() uses localtime_r() standard C function to perform the UTC->local timezone conversion. FromUtc() on the other hand should convert timestamps to a given time zone. I don't think there is a standard C function to do this. http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: PS7, Line 197: > below you say "local time in the given timezone", which I guess is technica Changed the comment below for 'FromUtc' PS7, Line 203: > it's not clear that "convert" means that *this is modified in-place, so I t Done http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS7, Line 122: new > is that always the case? Fixed the description PS7, Line 122: bles created " : "using CREATE TABLE and CREATE TABLE LIKE <file>. You can find details in the " : "documentation."); : : DEFINE_int64(max_result_cache_size, 10000 > I'm not sure that this makes sense on it's own. I think you kind of have to Removed that part http://gerrit.cloudera.org:8080/#/c/5939/7/common/thrift/BackendGflags.thrift File common/thrift/BackendGflags.thrift: PS7, Line 58: new > is that true? what about "LIKE table" case? Fixed the comment -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-Reviewer: Zoltan Ivanfi <zi+ger...@cloudera.com> Gerrit-HasComments: Yes