Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 )
Change subject: IMPALA-7368: Add initial support for DATE type ...................................................................... Patch Set 3: (4 comments) I ran through the non-test cpp code. http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exec/hdfs-scan-node-base.cc@644 PS3, Line 644: // TODO: Remove this block once DATE type is supported accross all file formats. : if (has_materialized_date_slot_ && partition->file_format() != THdfsFileFormat::TEXT) { : context->ClearStreams(); : return Status("DATE type is only supported with TEXT file format."); : } I think that this logic should be implemented in frontend, and the backend should only contains DCHECKs. http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exprs/cast-functions-ir.cc@310 PS3, Line 310: int year, month, day; : if (UNLIKELY(!dv.ToYearMonthDay(&year, &month, &day))) return TimestampVal::null(); : if (year < 1400 || year > 9999) return TimestampVal::null(); : : const boost::gregorian::date d(year, month, day); : const boost::posix_time::time_duration t(0, 0, 0, 0); : TimestampValue tv(d, t); performance: this could be done much faster, as both DateVal and TimestampValue's date_ are "number of days since some epoch", so the conversion should be simply checking a range + add the difference between the to epoch. http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exprs/cast-functions-ir.cc@330 PS3, Line 330: if (val.is_null) return DateVal::null(); : TimestampValue tv = TimestampValue::FromTimestampVal(val); : if (UNLIKELY(!tv.HasDate())) return DateVal::null(); : : const boost::gregorian::date d = tv.date(); : const DateValue dv(d.year(), d.month(), d.day()); : return dv.ToDateVal(); same as in line 310: this could be done with a simple addition. There is a function that does exactly this conversion in https://gerrit.cloudera.org/#/c/12247/8/be/src/runtime/timestamp-value.inline.h ( int64_t TimestampValue::DaysSinceUnixEpoch() ) http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exprs/slot-ref.cc File be/src/exprs/slot-ref.cc: http://gerrit.cloudera.org:8080/#/c/12481/3/be/src/exprs/slot-ref.cc@485 PS3, Line 485: const DateValue dv(*reinterpret_cast<int32_t*>(t->GetSlot(slot_offset_)) Why don't we reinterpret_cast to DateValue* directly? This would avoid the unnecessary range check in DateValue's constructor. -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 3 Gerrit-Owner: Attila Jeges <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Fri, 15 Feb 2019 13:29:31 +0000 Gerrit-HasComments: Yes
