Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15403 )
Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner ...................................................................... Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@875 PS1, Line 875: UtcToUnixTime > According to ORC spec the timestamp stats are stored as millisecs (which is Looks like it's an ORC bug that the ORC writers (both Java and C++ versions) don't handle the rounding issue of max: https://github.com/apache/orc/blob/fea154436c37c81a16b13d879b510096cfaa2946/java/core/src/java/org/apache/orc/impl/writer/TimestampTreeWriter.java#L108 https://github.com/apache/orc/blob/fea154436c37c81a16b13d879b510096cfaa2946/c%2B%2B/src/ColumnWriter.cc#L1800 http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@902 PS1, Line 902: type.GetByteSize() I think this should be "literal_expr->type().GetByteSize()" since it's consistent with the type of "val". http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@914 PS1, Line 914: static_cast<uint64_t>((dv16->value() << 64) >> 64) Can we cast int128_t to uint64_t directly? http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@984 PS1, Line 984: row_reader_options_.setSearchArgument(std::move(final_sarg)); Could you add a VLOG_FILE level logging for 'final_sarg'? It would be helpful for debugging. http://gerrit.cloudera.org:8080/#/c/15403/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/15403/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@543 PS1, Line 543: } else if (op == BinaryPredicate.Operator.EQ) { ORC lib supports pushing down EQ predicates. I think we don't need to transform them into LE+GE predicates. But I'm ok leaving this as a further optimization. -- To view, visit http://gerrit.cloudera.org:8080/15403 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845 Gerrit-Change-Number: 15403 Gerrit-PatchSet: 1 Gerrit-Owner: Norbert Luksa <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Norbert Luksa <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Mon, 16 Mar 2020 08:17:50 +0000 Gerrit-HasComments: Yes
