Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/15403 )
Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner ...................................................................... Patch Set 2: (13 comments) The code looks good. My concern is since there is a delay for the filter to arrive at the scan node. Creating push-down predicates at open time may not be enough. http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.h File be/src/exec/hdfs-orc-scanner.h: http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.h@180 PS2, Line 180: perm_ nit. would search_argument_pool sounds better? http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@833 PS2, Line 833: typename T, typename U> nit. Please add a comment: T is the intended ORC primitive type and U is Impala internal primitive type. http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@834 PS2, Line 834: GetPrimitiveLiteral nit. maybe renamed as GetORCPrimitiveLiteral. http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@836 PS2, Line 836: !val) UNLIKELY()? http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@874 PS2, Line 874: val UNLIKELY()? http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@895 PS2, Line 895: !val UNLIKELY()? http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@901 PS2, Line 901: !val) UNLIKELY()? http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@905 PS2, Line 905: dst_ptr need to check NULL http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@917 PS2, Line 917: Decimal4Value* dv4; : Decimal8Value* dv8; : Decimal16Value* dv16; These three local variables can be relocated to each of the relevant cases. http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@941 PS2, Line 941: return orc::Literal(orc::PredicateDataType::BOOLEAN); I wonder if we should return Status for this method instead of ORC literal as DCHEK() is only valid in the debug build. It is quite possible that the conversion may fail. http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@978 PS2, Line 978: eval For parquet, the actual min/max values are available from the MinMaxFilter at line 1412 below. It is interesting that the same values can be obtained from the expression. 1406 for (auto desc: GetOverlapPredicateDescs()) { 1407 int filter_id = desc.filter_id; 1408 int slot_idx = desc.slot_index; 1409 1410 int filter_idx = FindFilterIndex(filter_id); 1411 MinMaxFilter* minmax_filter = FindMinMaxFilter(filter_idx); 1412 if (!minmax_filter || !IsFilterWorthyForOverlapCheck(filter_idx)) { 1413 continue; 1414 } 1415 hdfs-parquet-scanner.cc There is also a delay for the filter to arrive at the scan node. Thus, creating push-down predicates at open time may not be enough. http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@980 PS2, Line 980: if (fn_name == "lt") { : sarg->lessThan(col_name, predicate_type, literal); : } else if (fn_name == "gt") { : sarg->startNot() : .lessThanEquals(col_name, predicate_type, literal) : .end(); : } else if (fn_name == "le") { : sarg->lessThanEquals(col_name, predicate_type, literal); : } else if (fn_name == "ge") { : sarg->startNot() I think for a min/max filter F, we need to formulate the following two push-down predicates: F.GetMin() <= <c> AND <c> <= F.GetMax() http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/service/query-options.h@50 PS2, Line 50: ORC_READ_STATISTICS ORC_PUSHDOWN_PREDICATES probably sounds better for this patch. Maybe there are other features to be included? -- To view, visit http://gerrit.cloudera.org:8080/15403 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845 Gerrit-Change-Number: 15403 Gerrit-PatchSet: 2 Gerrit-Owner: Norbert Luksa <[email protected]> Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Norbert Luksa <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 19 Aug 2021 14:58:45 +0000 Gerrit-HasComments: Yes
