Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15403 )

Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner
......................................................................


Patch Set 2:

(13 comments)

The code looks good.

My concern is since there is a delay for the filter to arrive at the scan node. 
Creating push-down predicates at open time may not be enough.

http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.h
File be/src/exec/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.h@180
PS2, Line 180: perm_
nit. would search_argument_pool sounds better?


http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@833
PS2, Line 833: typename T, typename U>
nit. Please add a comment: T is the intended ORC primitive type and U is Impala 
internal primitive type.


http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@834
PS2, Line 834: GetPrimitiveLiteral
nit. maybe renamed as GetORCPrimitiveLiteral.


http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@836
PS2, Line 836: !val)
UNLIKELY()?


http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@874
PS2, Line 874: val
UNLIKELY()?


http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@895
PS2, Line 895: !val
UNLIKELY()?


http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@901
PS2, Line 901: !val)
UNLIKELY()?


http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@905
PS2, Line 905: dst_ptr
need to check NULL


http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@917
PS2, Line 917: Decimal4Value* dv4;
             :       Decimal8Value* dv8;
             :       Decimal16Value* dv16;
These three local variables can be relocated to each of the relevant cases.


http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@941
PS2, Line 941: return orc::Literal(orc::PredicateDataType::BOOLEAN);
I wonder if we should return Status for this method instead of ORC literal as 
DCHEK() is only valid in the debug build. It is quite possible that the 
conversion may fail.


http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@978
PS2, Line 978: eval
For parquet, the actual min/max values are available from the MinMaxFilter at 
line 1412 below. It is interesting that the same values can be obtained from 
the expression.

1406   for (auto desc: GetOverlapPredicateDescs()) {                            
                
1407     int filter_id = desc.filter_id;                              
1408     int slot_idx = desc.slot_index;    
1409                  
1410     int filter_idx = FindFilterIndex(filter_id);                    
1411     MinMaxFilter* minmax_filter = FindMinMaxFilter(filter_idx);
1412     if (!minmax_filter || !IsFilterWorthyForOverlapCheck(filter_idx)) {
1413       continue;                                                            
1414     }                                                
1415                                
              
hdfs-parquet-scanner.cc

There is also a delay for the filter to arrive at the scan node. Thus, creating 
push-down predicates at open time may not be enough.


http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@980
PS2, Line 980:  if (fn_name == "lt") {
             :       sarg->lessThan(col_name, predicate_type, literal);
             :     } else if (fn_name == "gt") {
             :       sarg->startNot()
             :           .lessThanEquals(col_name, predicate_type, literal)
             :           .end();
             :     } else if (fn_name == "le") {
             :       sarg->lessThanEquals(col_name, predicate_type, literal);
             :     } else if (fn_name == "ge") {
             :       sarg->startNot()
I think for a min/max filter F, we need to formulate the following two 
push-down predicates:

F.GetMin() <= <c> AND <c> <= F.GetMax()


http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/service/query-options.h@50
PS2, Line 50: ORC_READ_STATISTICS
ORC_PUSHDOWN_PREDICATES probably sounds better for this patch. Maybe there are 
other features to be included?



--
To view, visit http://gerrit.cloudera.org:8080/15403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845
Gerrit-Change-Number: 15403
Gerrit-PatchSet: 2
Gerrit-Owner: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 19 Aug 2021 14:58:45 +0000
Gerrit-HasComments: Yes

Reply via email to