Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17960 )

Change subject: IMPALA-10777: Enable min/max filtering for Iceberg partitions
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@1323
PS1, Line 1323: if (scan_node_->hdfs_table()->IsIcebergTable()) return false;
> In this patch we set is_bound_by_parttion_columns for the computed partitio
I got it. Thanks for the explanation.

So in this case, maybe we can add a new field to TRuntimeFilterTargetDesc: 
is_data_in_data_file.

At line 678 in this file, we can then do the test as follows.

  if (IsBoundByPartitionColumn(idx) && !IsDataInDataFile(idx)) {
    continue;
  }

  // Specification of a runtime filter target.
  struct TRuntimeFilterTargetDesc {
    // Target node id
    1: Types.TPlanNodeId node_id

    // Expr on which the filter is applied
    2: required Exprs.TExpr target_expr

    // Indicates if 'target_expr' is bound only by partition columns
    3: required bool is_bound_by_partition_columns

    // Slot ids on which 'target_expr' is bound on
    4: required list<Types.TSlotId> target_expr_slotids

    // Indicates if this target is on the same fragment as the join that
    // produced the runtime filter
    5: required bool is_local_target

    // If the target node is a Kudu scan node, the name, in the case it appears in Kudu, and
    // type of the targeted column.
    6: optional string kudu_col_name
    7: optional Types.TColumnType kudu_col_type;

    // The low and high value as seen in the column stats of the targeted column.
    8: optional Data.TColumnValue low_value
    9: optional Data.TColumnValue high_value

    // Indicates if the low and high value in column stats are present
    10: optional bool is_min_max_value_present
  }
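
To make the suggestion concrete, here is a rough, self-contained sketch of how such a check might look. It is only an illustration: TargetDesc is a hypothetical stand-in for the Thrift-generated struct, ShouldSkipInScanner() and EligibleFilters() are made-up helper names, and it assumes is_data_in_data_file would mean that the partition column's values are also present in the data files (as with the computed Iceberg partitions discussed above).

```cpp
#include <vector>

// Hypothetical stand-in for the Thrift-generated TRuntimeFilterTargetDesc.
// Only the two fields relevant to the check are modeled.
struct TargetDesc {
  bool is_bound_by_partition_columns = false;
  // Proposed field (not part of the current struct): assumed to be true when
  // the partition column's values are also stored in the data files.
  bool is_data_in_data_file = false;
};

// Returns true if the scanner should not evaluate this filter per row:
// the target is bound by partition columns and the values are not available
// in the data file itself.
bool ShouldSkipInScanner(const TargetDesc& desc) {
  return desc.is_bound_by_partition_columns && !desc.is_data_in_data_file;
}

// Returns the indexes of the filter targets that stay eligible for
// evaluation in the scanner, mirroring the 'continue' in the loop above.
std::vector<int> EligibleFilters(const std::vector<TargetDesc>& targets) {
  std::vector<int> result;
  for (int idx = 0; idx < static_cast<int>(targets.size()); ++idx) {
    if (ShouldSkipInScanner(targets[idx])) continue;
    result.push_back(idx);
  }
  return result;
}
```

With this shape, a plain partitioned table (bound by partition columns, data not in the file) is skipped, while a target whose column values live in the data file is still evaluated.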


http://gerrit.cloudera.org:8080/#/c/17960/2/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17960/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@678
PS2, Line 678: && IsSimplePartitionedTable()
> For simple partitioned tables we don't want to evaluate the filters at the
Makes sense. Done.



--
To view, visit http://gerrit.cloudera.org:8080/17960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac
Gerrit-Change-Number: 17960
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 22 Oct 2021 16:45:12 +0000
Gerrit-HasComments: Yes
