Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve 
min/max filters
......................................................................


Patch Set 28:

(3 comments)

Rework.

http://gerrit.cloudera.org:8080/#/c/17075/27/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/17075/27/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@260
PS27, Line 260:       }
> > Looks like mixture of files of different format (like Parquet and ORC at
Reworked hasAtLeastOneParquetPartition() to iterate over the partitions and 
stop at the first Parquet partition it finds.

DONE.
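
For readers without the patch open, the early-exit check described above could look roughly like the sketch below. This is only an illustration: the FileFormat enum and Partition class are hypothetical stand-ins rather than the actual Impala frontend types, and the real method in ComputeStatsStmt.java may be structured differently.

```java
import java.util.List;

final class ParquetPartitionCheck {
  // Hypothetical stand-in for the table's file format descriptor.
  enum FileFormat { PARQUET, ORC, TEXT }

  // Hypothetical stand-in for a table partition; only the format matters here.
  static final class Partition {
    final FileFormat format;
    Partition(FileFormat format) { this.format = format; }
  }

  // Walks the partitions and returns as soon as a Parquet partition is seen,
  // so a table with mixed formats (e.g. ORC and Parquet) still qualifies.
  static boolean hasAtLeastOneParquetPartition(List<Partition> partitions) {
    for (Partition p : partitions) {
      if (p.format == FileFormat.PARQUET) return true;
    }
    return false;
  }
}
```

With this shape, a table whose partitions are [ORC, PARQUET] returns true after inspecting the second partition, while an all-ORC or empty table returns false.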


http://gerrit.cloudera.org:8080/#/c/17075/28/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17075/28/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@387
PS28, Line 387:   /*
> Can you remove this method.
Done


http://gerrit.cloudera.org:8080/#/c/17075/28/testdata/workloads/functional-query/queries/QueryTest/compute-stats-column-minmax.test
File 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-column-minmax.test:

http://gerrit.cloudera.org:8080/#/c/17075/28/testdata/workloads/functional-query/queries/QueryTest/compute-stats-column-minmax.test@98
PS28, Line 98: # Create a new hudi parquet table.
> Please note that it creates a plain Parquet table from the data in the Hudi
Good catch. I ran a test against the Hudi table and everything looks good 
(output below). I think it is probably okay to leave out the Hudi test for now.

Server version: impalad version 4.0.0-SNAPSHOT DEBUG (build e45818b616ab4ff81fa512c8257bc8fac594094a)
COMPUTE_COLUMN_MINMAX_STATS set to true
Query: compute stats functional_parquet.hudi_non_partitioned
+------------------------------------------+
| summary                                  |
+------------------------------------------+
| Updated 1 partition(s) and 15 column(s). |
+------------------------------------------+
Fetched 1 row(s) in 0.53s
[12:57:17 qchen@qifan-10229: IMPALA-10494_making_use_of_minmax_column_stats] sql dml.showstats.hudi
Starting Impala Shell with no authentication using Python 2.7.16
Warning: live_progress only applies to interactive shell sessions, and is being skipped for now.
Opened TCP connection to localhost:21000
Connected to localhost:21000
Server version: impalad version 4.0.0-SNAPSHOT DEBUG (build e45818b616ab4ff81fa512c8257bc8fac594094a)
SHOW_COLUMN_MINMAX_STATS set to true
Query: show column stats functional_parquet.hudi_non_partitioned
+------------------------+---------------------------------------+------------------+--------+----------+-------------------+--------+---------+----------------------+--------------------+
| Column                 | Type                                  | #Distinct Values | #Nulls | Max Size | Avg Size          | #Trues | #Falses | Min                  | Max                |
+------------------------+---------------------------------------+------------------+--------+----------+-------------------+--------+---------+----------------------+--------------------+
| _hoodie_commit_time    | STRING                                | 2                | 0      | 14       | 14                | -1     | -1      | -1                   | -1                 |
| _hoodie_commit_seqno   | STRING                                | 97               | 0      | 20       | 20                | -1     | -1      | -1                   | -1                 |
| _hoodie_record_key     | STRING                                | 99               | 0      | 36       | 36                | -1     | -1      | -1                   | -1                 |
| _hoodie_partition_path | STRING                                | 3                | 0      | 25       | 25                | -1     | -1      | -1                   | -1                 |
| _hoodie_file_name      | STRING                                | 6                | 0      | 71       | 70.68000030517578 | -1     | -1      | -1                   | -1                 |
| _hoodie_is_deleted     | BOOLEAN                               | 2                | 0      | 1        | 1                 | 0      | 100     | -1                   | -1                 |
| _row_key               | STRING                                | 99               | 0      | 36       | 36                | -1     | -1      | -1                   | -1                 |
| begin_lat              | DOUBLE                                | 100              | 0      | 8        | 8                 | -1     | -1      | 0.013803214965246391 | 0.9973157077943435 |
| begin_lon              | DOUBLE                                | 99               | 0      | 8        | 8                 | -1     | -1      | 0.014143391676368022 | 0.991562254763212  |
| driver                 | STRING                                | 2                | 0      | 10       | 10                | -1     | -1      | -1                   | -1                 |
| end_lat                | DOUBLE                                | 100              | 0      | 8        | 8                 | -1     | -1      | 7.903052288528167E-4 | 0.9877514097604384 |
| end_lon                | DOUBLE                                | 99               | 0      | 8        | 8                 | -1     | -1      | 0.029829569706356973 | 0.9978872086544781 |
| fare                   | STRUCT<amount:DOUBLE,currency:STRING> | -1               | -1     | -1       | -1                | -1     | -1      | -1                   | -1                 |
| partition              | STRING                                | 3                | 0      | 25       | 25                | -1     | -1      | -1                   | -1                 |
| rider                  | STRING                                | 2                | 0      | 9        | 9                 | -1     | -1      | -1                   | -1                 |
| timestamp              | DOUBLE                                | 1                | 0      | 8        | 8                 | -1     | -1      | 0.0                  | 0.0                |
+------------------------+---------------------------------------+------------------+--------+----------+-------------------+--------+---------+----------------------+--------------------+
Fetched 16 row(s) in 0.02s


| Max Per-Host Resource Reservation: Memory=6.97MB Threads=5                                                         |
| Per-Host Resource Estimates: Memory=55MB                                                                           |
| Codegen disabled by planner                                                                                        |
| Analyzed query: SELECT /* +straight_join */ a.`_hoodie_record_key` FROM                                            |
| functional_parquet.hudi_non_partitioned a,                                                                         |
| functional_parquet.hudi_non_partitioned b WHERE a.begin_lat = b.end_lat                                            |
|                                                                                                                    |
| F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                                              |
| Per-Host Resources: mem-estimate=4.02MB mem-reservation=4.00MB thread-reservation=1                                |
|   PLAN-ROOT SINK                                                                                                   |
|   |  output exprs: a.`_hoodie_record_key`                                                                          |
|   |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0                           |
|   |                                                                                                                |
|   04:EXCHANGE [UNPARTITIONED]                                                                                      |
|      mem-estimate=23.18KB mem-reservation=0B thread-reservation=0                                                  |
|      tuple-ids=0,1 row-size=64B cardinality=100                                                                    |
|      in pipelines: 00(GETNEXT)                                                                                     |
|                                                                                                                    |
| F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                                                     |
| Per-Host Resources: mem-estimate=34.95MB mem-reservation=2.95MB thread-reservation=2 runtime-filters-memory=1.00MB |
|   DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, UNPARTITIONED]                                                       |
|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                                                       |
|   02:HASH JOIN [INNER JOIN, BROADCAST]                                                                             |
|   |  hash predicates: a.begin_lat = b.end_lat                                                                      |
|   |  fk/pk conjuncts: a.begin_lat = b.end_lat                                                                      |
|   |  runtime filters: RF000[bloom] <- b.end_lat, RF001[min_max] <- b.end_lat                                       |
|   |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0                          |
|   |  tuple-ids=0,1 row-size=64B cardinality=100                                                                    |
|   |  in pipelines: 00(GETNEXT), 01(OPEN)                                                                           |
|   |                                                                                                                |
|   |--03:EXCHANGE [BROADCAST]                                                                                       |
|   |     mem-estimate=16.00KB mem-reservation=0B thread-reservation=0                                               |
|   |     tuple-ids=1 row-size=8B cardinality=100                                                                    |
|   |     in pipelines: 01(GETNEXT)                                                                                  |
|   |                                                                                                                |
|   00:SCAN HDFS [functional_parquet.hudi_non_partitioned a, RANDOM]                                                 |
|      HDFS partitions=1/1 files=3 size=28.45KB                                                                      |
|      runtime filters: RF001[min_max] -> a.begin_lat, RF000[bloom] -> a.begin_lat                                   |
|      stored statistics:                                                                                            |
|        table: rows=100 size=28.45KB                                                                                |
|        columns: all                                                                                                |
|      extrapolated-rows=disabled max-scan-range-rows=34                                                             |
|      file formats: [PARQUET]                                                                                       |
|      mem-estimate=32.00MB mem-reservation=16.00KB thread-reservation=1                                             |
|      tuple-ids=0 row-size=56B cardinality=100                                                                      |
|      in pipelines: 00(GETNEXT)                                                                                     |
|                                                                                                                    |
| F01:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                                                     |
| Per-Host Resources: mem-estimate=16.00MB mem-reservation=16.00KB thread-reservation=2                              |
|   DATASTREAM SINK [FRAGMENT=F00, EXCHANGE=03, BROADCAST]                                                           |
|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                                                       |
|   01:SCAN HDFS [functional_parquet.hudi_non_partitioned b, RANDOM]                                                 |
|      HDFS partitions=1/1 files=3 size=28.45KB                                                                      |
|      stored statistics:                                                                                            |
|        table: rows=100 size=28.45KB                                                                                |
|        columns: all                                                                                                |
|      extrapolated-rows=disabled max-scan-range-rows=34                                                             |
|      file formats: [PARQUET]                                                                                       |
|      mem-estimate=16.00MB mem-reservation=16.00KB thread-reservation=1                                             |
|      tuple-ids=1 row-size=8B cardinality=100                                                                       |
|      in pipelines: 01(GETNEXT)                                                                                     |
+--------------------------------------------------------------------------------------------------------------------+
Fetched 61 row(s) in 0.02s

Query: select straight_join  a._hoodie_record_key from
hudi_non_partitioned a, hudi_non_partitioned b
where a.begin_lat = b.end_lat
Query submitted at: 2021-04-01 13:06:55 (Coordinator: http://qifan-10229:25000)
Query progress can be monitored at: http://qifan-10229:25000/query_plan?query_id=5d4afb9f44b47432:2a456b9d00000000
Fetched 0 row(s) in 0.11s



-- 
To view, visit http://gerrit.cloudera.org:8080/17075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 28
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 01 Apr 2021 17:20:57 +0000
Gerrit-HasComments: Yes
