Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@17
PS4, Line 17: x = isnull(y, 1) can return true even if y is NULL.
> Okay. Thanks a lot for trying the NULL row (to the filter) method. Yes, my
https://cwiki.apache.org/confluence/display/IMPALA/Impala+Row+Batches describes 
the physical layout of the row batches.

So each row is a composition of pointers to tuples. A join produces rows with a 
new layout which is the concatenation of the tuples of the input rows (except 
semi-joins, which retain the row layout from the outer). You can see that in 
the plan I included below, where a left join produces a row composed of tuples 
0 and 1 from the inputs: tuple-ids=0,1N

An unmatched row from an outer join is represented by a NULL tuple pointer. 
Nullability is recorded in the row descriptor as an extra per-tuple flag 
(that's the N in 1N above). Most tuples are non-nullable; nullable tuples are 
only introduced by outer joins and aggregations with multiple agg classes IIRC.
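
To make the layout description concrete, here is a toy Python model of it. This is only a sketch: RowLayout, left_outer_layout and left_outer_join are hypothetical names made up for illustration, not Impala classes, and rows are modeled as Python tuples of dicts (with None standing in for a NULL tuple pointer).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowLayout:
    """Toy row descriptor: tuple ids plus one nullable flag per tuple."""
    tuple_ids: tuple
    nullable: tuple

    def __str__(self):
        # Render like the plan output, e.g. "0,1N".
        return ",".join(
            f"{tid}{'N' if is_null else ''}"
            for tid, is_null in zip(self.tuple_ids, self.nullable)
        )


def left_outer_layout(probe, build):
    """Concatenate the input layouts; build-side tuples become nullable."""
    return RowLayout(
        probe.tuple_ids + build.tuple_ids,
        probe.nullable + (True,) * len(build.tuple_ids),
    )


def left_outer_join(probe_rows, build_rows, build_width=1):
    """Join on the 'id' slot of the first tuple of each side (toy only)."""
    out = []
    for p in probe_rows:
        matches = [b for b in build_rows if b[0]["id"] == p[0]["id"]]
        if matches:
            out.extend(p + b for b in matches)
        else:
            # Unmatched probe row: build-side tuple pointers are NULL.
            out.append(p + (None,) * build_width)
    return out


t1 = RowLayout((0,), (False,))
t2 = RowLayout((1,), (False,))
joined_layout = left_outer_layout(t1, t2)
joined_rows = left_outer_join([({"id": 1},), ({"id": 2},)], [({"id": 1},)])
```

Here str(joined_layout) renders as 0,1N, and the unmatched probe row with id 2 comes out with None in the position of tuple 1.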

The runtime filter expressions in this example would be evaluated over the 
input row produced by operator 04, which has a single non-nullable tuple 1.

The problem is that each expression tree in the planner is specific to a 
particular row layout, so to evaluate a runtime filter expression over a row 
with layout 1N instead of layout 1, we'd need to generate a new RowDescriptor, 
then clone the runtime filter expression and fix it up by replacing all the 
SlotRefs with SlotRefs referencing the new row descriptor. I think we would 
also need to add some TupleIsNull() predicates in order to correctly handle the 
nullability (since SlotRef only handles the slot-level nullability, I think). 
Then we'd have to plumb the expression through to the backend so it can be 
evaluated.
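
A rough Python sketch of the kind of rewrite described above, assuming a toy expression tree. SlotRef here is a simplified stand-in, and TupleNullGuard, IsNull and rebind are hypothetical names; TupleNullGuard only plays the role that a TupleIsNull() predicate would play. It is not the planner's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotRef:
    """Reads a slot from the tuple at a fixed position in the row layout."""
    tuple_idx: int
    slot: str

    def eval(self, row):
        return row[self.tuple_idx][self.slot]


@dataclass(frozen=True)
class TupleNullGuard:
    """Hypothetical wrapper: yields NULL if the tuple pointer itself is NULL."""
    tuple_idx: int
    child: object

    def eval(self, row):
        if row[self.tuple_idx] is None:
            return None
        return self.child.eval(row)


@dataclass(frozen=True)
class IsNull:
    """isnull(child, default): substitutes default when child is NULL."""
    child: object
    default: object

    def eval(self, row):
        value = self.child.eval(row)
        return self.default if value is None else value


def rebind(expr, old_ids, new_ids, nullable_ids):
    """Clone expr, remapping SlotRefs from the old layout to the new one."""
    if isinstance(expr, SlotRef):
        tuple_id = old_ids[expr.tuple_idx]
        new_ref = SlotRef(new_ids.index(tuple_id), expr.slot)
        if tuple_id in nullable_ids:
            return TupleNullGuard(new_ref.tuple_idx, new_ref)
        return new_ref
    if isinstance(expr, IsNull):
        child = rebind(expr.child, old_ids, new_ids, nullable_ids)
        return IsNull(child, expr.default)
    raise TypeError(f"unsupported expression: {expr!r}")


# isnull(t2.id, 1) bound to layout "1", then rebound to layout "0,1N".
build_expr = IsNull(SlotRef(0, "id"), 1)
join_expr = rebind(build_expr, old_ids=[1], new_ids=[0, 1], nullable_ids={1})
```

In this toy, join_expr.eval(({"id": 5}, None)) returns the default 1 for an unmatched row, whereas evaluating the un-rebound build_expr over the same row silently reads tuple 0 instead, which is why the expression cannot simply be reused across layouts.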

  > explain select count(*) from functional.alltypes t1 left join functional.alltypestiny t2 on t1.id = t2.id;
  Query: explain select count(*) from functional.alltypes t1 left join functional.alltypestiny t2 on t1.id = t2.id

  Max Per-Host Resource Reservation: Memory=1.98MB Threads=5
  Per-Host Resource Estimates: Memory=214MB
  Codegen disabled by planner
  Analyzed query: SELECT count(*) FROM functional.alltypes t1 LEFT OUTER JOIN
  functional.alltypestiny t2 ON t1.id = t2.id

  F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
  |  Per-Host Resources: mem-estimate=10.02MB mem-reservation=0B thread-reservation=1
  PLAN-ROOT SINK
  |  output exprs: count(*)
  |  mem-estimate=0B mem-reservation=0B thread-reservation=0
  |
  06:AGGREGATE [FINALIZE]
  |  output: count:merge(*)
  |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
  |  tuple-ids=2 row-size=8B cardinality=1
  |  in pipelines: 06(GETNEXT), 03(OPEN)
  |
  05:EXCHANGE [UNPARTITIONED]
  |  mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
  |  tuple-ids=2 row-size=8B cardinality=1
  |  in pipelines: 03(GETNEXT)
  |
  F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
  Per-Host Resources: mem-estimate=171.94MB mem-reservation=1.97MB thread-reservation=2
  03:AGGREGATE
  |  output: count(*)
  |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
  |  tuple-ids=2 row-size=8B cardinality=1
  |  in pipelines: 03(GETNEXT), 00(OPEN)
  |
  02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
  |  hash predicates: t1.id = t2.id
  |  fk/pk conjuncts: t1.id = t2.id
  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
  |  tuple-ids=0,1N row-size=8B cardinality=7.30K
  |  in pipelines: 00(GETNEXT), 01(OPEN)
  |
  |--04:EXCHANGE [BROADCAST]
  |  |  mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
  |  |  tuple-ids=1 row-size=4B cardinality=8
  |  |  in pipelines: 01(GETNEXT)
  |  |
  |  F01:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
  |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=2
  |  01:SCAN HDFS [functional.alltypestiny t2, RANDOM]
  |     HDFS partitions=4/4 files=4 size=460B
  |     stored statistics:
  |       table: rows=8 size=460B
  |       partitions: 4/4 rows=8
  |       columns: all
  |     extrapolated-rows=disabled max-scan-range-rows=2
  |     mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1
  |     tuple-ids=1 row-size=4B cardinality=8
  |     in pipelines: 01(GETNEXT)
  |
  00:SCAN HDFS [functional.alltypes t1, RANDOM]
     HDFS partitions=24/24 files=24 size=478.45KB
     stored statistics:
       table: rows=7.30K size=478.45KB
       partitions: 24/24 rows=7.30K
       columns: all
     extrapolated-rows=disabled max-scan-range-rows=310
     mem-estimate=160.00MB mem-reservation=32.00KB thread-reservation=1
     tuple-ids=0 row-size=4B cardinality=7.30K
     in pipelines: 00(GETNEXT)



--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Tue, 27 Oct 2020 16:23:03 +0000
Gerrit-HasComments: Yes
