Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/18050 )

Change subject: [WIP] IMPALA-10992 Planner changes for estimate peak memory - v2
......................................................................


Patch Set 14:

This patch set fixes the following three issues.

P15:

Section DISTRIBUTEDPLAN of query:

select
  l_orderkey,
  sum(l_extendedprice * (1 - l_discount)) as revenue,
  o_orderdate,
  o_shippriority
from
  customer,
  orders,
  lineitem
where
  c_mktsegment = 'BUILDING'
  and c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and o_orderdate < '1995-03-15'
  and l_shipdate > '1995-03-15'
group by
  l_orderkey,
  o_orderdate,
  o_shippriority
order by
  revenue desc,
  o_orderdate
limit 10

12:MERGING-EXCHANGE [UNPARTITIONED]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|  order by: sum(l_extendedprice * (1 - l_discount)) DESC, o_orderdate ASC
|  limit: 10
|
06:TOP-N [LIMIT=10]
|  order by: sum(l_extendedprice * (1 - l_discount)) DESC, o_orderdate ASC
|  row-size=50B cardinality=10
|
11:AGGREGATE [FINALIZE]
|  output: sum:merge(l_extendedprice * (1 - l_discount))
|  group by: l_orderkey, o_orderdate, o_shippriority
|  row-size=50B cardinality=17.56K
|
10:EXCHANGE [HASH(l_orderkey,o_orderdate,o_shippriority)]
|
05:AGGREGATE [STREAMING]
|  output: sum(l_extendedprice * (1 - l_discount))
|  group by: l_orderkey, o_orderdate, o_shippriority
|  row-size=50B cardinality=17.56K
|
04:HASH JOIN [INNER JOIN, PARTITIONED]
|  hash predicates: o_custkey = c_custkey
|  runtime filters: RF000 <- c_custkey
|  row-size=0B cardinality=17.56K
|
|--09:EXCHANGE [HASH(c_custkey)]
|  |
|  00:SCAN HDFS [tpch.customer]
|     HDFS partitions=1/1 files=1 size=23.08MB
|     predicates: c_mktsegment = 'BUILDING'
|     row-size=0B cardinality=30.00K
|
08:EXCHANGE [HASH(o_custkey)]        <==== This exchange is extra. Fixed by
copying avgRowSize in PlanNode. An exchange node above a scan should see the
scan's average row size (28.9974); it was 0 instead.
|
03:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: l_orderkey = o_orderkey
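
For reference, the idea behind the P15 fix can be sketched in a few lines of Java. This is only an illustration under assumptions, not the code in the patch: SketchNode, SketchExchange and computeStats are hypothetical names, and the only thing taken from the note above is that an exchange should report the average row size of the node below it.

```java
// Hypothetical stand-ins for planner nodes; these are not Impala's PlanNode classes.
class SketchNode {
    protected float avgRowSize;        // average row size in bytes
    protected final SketchNode child;  // null for a leaf such as a scan

    SketchNode(SketchNode child, float avgRowSize) {
        this.child = child;
        this.avgRowSize = avgRowSize;
    }

    float getAvgRowSize() {
        return avgRowSize;
    }
}

class SketchExchange extends SketchNode {
    SketchExchange(SketchNode child) {
        // Starts at 0, which mirrors the wrong value described in the note.
        super(child, 0.0f);
    }

    // An exchange forwards rows unchanged, so it takes over its child's row size.
    void computeStats() {
        avgRowSize = child.getAvgRowSize();
    }
}
```

With a scan whose average row size is 28.9974, the exchange reports 0 until computeStats() copies the value from its child.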

P16:

Query (in tpch-all.test):

use tpch_nested_parquet;

select
  o_orderpriority,
  count(*) as order_count
from
  customer c,
  c.c_orders o
where
  o_orderdate >= '1993-07-01'
  and o_orderdate < '1993-10-01'
  and exists (
    select
      *
    from
      o.o_lineitems
    where
      l_commitdate < l_receiptdate
    )
group by
  o_orderpriority
order by
  o_orderpriority

Error Stack:
java.lang.IllegalStateException: Must be analyzed before serializing to thrift.
IsNotEmptyPredicate{id=null, type=INVALID_TYPE, toSql=!empty(c.c_orders),
sel=-1.0, evalCost=-1.0, #distinct=-1}. <== Resolved by fixing the
IsNotEmptyPredicate.clone() method.
  at com.google.common.base.Preconditions.checkState(Preconditions.java:589)
  at org.apache.impala.analysis.Expr.treeToThriftHelper(Expr.java:850)
  at org.apache.impala.analysis.Expr.treeToThrift(Expr.java:844)
  at org.apache.impala.planner.PlanNode.treeToThriftHelper(PlanNode.java:489
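
The note does not spell out what was wrong with the copy method, so the following Java sketch only shows one common way a copied predicate can end up "not analyzed", and how a copy constructor avoids it. SketchPredicate, copy() and toThrift() are hypothetical names chosen for illustration; they are not Impala's Expr API.

```java
// Hypothetical stand-in for an analyzed expression; not Impala's Expr class.
class SketchPredicate {
    private final String sql;
    private boolean analyzed = false;

    SketchPredicate(String sql) {
        this.sql = sql;
    }

    // Copy constructor: carries over the analyzed flag together with the content.
    protected SketchPredicate(SketchPredicate other) {
        this.sql = other.sql;
        this.analyzed = other.analyzed;
    }

    void analyze() {
        analyzed = true;
    }

    // Copies through the copy constructor. Calling the public constructor
    // here instead would silently reset the analyzed flag.
    SketchPredicate copy() {
        return new SketchPredicate(this);
    }

    // Mirrors the precondition seen in the stack trace above.
    String toThrift() {
        if (!analyzed) {
            throw new IllegalStateException(
                "Must be analyzed before serializing to thrift. " + sql);
        }
        return "thrift(" + sql + ")";
    }
}
```

A copy of an analyzed predicate serializes without error, while a copy of a predicate that was never analyzed still fails the precondition.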

P17:

select * from (
  select int_col, bigint_col, smallint_col,
    rank() over (partition by int_col order by smallint_col desc) rk
  from functional.alltypesagg) dt
where rk <= 10
order by int_col, bigint_col, smallint_col, rk
limit 10;

Error Stack:
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0      <======= Resolved
by copying the embedded ancestor analyticEvalNode_ only once.
  at java.util.ArrayList.rangeCheck(ArrayList.java:659)
  at java.util.ArrayList.set(ArrayList.java:450)
  at org.apache.impala.common.TreeNode.setChild(TreeNode.java:50)
  at org.apache.impala.planner.DistributedPlanner.createAnalyticFragment(DistributedPlanner.java:1149)
  at org.apache.impala.planner.DistributedPlanner.createPlanFragments(DistributedPlanner.java:137)
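
The trace shows setChild() ending in ArrayList.set() on an empty child list. The Java sketch below reproduces that failure mode on a small tree class and shows a copy that keeps its child slots. SketchTreeNode and copyWithChildren are hypothetical names, and the sketch makes no claim about how the patch itself copies analyticEvalNode_.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a plan tree node; not Impala's TreeNode class.
class SketchTreeNode {
    protected final List<SketchTreeNode> children = new ArrayList<>();
    final String name;

    SketchTreeNode(String name) {
        this.name = name;
    }

    void addChild(SketchTreeNode n) {
        children.add(n);
    }

    // Replaces an existing child. ArrayList.set throws
    // IndexOutOfBoundsException when slot i does not exist.
    void setChild(int i, SketchTreeNode n) {
        children.set(i, n);
    }

    SketchTreeNode getChild(int i) {
        return children.get(i);
    }

    // Shallow copy that keeps the child slots, so setChild(0, ...) stays valid.
    SketchTreeNode copyWithChildren() {
        SketchTreeNode copy = new SketchTreeNode(name);
        copy.children.addAll(children);
        return copy;
    }
}
```

Calling setChild(0, ...) on a node that has no children throws, while the same call on a copy made with copyWithChildren() replaces the child in the copy and leaves the original untouched.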


--
To view, visit http://gerrit.cloudera.org:8080/18050
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If8a31a574b364f39b049a4bae33a8b98c5fc20bd
Gerrit-Change-Number: 18050
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Comment-Date: Fri, 10 Dec 2021 20:29:27 +0000
Gerrit-HasComments: No
