Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21860 )

Change subject: IMPALA-13405: Do tuple analysis to lower AggregationNode 
cardinality
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21860/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q98.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q98.test:

http://gerrit.cloudera.org:8080/#/c/21860/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q98.test@48
PS4, Line 48: |  tuple-ids=5 row-size=214B cardinality=108.00K cost=1050713
Why are the old and new values so much higher than with cpu cost disabled?


http://gerrit.cloudera.org:8080/#/c/21860/4/testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test:

http://gerrit.cloudera.org:8080/#/c/21860/4/testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test@384
PS4, Line 384: |  row-size=40B cardinality=10
The actual output of this run is pretty weird:

  Operator                       #Hosts  #Inst   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
  F02:ROOT                            1      1  271.000us  271.000us                         4.01 MB        4.00 MB
  13:MERGING-EXCHANGE                 1      1   22.027us   22.027us      10          10  48.00 KB       16.00 KB  UNPARTITIONED
  F01:EXCHANGE SENDER                 3      3   31.969us   37.302us                         3.94 KB      176.00 KB
  10:TOP-N                            3      3  138.518us  201.601us      30          10  16.00 KB       400.00 B
  12:AGGREGATE                        3      3    5.512ms    5.788ms  11.62K          10  34.04 MB      128.00 MB  FINALIZE
  11:EXCHANGE                         3      3   38.321us   48.434us  11.62K          10 176.00 KB       16.00 KB  HASH(o_orderkey,o_orderdate,o_shippriority)
  F00:EXCHANGE SENDER                 3      3  811.854us    1.367ms                       253.19 KB      528.00 KB
  09:AGGREGATE                        3      3    5.399ms    9.315ms  11.62K          10  44.05 MB      128.00 MB  STREAMING
  01:SUBPLAN                          3      3    1.654ms    3.320ms       0       3.00M  10.00 MB              0
  |--08:NESTED LOOP JOIN              3      3    3.806ms    7.477ms       0         100  32.00 KB        33.00 B  CROSS JOIN
  |  |--02:SINGULAR ROW SRC           3      3    0.000ns    0.000ns       0           1         0              0
  |  04:SUBPLAN                       3      3    3.018ms    5.988ms       0         100   8.00 KB              0
  |  |--07:NESTED LOOP JOIN           3      3    4.807ms    9.485ms       0          10  24.00 KB        36.00 B  CROSS JOIN
  |  |  |--05:SINGULAR ROW SRC        3      3    0.000ns    0.000ns       0           1         0              0
  |  |  06:UNNEST                     3      3  659.021us    1.316ms       0          10         0              0  o.o_lineitems l
  |  03:UNNEST                        3      3  371.777us  737.710us       0          10         0              0  c.c_orders o
  00:SCAN HDFS                        3      3  212.737ms  298.593ms   8.64K      30.00K  93.27 MB      616.00 MB  tpch_nested_parquet.customer c

We arrive at the same result at the top, but there is a subplan that is 
estimated to take in 100 rows and produce 3.00M, whereas in actuality both 
process 0 rows. We then estimate 10 rows for 09:AGGREGATE but actually get 
11.62K.

Both the old and new estimates are wildly wrong, but I'm not sure why we now 
end up underestimating. That may be a bug in this implementation.
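
To put rough numbers on the gap, here is a minimal sketch that compares the 
#Rows and Est. #Rows values from the summary above. The helper names 
(parse_count, misestimate_factor) are hypothetical and not part of Impala, and 
the parser only assumes the K and M suffixes that appear in this output.

```python
# Hypothetical helpers for illustration only; not part of Impala.
# Assumption: counts use only the "K" and "M" suffixes seen in this summary.
SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_count(text):
    """Convert an abbreviated row count such as '11.62K' to an integer."""
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        return round(float(text[:-1]) * SUFFIXES[text[-1]])
    return round(float(text))


def misestimate_factor(actual, estimated):
    """Return actual / estimated rows; values above 1 mean an underestimate."""
    est = parse_count(estimated)
    if est == 0:
        return None  # ratio is undefined when the planner estimates 0 rows
    return parse_count(actual) / est


# (operator, actual #Rows, Est. #Rows) copied from the summary above.
rows = [
    ("09:AGGREGATE", "11.62K", "10"),
    ("01:SUBPLAN", "0", "3.00M"),
    ("00:SCAN HDFS", "8.64K", "30.00K"),
]

for operator, actual, estimated in rows:
    print(operator, misestimate_factor(actual, estimated))
```

A factor far above 1 (as for 09:AGGREGATE) indicates an underestimate, and a 
factor near 0 (as for 01:SUBPLAN) indicates an overestimate.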



--
To view, visit http://gerrit.cloudera.org:8080/21860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd589ab5f7ba9566a0d35784f61f5ffaef5696e7
Gerrit-Change-Number: 21860
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Thu, 03 Oct 2024 17:43:49 +0000
Gerrit-HasComments: Yes
