[ 
https://issues.apache.org/jira/browse/IMPALA-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-976:
-------------------------------
    Docs Text:   (was: John, I think we should especially call out this series 
of planner fixes to make users aware of the potential positive/negative impact 
on their workload.)

> Planner cardinality estimates from joins can be improved.
> ---------------------------------------------------------
>
>                 Key: IMPALA-976
>                 URL: https://issues.apache.org/jira/browse/IMPALA-976
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 1.3.1, Impala 2.0, Impala 2.3.0
>            Reporter: Nong Li
>            Assignee: Alexander Behm
>            Priority: Critical
>              Labels: performance
>             Fix For: Impala 2.5.0
>
>
> I think we assume n:1 joins so the cardinality is not reduced after the join 
> operator
> Here's an example from tpcds q19 but it applies to almost all the tpcds 
> queries.
> {code}
> Operator          Detail          #Hosts      Avg Time      Max Time         
> #Rows    Est. #Rows      Peak Mem   Est. Peak Mem
> ------------------------------------------------------------------------------------------------------------------------------
> 18:TOP-N                               1       3.708ms       3.708ms          
>  100           100     117.44 KB         -1.00 B
> 17:EXCHANGE                            1     236.149us     236.149us          
> 1000           100       9.77 KB         -1.00 B
> 10:TOP-N                              10     555.602us      591.50us          
> 1000           100      36.00 KB         4.69 KB
> 16:AGGREGATE                          10       4.950ms        5.85ms          
> 3227     264233526     184.84 KB         6.75 GB
> 15:EXCHANGE                           10      14s941ms      14s952ms         
> 32270     264233526      30.09 KB               0
> 09:AGGREGATE                          10     201.615ms     280.335ms         
> 32270     264233526      13.68 MB        12.99 GB
> 08:HASH JOIN      BROADCAST           10     112.415ms     168.229ms       
> 4646655     264233526      12.32 MB        42.00 KB                       
> |--14:EXCHANGE                        10     286.415ms     296.645ms          
> 1350          1350      44.96 KB               0
> |  04:SCAN HDFS   store                1      39.760ms      39.760ms          
> 1350          1350     243.32 KB        32.00 MB
> 07:HASH JOIN      BROADCAST           10       3s519ms       3s900ms       
> 4708900     264233526     993.89 MB       453.97 MB
> |--13:EXCHANGE                        10       2s731ms       2s912ms      
> 15000000      15000000      21.86 MB               0
> |  03:SCAN HDFS   customer_address     8     328.503ms     862.653ms      
> 15000000      15000000      26.30 MB        80.00 MB
> 06:HASH JOIN      BROADCAST           10      16s891ms      17s963ms       
> 4708900     264233526       1.44 GB       377.66 MB
> |--12:EXCHANGE                        10       3s785ms       4s422ms      
> 30000000      30000000      17.84 MB               0
> |  02:SCAN HDFS   customer             9     292.638ms       1s030ms      
> 30000000      30000000      40.81 MB       176.00 MB
> 05:HASH JOIN      BROADCAST           10       2s903ms       3s397ms       
> 4823041     264233526       1.10 MB       292.41 KB                       <-- 
> We assumed all probe rows are returned and now the estimate is off by a 
> factor of 54.
> |--11:EXCHANGE                        10       1s629ms       1s642ms          
> 6478          3429      99.51 KB               0      
> |  01:SCAN HDFS   item                 1     284.503ms     284.503ms          
> 6478          3429       7.76 MB       288.00 MB                       <-- 
> The estimate of the first dim table is close
> 00:SCAN HDFS      store_sales         10     134.410ms      227.81ms     
> 264233526     264233526     446.98 MB       352.00 MB                       
> <-- We have perfect stats on the left most table.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to