[ https://issues.apache.org/jira/browse/IMPALA-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Rodoni updated IMPALA-976:
-------------------------------
    Docs Text:   (was: John, I think we should especially call out this series of planner fixes to make users aware of the potential positive/negative impact on their workload.)

> Planner cardinality estimates from joins can be improved.
> ---------------------------------------------------------
>
>                 Key: IMPALA-976
>                 URL: https://issues.apache.org/jira/browse/IMPALA-976
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 1.3.1, Impala 2.0, Impala 2.3.0
>            Reporter: Nong Li
>            Assignee: Alexander Behm
>            Priority: Critical
>              Labels: performance
>             Fix For: Impala 2.5.0
>
>
> I think we assume n:1 joins so the cardinality is not reduced after the join operator
> Here's an example from tpcds q19 but it applies to almost all the tpcds queries.
> {code}
> Operator          Detail            #Hosts  Avg Time   Max Time   #Rows      Est. #Rows  Peak Mem   Est. Peak Mem
> ------------------------------------------------------------------------------------------------------------------------------
> 18:TOP-N                            1       3.708ms    3.708ms    100        100         117.44 KB  -1.00 B
> 17:EXCHANGE                         1       236.149us  236.149us  1000       100         9.77 KB    -1.00 B
> 10:TOP-N                            10      555.602us  591.50us   1000       100         36.00 KB   4.69 KB
> 16:AGGREGATE                        10      4.950ms    5.85ms     3227       264233526   184.84 KB  6.75 GB
> 15:EXCHANGE                         10      14s941ms   14s952ms   32270      264233526   30.09 KB   0
> 09:AGGREGATE                        10      201.615ms  280.335ms  32270      264233526   13.68 MB   12.99 GB
> 08:HASH JOIN      BROADCAST         10      112.415ms  168.229ms  4646655    264233526   12.32 MB   42.00 KB
> |--14:EXCHANGE                      10      286.415ms  296.645ms  1350       1350        44.96 KB   0
> |  04:SCAN HDFS   store             1       39.760ms   39.760ms   1350       1350        243.32 KB  32.00 MB
> 07:HASH JOIN      BROADCAST         10      3s519ms    3s900ms    4708900    264233526   993.89 MB  453.97 MB
> |--13:EXCHANGE                      10      2s731ms    2s912ms    15000000   15000000    21.86 MB   0
> |  03:SCAN HDFS   customer_address  8       328.503ms  862.653ms  15000000   15000000    26.30 MB   80.00 MB
> 06:HASH JOIN      BROADCAST         10      16s891ms   17s963ms   4708900    264233526   1.44 GB    377.66 MB
> |--12:EXCHANGE                      10      3s785ms    4s422ms    30000000   30000000    17.84 MB   0
> |  02:SCAN HDFS   customer          9       292.638ms  1s030ms    30000000   30000000    40.81 MB   176.00 MB
> 05:HASH JOIN      BROADCAST         10      2s903ms    3s397ms    4823041    264233526   1.10 MB    292.41 KB  <-- We assumed all probe rows are returned and now the estimate is off by a factor of 54.
> |--11:EXCHANGE                      10      1s629ms    1s642ms    6478       3429        99.51 KB   0
> |  01:SCAN HDFS   item              1       284.503ms  284.503ms  6478       3429        7.76 MB    288.00 MB  <-- The estimate of the first dim table is close
> 00:SCAN HDFS      store_sales       10      134.410ms  227.81ms   264233526  264233526   446.98 MB  352.00 MB  <-- We have perfect stats on the left most table.
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
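
The "factor of 54" quoted in the summary can be checked directly from the #Rows and Est. #Rows columns. The snippet below is only an illustrative sketch: the row counts are copied from the exec summary above, while the names `JOIN_NODES` and `overestimate_factor` are hypothetical helpers introduced here and are not part of Impala.

```python
# (operator, actual #Rows, Est. #Rows) copied from the exec summary above.
JOIN_NODES = [
    ("05:HASH JOIN", 4823041, 264233526),
    ("06:HASH JOIN", 4708900, 264233526),
    ("07:HASH JOIN", 4708900, 264233526),
    ("08:HASH JOIN", 4646655, 264233526),
]


def overestimate_factor(actual_rows, estimated_rows):
    """Return how many times larger the planner estimate is than the actual row count."""
    return estimated_rows / actual_rows


# Hypothetical helper structure: operator label -> overestimate factor.
factors = {op: overestimate_factor(actual, est) for op, actual, est in JOIN_NODES}

for op, factor in factors.items():
    print(f"{op}: estimate is {factor:.1f}x the actual row count")
```

For node 05 this comes out a little under 55, in line with the annotation in the summary. Because every join keeps the probe-side estimate of 264233526, the same overestimate carries through nodes 06 to 08 and into the aggregations above them.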