Vineet Garg created HIVE-21690:
----------------------------------

             Summary: Support outer joins with HiveAggregateJoinTransposeRule 
and turn it on by default
                 Key: HIVE-21690
                 URL: https://issues.apache.org/jira/browse/HIVE-21690
             Project: Hive
          Issue Type: Improvement
          Components: Query Planning
            Reporter: Vineet Garg
            Assignee: Vineet Garg


1) This optimization is off by default. We would like to turn on this
optimization, wherein the group by is pushed down below the join. In some cases
the top aggregate is removed, but in most cases this optimization adds extra
aggregate nodes. To measure whether those extra aggregates are beneficial (they
might add overhead without reducing rows), the cost is computed and compared
between the previous plan and the new plan.
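To make the rewrite concrete, here is a minimal Python sketch of pushing a SUM
below an inner join. The tables, columns and data are hypothetical, and this is
not the HiveAggregateJoinTransposeRule code. It only illustrates the general
idea: one extra aggregate per join input, plus a top aggregate that is still
needed and multiplies each partial sum by the number of rows it would have
matched on the other side.

```python
from collections import defaultdict

# Hypothetical toy tables: orders(cust_id, amount), customers(cust_id, region).
orders = [(1, 10), (1, 5), (2, 7), (3, 4), (3, 6), (3, 1)]
customers = [(1, "east"), (2, "west"), (3, "east"), (3, "east")]  # duplicate key on purpose


def original_plan():
    """SELECT region, SUM(amount) FROM orders JOIN customers USING (cust_id) GROUP BY region.

    Returns (result, number of rows fed into the join)."""
    joined = [(region, amount)
              for o_cust, amount in orders
              for c_cust, region in customers if o_cust == c_cust]
    result = defaultdict(int)
    for region, amount in joined:
        result[region] += amount
    return dict(result), len(orders) + len(customers)


def pushed_down_plan():
    """Same query with partial aggregates pushed beneath the join."""
    # Extra aggregate on the left input: partial SUM per join key.
    left_agg = defaultdict(int)
    for cust, amount in orders:
        left_agg[cust] += amount
    # Extra aggregate on the right input: row count per (join key, grouping column).
    right_agg = defaultdict(int)
    for cust, region in customers:
        right_agg[(cust, region)] += 1
    # Join the aggregated (smaller) inputs.
    joined = [(region, partial_sum, cnt)
              for cust, partial_sum in left_agg.items()
              for (r_cust, region), cnt in right_agg.items() if cust == r_cust]
    # The top aggregate remains: weight each partial sum by the matching
    # right-side row count, then sum per region.
    result = defaultdict(int)
    for region, partial_sum, cnt in joined:
        result[region] += partial_sum * cnt
    return dict(result), len(left_agg) + len(right_agg)
```

Both plans return {"east": 37, "west": 7}, but the join reads 6 input rows
instead of 10, at the price of two extra aggregate operators.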

Since Hive's cost model only considers the JOIN's cost and discards the cost of
the rest of the nodes, this comparison always favors the new plan (an aggregate
beneath the join can only reduce the number of rows processed by the join, and
therefore the join cost, while the aggregate's own cost is ignored). Therefore,
turning on this optimization with the existing cost model is not a good idea.
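A toy illustration of why a join-only model cannot reject the rewrite. In this
Python sketch the linear cost formula, the function names and the row counts are
all hypothetical, not Hive's actual cost model. Only the join is costed, so the
pushed-down aggregate is never charged for the rows it reads:

```python
def join_cost(left_rows: int, right_rows: int) -> int:
    # Hypothetical simplified join cost: proportional to the rows read from both inputs.
    return left_rows + right_rows


def agg_output_rows(input_rows: int, ndv: int) -> int:
    # A GROUP BY never emits more rows than it reads (ndv = number of distinct keys).
    return min(input_rows, ndv)


def join_only_costs(left_rows: int, right_rows: int, left_ndv: int) -> tuple:
    """(old, new) plan cost when only the JOIN is costed and the new aggregate is free."""
    old = join_cost(left_rows, right_rows)
    new = join_cost(agg_output_rows(left_rows, left_ndv), right_rows)
    return old, new


# Whatever the grouping-key cardinality, the new plan is never costed higher than the old one.
for ndv in (10, 1_000, 1_000_000):
    old, new = join_only_costs(1_000_000, 1_000, ndv)
    assert new <= old
```

With left_ndv equal to the row count the aggregate removes nothing, yet both
plans cost 1001000, so the extra aggregate comes for free in this model.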

One approach to fix this is to localize the cost computation to the rule
itself, i.e., compute the non-cumulative cost of the existing aggregate and join
and compare it with the cost of the new aggregates, join, and top aggregate.
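A minimal sketch of that rule-local comparison, assuming hypothetical per-row
weights (AGG_COST_PER_ROW, JOIN_COST_PER_ROW) and a hypothetical helper
should_transpose(); Hive's real cost formulas differ. Only the operators the rule
touches are costed, and the rewrite is accepted only if it is strictly cheaper:

```python
# Hypothetical per-row cost weights; not Hive's actual cost model.
AGG_COST_PER_ROW = 1.0
JOIN_COST_PER_ROW = 2.0


def agg_cost(input_rows: float) -> float:
    return AGG_COST_PER_ROW * input_rows


def join_cost(left_rows: float, right_rows: float) -> float:
    return JOIN_COST_PER_ROW * (left_rows + right_rows)


def should_transpose(left_rows, right_rows, join_out_rows,
                     left_groups, right_groups, pushed_join_out_rows) -> bool:
    """Compare non-cumulative costs of only the operators the rule touches."""
    # Existing subtree: JOIN over the raw inputs, then the aggregate over the join output.
    old_cost = join_cost(left_rows, right_rows) + agg_cost(join_out_rows)
    # Rewritten subtree: a new aggregate on each input, the JOIN over their
    # outputs, and the top aggregate over the (hopefully smaller) join output.
    new_cost = (agg_cost(left_rows) + agg_cost(right_rows)
                + join_cost(left_groups, right_groups)
                + agg_cost(pushed_join_out_rows))
    return new_cost < old_cost
```

For example, with 1000000 left rows collapsing to 1000 groups the rewrite is
accepted, while with 1000000 distinct left keys (no reduction) it is rejected
because the extra aggregates are now paid for.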

A better approach, in my opinion, would be to fix the cost model so that it
takes aggregate cost into account (along with the join). This could affect other
queries and cause performance regressions, but those will most likely be
planning issues that should be investigated and fixed.
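To contrast the two cost models, the sketch below sums a cumulative cost over a
small plan tree, either counting only joins (the current behaviour described
above) or counting aggregates as well. The Node class, the COST_PER_ROW weights
and the row counts are hypothetical and only meant to show the shape of the fix:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    op: str            # "scan", "join" or "aggregate"
    input_rows: float  # rows this operator reads (summed over its inputs)
    children: List["Node"] = field(default_factory=list)


# Hypothetical per-row weights; a real model would also consider CPU, IO and memory.
COST_PER_ROW = {"scan": 0.0, "join": 2.0, "aggregate": 1.0}


def cumulative_cost(node: Node, costed_ops=("join", "aggregate")) -> float:
    """Sum the cost of every operator in the tree whose kind is in costed_ops."""
    own = COST_PER_ROW[node.op] * node.input_rows if node.op in costed_ops else 0.0
    return own + sum(cumulative_cost(c, costed_ops) for c in node.children)


# Aggregate over JOIN(left, right); the left grouping key is unique, so pushing
# an aggregate below the join removes no rows.
old_plan = Node("aggregate", 1_000_000, [
    Node("join", 1_001_000, [Node("scan", 1_000_000), Node("scan", 1_000)])])
new_plan = Node("aggregate", 1_000_000, [
    Node("join", 1_001_000, [
        Node("aggregate", 1_000_000, [Node("scan", 1_000_000)]),
        Node("scan", 1_000)])])
```

Costing only joins, both plans come out at 2002000; once aggregates are costed,
the useless pushed-down aggregate makes the new plan more expensive (4002000
versus 3002000), so it would be rejected.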


2) This optimization currently only supports INNER JOIN. It can be extended to
support OUTER joins.
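One subtlety when extending the rewrite to outer joins is that a NULL-extended
row still counts once. The Python sketch below pushes a COUNT(*) below the
null-supplying side of a LEFT OUTER JOIN and compensates with the equivalent of
COALESCE(cnt, 1) in the top aggregate. The tables and data are hypothetical, and
this is one possible way to handle the case, not how Hive implements it:

```python
from collections import defaultdict

# Hypothetical tables: l(k, amount) is the preserved side, r(k) the null-supplying side.
l_rows = [(1, 10), (2, 20), (3, 30)]
r_rows = [(1,), (1,), (2,)]  # key 3 has no match in r


def original_left_join_agg():
    """SELECT l.k, SUM(l.amount), COUNT(*) FROM l LEFT JOIN r ON l.k = r.k GROUP BY l.k"""
    result = defaultdict(lambda: [0, 0])
    for k, amount in l_rows:
        matches = sum(1 for (rk,) in r_rows if rk == k)
        # An unmatched left row still appears once, padded with NULLs.
        for _ in range(max(matches, 1)):
            result[k][0] += amount
            result[k][1] += 1
    return {k: tuple(v) for k, v in result.items()}


def pushed_left_join_agg():
    """Same query with COUNT(*) pushed below the null-supplying side."""
    r_counts = defaultdict(int)
    for (rk,) in r_rows:
        r_counts[rk] += 1
    result = defaultdict(lambda: [0, 0])
    for k, amount in l_rows:
        cnt = r_counts.get(k)  # None plays the role of SQL NULL for unmatched rows
        # COALESCE(cnt, 1): a NULL-extended row contributes exactly one row.
        weight = cnt if cnt is not None else 1
        result[k][0] += amount * weight
        result[k][1] += weight
    return {k: tuple(v) for k, v in result.items()}
```

Both functions return {1: (20, 2), 2: (20, 1), 3: (30, 1)}; dropping the
COALESCE step would lose the unmatched key 3 from the counts and sums.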



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
