[ https://issues.apache.org/jira/browse/HIVE-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810501#comment-13810501 ]
Laljo John Pullokkaran commented on HIVE-5709:
----------------------------------------------

Ideally, this should be a cost-based decision. Pulling one join key out and applying it as a filter has the following consequences:

Pro:
1. It saves the cost of one shuffle.

Con:
1. The degree of parallelism may be reduced, because the mapper's result set is partitioned on the join key and hf(a,b) != hf(a).
2. The intermediate result set may be large when some join keys are pulled above the join and applied as filters.

Because of these factors, this should be a cost-based decision.

> Extend Join merging logic to merge 2 Joins when one Join expression list is a subset of the other.
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-5709
>                 URL: https://issues.apache.org/jira/browse/HIVE-5709
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Harish Butani
>
> As pointed out by [~ashutoshc] here: https://reviews.apache.org/r/14953/
> For the following query
> {noformat}
> select p1.name, p2.name, p3.name
> from part p1 join p2 on p1.name = p2.name and p1.key = p2.key join
> part p3 on p1.name = p3.name
> {noformat}
> 2 jobs are generated:
> - p1 join p2 on name, key
> - join p3 on name
> This can be done as:
> - 1 3-way join of p1, p2, p3 on name
> - followed by a Filter on p1.key = p2.key
> This is valid only for inner joins.
> This can be done by extending the Merge Join logic to check for a subset relation between 2 QBJoinTree expression lists.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
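
For illustration, the subset check proposed in the description could look roughly like the sketch below. This is a minimal Python sketch under stated assumptions, not Hive's QBJoinTree code: the `Cond` tuple, `join_columns`, and `try_merge` are hypothetical names, each join condition is assumed to be a simple equality on a same-named column, and both joins are assumed to key on columns of the same left-hand table (p1 in the example). It makes no cost-based choice; it only reports whether a merge is possible and which conditions would become a filter.

```python
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical representation: ("p1", "p2", "name") stands for p1.name = p2.name.
Cond = Tuple[str, str, str]


def join_columns(conds: List[Cond]) -> FrozenSet[str]:
    """Collect the column names that a join's conditions compare."""
    return frozenset(col for _, _, col in conds)


def try_merge(
    first: List[Cond], second: List[Cond], all_inner: bool = True
) -> Optional[Tuple[FrozenSet[str], List[Cond]]]:
    """Return (shared key columns, residual filter conditions) when one
    join's key columns are a subset of the other's, else None."""
    if not all_inner:
        # The issue states the rewrite is valid only for inner joins.
        return None
    a, b = join_columns(first), join_columns(second)
    if a <= b:
        shared = a
    elif b <= a:
        shared = b
    else:
        return None
    if not shared:
        # No common key: nothing to merge on.
        return None
    # Conditions on columns outside the shared key move to a filter.
    residual = [c for c in first + second if c[2] not in shared]
    return shared, residual


# The query from the issue: p1 join p2 on (name, key), then join p3 on name.
p1_p2 = [("p1", "p2", "name"), ("p1", "p2", "key")]
p1_p3 = [("p1", "p3", "name")]
merged = try_merge(p1_p2, p1_p3)
```

Here `merged` holds the shared key `name` for the 3-way join and the condition p1.key = p2.key as the residual filter. Whether applying the merge is actually beneficial is the cost question raised in the comment above, which this sketch does not attempt to answer.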