[ 
https://issues.apache.org/jira/browse/HIVE-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810501#comment-13810501
 ] 

Laljo John Pullokkaran commented on HIVE-5709:
----------------------------------------------

Ideally this should be a cost-based decision. Pulling one join key out and 
applying it as a filter has the following consequences:
Pro:
1. It saves the cost of one shuffle.

Con:
1. The degree of parallelism may be reduced, because the mapper's result set 
is partitioned on the join key, and a hash function (hf) over both keys 
generally differs from one over a single key:
    hf(a,b) != hf(a)

2. The intermediate result set may be large when some join keys are pulled 
above the join and applied as a filter.

Given these factors, this should be a cost-based decision.
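
To illustrate the parallelism concern, here is a minimal sketch in plain Python (not Hive code). The column names "a" and "b", the row data, and the helper key_groups are all made up for illustration. It counts how many distinct partitioning keys exist, and how large the biggest key group is, when rows are partitioned on both join keys versus on the shared key only.

```python
from collections import Counter

def key_groups(rows, key_cols):
    """Count rows per distinct value of the partitioning key."""
    return Counter(tuple(row[c] for c in key_cols) for row in rows)

# Hypothetical mapper output: "a" has 2 distinct values, "b" has 5.
rows = [{"a": i % 2, "b": i % 5} for i in range(1000)]

groups_ab = key_groups(rows, ("a", "b"))  # partition on both join keys
groups_a = key_groups(rows, ("a",))       # partition on the shared key only

# Distinct keys bound the number of reducers that can receive any rows;
# the largest group is the most rows a single reducer must handle.
print("keys (a,b):", len(groups_ab), "largest group:", max(groups_ab.values()))
print("keys (a):  ", len(groups_a), "largest group:", max(groups_a.values()))
```

With fewer distinct keys, fewer reducers can do useful work and each one sees a larger group, which is the trade-off a cost model would have to weigh against the saved shuffle.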

> Extend Join merging logic to merge 2 Joins when one Join expression list is a 
> subset of the other.
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-5709
>                 URL: https://issues.apache.org/jira/browse/HIVE-5709
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Harish Butani
>
> As pointed out by [~ashutoshc] here: https://reviews.apache.org/r/14953/
> For the following query
> {noformat}
> select p1.name, p2.name, p3.name
> from part p1 join part p2 on p1.name = p2.name and p1.key = p2.key join 
> part p3 on p1.name = p3.name
> {noformat}
> 2 jobs are generated:
> - p1 join p2 on name, key
> - join p3 on name
> This can be done as:
> - 1 3-way join of p1,p2,p3 on name
> - followed by a Filter on p1.key = p2.key
> This is valid only for inner joins. 
> This can be done by extending the Merge Join logic to check for a subset 
> relation between 2 QBJoinTree expression lists. 
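
The subset check and the rewrite described above can be sketched as follows. This is plain Python under stated assumptions, not Hive's QBJoinTree API: the functions can_merge, split_keys, two_job_plan, and merged_plan are hypothetical, and the tiny "part" table of (name, key) tuples is invented so the two plans can be compared for inner joins.

```python
def can_merge(keys_1, keys_2):
    """True if one join's key list is a subset of the other's."""
    s1, s2 = set(keys_1), set(keys_2)
    return s1 <= s2 or s2 <= s1

def split_keys(keys_1, keys_2):
    """Return (keys for the merged join, leftover keys to apply as a filter)."""
    s1, s2 = set(keys_1), set(keys_2)
    shared = s1 & s2
    return sorted(shared), sorted((s1 | s2) - shared)

def two_job_plan(p1, p2, p3):
    """Original plan: p1 join p2 on (name, key), then join p3 on name."""
    step1 = [(x, y) for x in p1 for y in p2 if x == y]  # rows are (name, key)
    return [(x, y, z) for (x, y) in step1 for z in p3 if x[0] == z[0]]

def merged_plan(p1, p2, p3):
    """Rewritten plan: 3-way join on name, then filter on p1.key = p2.key."""
    joined = [(x, y, z) for x in p1 for y in p2 for z in p3
              if x[0] == y[0] == z[0]]
    filtered = [t for t in joined if t[0][1] == t[1][1]]
    return joined, filtered

# Hypothetical contents of table "part" as (name, key) tuples.
part = [("bolt", 1), ("bolt", 2), ("nut", 3)]

print(can_merge(["name", "key"], ["name"]))
print(split_keys(["name", "key"], ["name"]))

joined, filtered = merged_plan(part, part, part)
# Both plans return the same rows for inner joins.
print(sorted(filtered) == sorted(two_job_plan(part, part, part)))
# The 3-way join output before the filter can be larger than the final result.
print(len(joined), len(filtered))
```

The last line shows the size of the join output before and after the residual filter, which is the intermediate-result concern raised in the comment.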



--
This message was sent by Atlassian JIRA
(v6.1#6144)
