[ 
https://issues.apache.org/jira/browse/HIVE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-9523:
---------------------------------
    Labels: gsoc2015  (was: )

> when columns on which tables are partitioned are used in the join condition 
> same join optimizations as for bucketed tables should be applied
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-9523
>                 URL: https://issues.apache.org/jira/browse/HIVE-9523
>             Project: Hive
>          Issue Type: Improvement
>          Components: Logical Optimizer, Physical Optimizer, SQL
>    Affects Versions: 0.13.0, 0.14.0, 0.13.1
>            Reporter: Maciek Kocon
>              Labels: gsoc2015
>
> For JOIN conditions that use the corresponding partitioning columns of both tables:
>             ⋮ 
> FROM TabA JOIN TabB
>    ON TabA.partCol1 = TabB.partCol1
>    AND TabA.partCol2 = TabB.partCol2
> the optimizer could/should choose to treat it the same way as with bucketed 
> tables: ⋮ 
> FROM TabC
>   JOIN TabD
>      ON TabC.clusteredByCol1 = TabD.clusteredByCol1
>    AND TabC.clusteredByCol2 = TabD.clusteredByCol2
> and use either Bucket Map Join or, better, Sort Merge Bucket Map Join.
> This is based on the fact that, just as buckets translate to separate files, 
> partitions provide essentially the same mapping.
> When data locality is known, the optimizer could join only the 
> corresponding partitions rather than the whole data sets.
> #side notes:
> ⦿ Currently, table DDL syntax that defines partitioning and bucketing at the 
> same time is allowed:
> CREATE TABLE
>  ⋮
> PARTITIONED BY(…) CLUSTERED BY(…) INTO … BUCKETS;
> But in this case the optimizer never chooses Bucket Map Join or Sort Merge 
> Bucket Map Join, which defeats the purpose of creating bucketed tables in such 
> scenarios. Should that be raised as a separate bug?
> ⦿ Currently, partitioning and bucketing are two separate things but serve the same 
> purpose. Shouldn't the concepts be merged (explicit/implicit partitions)?
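
The partitioned-and-bucketed case from the first side note could be reproduced with something like the sketch below. It is only an illustration under stated assumptions: the column types, the `dt` partition column, the `payload` column, and the bucket count of 16 are hypothetical and do not come from the report. The `SET` properties are the usual Hive switches for Bucket Map Join and Sort Merge Bucket Map Join. Whether the optimizer actually picks either join for such tables is exactly what the report questions, so the `EXPLAIN` output is the thing to inspect.

```sql
-- Hypothetical schemas: column types, the dt partition column, the payload
-- column, and the bucket count (16) are assumptions made for illustration.
CREATE TABLE TabC (clusteredByCol1 INT, clusteredByCol2 INT, payload STRING)
PARTITIONED BY (dt STRING)
CLUSTERED BY (clusteredByCol1, clusteredByCol2)
SORTED BY (clusteredByCol1, clusteredByCol2)
INTO 16 BUCKETS;

CREATE TABLE TabD (clusteredByCol1 INT, clusteredByCol2 INT, payload STRING)
PARTITIONED BY (dt STRING)
CLUSTERED BY (clusteredByCol1, clusteredByCol2)
SORTED BY (clusteredByCol1, clusteredByCol2)
INTO 16 BUCKETS;

-- Ask the optimizer to consider Bucket Map Join and Sort Merge Bucket Map Join.
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
SET hive.auto.convert.sortmerge.join = true;
SET hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- Inspect the plan to see which join operator was chosen.
EXPLAIN
SELECT c.payload, d.payload
FROM TabC c
  JOIN TabD d
    ON c.clusteredByCol1 = d.clusteredByCol1
   AND c.clusteredByCol2 = d.clusteredByCol2;
```

If the plan shows a common (shuffle) join rather than a bucket-aware join operator, that would match the behaviour described in the side note.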



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
