[ https://issues.apache.org/jira/browse/HIVE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vikram Dixit K updated HIVE-9523: --------------------------------- Labels: gsoc2015 (was: ) > when columns on which tables are partitioned are used in the join condition > same join optimizations as for bucketed tables should be applied > -------------------------------------------------------------------------------------------------------------------------------------------- > > Key: HIVE-9523 > URL: https://issues.apache.org/jira/browse/HIVE-9523 > Project: Hive > Issue Type: Improvement > Components: Logical Optimizer, Physical Optimizer, SQL > Affects Versions: 0.13.0, 0.14.0, 0.13.1 > Reporter: Maciek Kocon > Labels: gsoc2015 > > For JOIN conditions where partitioning criteria are used respectively: > ⋮ > FROM TabA JOIN TabB > ON TabA.partCol1 = TabB.partCol2 > AND TabA.partCol2 = TabB.partCol2 > the optimizer could/should choose to treat it the same way as with bucketed > tables: ⋮ > FROM TabC > JOIN TabD > ON TabC.clusteredByCol1 = TabD.clusteredByCol2 > AND TabC.clusteredByCol2 = TabD.clusteredByCol2 > and use either Bucket Map Join or better, the Sort Merge Bucket Map Join. > This is based on fact that same way as buckets translate to separate files, > the partitions essentially provide the same mapping. > When data locality is known the optimizer could focus only on joining > corresponding partitions rather than whole data sets. > #side notes: > ⦿ Currently Table DDL Syntax where Partitioning and Bucketing defined at the > same time is allowed: > CREATE TABLE > ⋮ > PARTITIONED BY(…) CLUSTERED BY(…) INTO … BUCKETS; > But in this case optimizer never chooses to use Bucket Map Join or Sort Merge > Bucket Map Join which defeats the purpose of creating BUCKETed tables in such > scenarios. Should that be raised as a separate BUG? > ⦿ Currently partitioning and bucketing are two separate things but serve same > purpose - shouldn't the concept be merged (explicit/implicit partitions?) -- This message was sent by Atlassian JIRA (v6.3.4#6332)