[jira] [Resolved] (HIVE-3652) Join optimization for star schema
[ https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu resolved HIVE-3652.
------------------------------------------

    Resolution: Duplicate

Join optimization for star schema
---------------------------------

                Key: HIVE-3652
                URL: https://issues.apache.org/jira/browse/HIVE-3652
            Project: Hive
         Issue Type: Improvement
         Components: Query Processor
           Reporter: Amareshwari Sriramadasu
           Assignee: Vikram Dixit K
            Fix For: 0.11.0

        Attachments: HIVE-3652-tests.patch, HIVE-3652-tests.patch

Currently, if we join one fact table with multiple dimension tables, the query results in a separate mapreduce job for each join with a dimension table, because the join is on a different key for each dimension. Usually all the dimension tables are small and can fit into memory, so a map-side join can be used to join them with the fact table.

In this issue I want to look at optimizing such a query to generate a single mapreduce job, so that the mapper loads the dimension tables into memory and joins them with the fact table on the different keys as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-3652) Join optimization for star schema
[ https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikram Dixit K resolved HIVE-3652.
---------------------------------

       Resolution: Duplicate
    Fix Version/s: 0.11.0

The work required for this jira was done as part of the map-join de-emphasis work in HIVE-3784. The query

    select /*+ MAPJOIN(b,c) */ from FACT a join DIM1 b on a.k1=b.k1 JOIN DIM2 c on b.k2=c.k2

runs in 1 MR job (based on the noConditionalTask.size).
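
For readers who want to try the single-job behaviour described in the comment, the following is a rough sketch of a Hive session, not part of the original messages. It assumes that "noConditionalTask.size" refers to the property hive.auto.convert.join.noconditionaltask.size and that the two related hive.auto.convert.join settings are enabled. The threshold value and the select list are illustrative assumptions only, and FACT, DIM1 and DIM2 are the table names from the quoted query.

```sql
-- Sketch only: the values here are illustrative assumptions, not recommendations.

-- Let Hive convert eligible joins to map joins without a MAPJOIN hint.
SET hive.auto.convert.join=true;

-- Convert directly to a map join (no conditional task) when the small
-- tables are under the size threshold below.
SET hive.auto.convert.join.noconditionaltask=true;

-- Threshold, in bytes, on the combined size of the small (dimension) tables.
SET hive.auto.convert.join.noconditionaltask.size=10000000;

-- Same join shape as the query quoted in the comment; the select list is
-- hypothetical. EXPLAIN prints the plan so the number of stages can be
-- checked without running the job.
EXPLAIN
SELECT a.k1, b.k2
FROM FACT a
JOIN DIM1 b ON a.k1 = b.k1
JOIN DIM2 c ON b.k2 = c.k2;
```

If the dimension tables together exceed the threshold, the plan would be expected to fall back to more than one job, so the threshold has to be sized against the actual dimension tables.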