[jira] [Resolved] (HIVE-3652) Join optimization for star schema

2013-02-13 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu resolved HIVE-3652.
---

Resolution: Duplicate

 Join optimization for star schema
 -

 Key: HIVE-3652
 URL: https://issues.apache.org/jira/browse/HIVE-3652
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Vikram Dixit K
 Fix For: 0.11.0

 Attachments: HIVE-3652-tests.patch, HIVE-3652-tests.patch


 Currently, if we join one fact table with multiple dimension tables, the 
 query runs a separate mapreduce job for each join with a dimension table, 
 because each dimension is joined on a different key. 
 Usually all the dimension tables are small enough to fit into memory, so 
 a map-side join can be used to join them with the fact table.
 In this issue I want to look at optimizing such queries to generate a single 
 mapreduce job, so that the mapper loads the dimension tables into memory and 
 joins them with the fact table on the different keys.
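
 The idea described above can be sketched outside Hive as a plain in-memory
 hash join. This is only an illustration under stated assumptions, not Hive's
 implementation: the function names, the row-as-dict representation, and the
 inner-join semantics are all hypothetical choices made for the example. Each
 small dimension table is indexed once by its join key, and the fact rows are
 then streamed through all of the lookups in a single pass.

```python
from collections import defaultdict


def build_hash_table(dim_rows, key):
    """Index a small dimension table in memory by its join key."""
    table = defaultdict(list)
    for row in dim_rows:
        table[row[key]].append(row)
    return table


def map_side_star_join(fact_rows, dims):
    """Inner-join every fact row against all dimensions in one pass.

    dims is a list of (fact_key, dim_key, dim_rows) tuples (a hypothetical
    layout for this sketch). Each dimension may use a different key.
    """
    # Load every dimension into memory once, before scanning the fact table.
    tables = [
        (fact_key, build_hash_table(rows, dim_key))
        for fact_key, dim_key, rows in dims
    ]
    for fact in fact_rows:
        partial = [dict(fact)]
        for fact_key, table in tables:
            matches = table.get(fact[fact_key], [])
            # Combine each partial result with every matching dimension row.
            partial = [{**p, **m} for p in partial for m in matches]
            if not partial:
                break  # no match in this dimension: drop the fact row
        yield from partial
```

 In this sketch the fact table is never shuffled or re-read: one scan probes
 every dimension, which is the single-job behaviour the issue asks for.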

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-3652) Join optimization for star schema

2013-02-12 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K resolved HIVE-3652.
--

   Resolution: Duplicate
Fix Version/s: 0.11.0

The work required for this JIRA was done as part of the map-join de-emphasis 
work in HIVE-3784. The query 

{format}select /*+ MAPJOIN(b,c) */ from FACT a join DIM1 b on a.k1=b.k1 JOIN 
DIM2 c on b.k2=c.k2{format}

runs in 1 MR job, provided the dimension tables fit within the 
noConditionalTask.size threshold.
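
As a rough illustration of the size check mentioned above, the sketch below
models the rule as it is described in Hive's configuration documentation for
hive.auto.convert.join.noconditionaltask.size: an n-way join is converted
directly to a map join when the combined size of n-1 of its tables is below
the threshold. The function name and the table sizes are hypothetical, and
the sketch assumes the largest table is the one that gets streamed; it is a
simplification, not Hive's actual planner logic.

```python
def can_use_single_map_join(table_sizes, threshold_bytes):
    """Return True if all tables except the largest fit under the threshold.

    table_sizes maps a table alias to its size in bytes (made-up inputs).
    """
    if len(table_sizes) < 2:
        return False  # nothing to join
    # Assumption: the largest table is streamed, the rest are held in memory.
    small_tables = sorted(table_sizes.values())[:-1]
    return sum(small_tables) < threshold_bytes


# Hypothetical sizes for the tables in the query above.
sizes = {"FACT": 50_000_000_000, "DIM1": 4_000_000, "DIM2": 5_000_000}
print(can_use_single_map_join(sizes, 10_000_000))
```

With these made-up sizes the two dimension tables total 9 MB, so they fit
under a 10 MB threshold; lowering the threshold below their combined size
would make the check fail.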

 Join optimization for star schema
 -

 Key: HIVE-3652
 URL: https://issues.apache.org/jira/browse/HIVE-3652
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Vikram Dixit K
 Fix For: 0.11.0


 Currently, if we join one fact table with multiple dimension tables, the 
 query runs a separate mapreduce job for each join with a dimension table, 
 because each dimension is joined on a different key. 
 Usually all the dimension tables are small enough to fit into memory, so 
 a map-side join can be used to join them with the fact table.
 In this issue I want to look at optimizing such queries to generate a single 
 mapreduce job, so that the mapper loads the dimension tables into memory and 
 joins them with the fact table on the different keys.
