[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103427#comment-13103427
 ] 

Hudson commented on HIVE-1694:
--

Integrated in Hive-trunk-h0.21 #949 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/949/])
HIVE-1694. Accelerate GROUP BY execution using indexes
(Prajakta Kalmegh via jvs)

jvs : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1170007
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/data/files/lineitem.txt
* /hive/trunk/data/files/tbl.txt
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/index
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
* /hive/trunk/ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
* /hive/trunk/ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out


 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Prajakta Kalmegh
 Fix For: 0.9.0

 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
 HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, 
 HIVE-1694.6.patch, HIVE-1694.7.patch, HIVE-1694.7.patch, 
 HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql


 The index building patch (HIVE-417) is checked into trunk; this JIRA issue 
 tracks supporting indexes in the Hive compiler & execution engine for SELECT 
 queries.
 This is in reference to John's comment at
 https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
 on creating a separate JIRA issue for tracking index usage in the optimizer & 
 query execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 a certain class of queries), e.g.:
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index-based joins)
 - Group By, Order By and other miscellaneous cases
 The proposal is multi-step:
 1. Building index-based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. a cost-based optimizer to compare and choose 
 between index scans, full table scans, etc.)
 This JIRA initially focuses on the first step. It is expected to hold 
 the information about index-based plans & operator implementations for the 
 above-mentioned cases. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-09-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102988#comment-13102988
 ] 

John Sichi commented on HIVE-1694:
--

Prajakta, can you re-attach your latest patch granting rights to ASF (so the 
feather shows up next to the attachment), and then click the Submit Patch 
button?





[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-09-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103013#comment-13103013
 ] 

John Sichi commented on HIVE-1694:
--

+1.  Will commit when tests pass.





[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-09-10 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102149#comment-13102149
 ] 

jirapos...@reviews.apache.org commented on HIVE-1694:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/
---

(Updated 2011-09-10 21:10:06.178279)


Review request for hive and John Sichi.


Changes
---

Added order-by to queries for test determinism.


Summary
---

This patch defines a new AggregateIndexHandler, which is used to optimize 
the query plan for GROUP BY queries. 


This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 66ee0be 
  data/files/lineitem.txt PRE-CREATION 
  data/files/tbl.txt PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 5053576 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7a00c00 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java bec8787 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dcdfb9e 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java 699519b 
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1194/diff


Testing
---


Thanks,

Prajakta







[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-09-09 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101606#comment-13101606
 ] 

John Sichi commented on HIVE-1694:
--

Looks great.  One last change:  for all the SELECT queries in the .q file, can 
you add an ORDER BY on a full key for test determinism?





[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-09-08 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100839#comment-13100839
 ] 

jirapos...@reviews.apache.org commented on HIVE-1694:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/
---

(Updated 2011-09-09 01:14:16.218940)


Review request for hive and John Sichi.


Summary
---

This patch defines a new AggregateIndexHandler, which is used to optimize 
the query plan for GROUP BY queries. 


This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694


Diffs (updated)
-

  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java 699519b 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dcdfb9e 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 66ee0be 
  data/files/lineitem.txt PRE-CREATION 
  data/files/tbl.txt PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 5053576 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7a00c00 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java bec8787 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION 

Diff: https://reviews.apache.org/r/1194/diff


Testing
---


Thanks,

Prajakta







[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-09-07 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098986#comment-13098986
 ] 

jirapos...@reviews.apache.org commented on HIVE-1694:
-



bq.  On 2011-08-05 21:20:21, John Sichi wrote:
bq.   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, line 172
bq.   https://reviews.apache.org/r/1194/diff/2/?file=30443#file30443line172
bq.  
bq.   See recent changes in corresponding CompactIndexHandler code for HIVEOPTINDEXFILTER; need the same here (or better, factor out common code here and elsewhere).
bq.   
bq.   On a related note, you may be able to use the same technique instead of isQueryInsertToTable; this would be preferable since it's nice to be able to use the index rewrite in cases where it's a normal INSERT table with index being used for GROUP BY on SELECT from some other table.
bq.  

I have factored out the common code from all the index handler classes and 
placed it in the IndexUtils file. 

I also removed the code for isQueryInsertToTable and am setting 
HIVEOPTINDEXFILTER to false instead. 


bq.  On 2011-08-05 21:20:21, John Sichi wrote:
bq.   ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java, line 153
bq.   https://reviews.apache.org/r/1194/diff/2/?file=30449#file30449line153
bq.  
bq.   Shouldn't this be the same as COUNT(*)?
bq.  

Yes, it should be. I missed changing this part from the previous code.


- Prajakta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1303
---


On 2011-08-03 10:31:42, Prajakta Kalmegh wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1194/
bq.  ---
bq.  
bq.  (Updated 2011-08-03 10:31:42)
bq.  
bq.  
bq.  Review request for hive and John Sichi.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch has defined a new AggregateIndexHandler which is used to optimize the query plan for groupby queries. 
bq.  
bq.  
bq.  This addresses bug HIVE-1694.
bq.  https://issues.apache.org/jira/browse/HIVE-1694
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f 
bq.    ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
bq.    ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java a57f9cf 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java 8295687 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java 699519b 
bq.    ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
bq.    ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/1194/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Prajakta
bq.  
bq.




[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-08-05 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080235#comment-13080235
 ] 

jirapos...@reviews.apache.org commented on HIVE-1694:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1303
---



ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2955

Can't you just look up AGGREGATES in the map?
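
The reviewed code is not shown in this thread, so the following is only a minimal sketch of the kind of change the comment seems to ask for, assuming the aggregates are stored in a string-keyed property map under the key AGGREGATES. The class name, method names and sample values are hypothetical and are not taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AggregatesLookup {

  // Key name taken from the review comment; everything else is illustrative.
  static final String AGGREGATES = "AGGREGATES";

  // Walks every entry to find a single key: correct, but more code than needed.
  static String findByScan(Map<String, String> props) {
    for (Map.Entry<String, String> entry : props.entrySet()) {
      if (AGGREGATES.equals(entry.getKey())) {
        return entry.getValue();
      }
    }
    return null;
  }

  // Direct lookup: returns the same value, or null when the key is absent.
  static String findByLookup(Map<String, String> props) {
    return props.get(AGGREGATES);
  }

  public static void main(String[] args) {
    // Hypothetical property values, used only to exercise both methods.
    Map<String, String> props = new LinkedHashMap<String, String>();
    props.put("OTHER", "x");
    props.put(AGGREGATES, "count(l_shipdate)");
    System.out.println(findByScan(props));
    System.out.println(findByLookup(props));
  }
}
```

Both methods return the same value for any map whose keys are strings, so the scan can be replaced by the one-line lookup without changing behavior.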



ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2953

Add a helper method to avoid duplicating the code in the else block below.




ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2954

Can't you just look up AGGREGATES in the map?



ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2956

See recent changes in corresponding CompactIndexHandler code for 
HIVEOPTINDEXFILTER; need the same here (or better, factor out common code here 
and elsewhere).

On a related note, you may be able to use the same technique instead of 
isQueryInsertToTable; this would be preferable since it's nice to be able to 
use the index rewrite in cases where it's a normal INSERT table with index 
being used for GROUP BY on SELECT from some other table.




ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java
https://reviews.apache.org/r/1194/#comment2957

@params here don't match actual params



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/1194/#comment2958

Shouldn't this be the same as COUNT(*)?




ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
https://reviews.apache.org/r/1194/#comment2980

Besides EXPLAIN, you should include a few queries against a non-empty table 
verifying that you get the correct results both with and without the 
optimization applied.  Remember to include an ORDER BY for test determinism.
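
To illustrate why the ordering matters when comparing results with and without the optimization, here is a small standalone sketch. It is not part of the patch or of Hive's test framework: the class, the method names and the sample rows are all hypothetical, and sorting whole rows merely stands in for an ORDER BY on a full key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DeterministicResultCheck {

  // Sorts rows by their full text, a stand-in for ORDER BY on a full key.
  static List<String> sortedCopy(List<String> rows) {
    List<String> copy = new ArrayList<String>(rows);
    Collections.sort(copy);
    return copy;
  }

  // True when both runs produced the same rows, regardless of arrival order.
  static boolean sameRows(List<String> first, List<String> second) {
    return sortedCopy(first).equals(sortedCopy(second));
  }

  public static void main(String[] args) {
    // Hypothetical GROUP BY output rows in the form key<TAB>count.
    List<String> withIndex = Arrays.asList("b\t2", "a\t1", "c\t3");
    List<String> withoutIndex = Arrays.asList("a\t1", "c\t3", "b\t2");

    // Order-sensitive comparison, similar to diffing raw query output.
    System.out.println(withIndex.equals(withoutIndex));
    // Comparison after imposing a total order on the rows.
    System.out.println(sameRows(withIndex, withoutIndex));
  }
}
```

Without a fixed order, two correct runs can differ line by line, so a textual diff against recorded output may fail even though the result set is identical.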




ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
https://reviews.apache.org/r/1194/#comment2978

Isn't this set redundant?


- John



[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-08-05 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080236#comment-13080236
 ] 

John Sichi commented on HIVE-1694:
--

Added some comments on Review Board.  Thanks for carrying this one all the way 
through such a long review process; let's see if we can get this one committed 
before it turns one year old :)






[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-08-01 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13075964#comment-13075964
 ] 

jirapos...@reviews.apache.org commented on HIVE-1694:
-



bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, line 61
bq.   https://reviews.apache.org/r/1194/diff/1/?file=27052#file27052line61
bq.  
bq.   Please run ant checkstyle and fix all the formatting discrepancies it reports for your new files.
bq.  
bq.  
bq.  Prajakta Kalmegh wrote:
bq.  Done! The code still has checkstyle formatting errors, but only in places where we have used LinkedHashMap, HashMap and ArrayList. The error states: "Declaring variables, return values or parameters of type 'HashMap' is not allowed."

Best practice is to only use interfaces (Map/List) except at the point of 
instantiation where you select a concrete class.  Hive violates this in a 
number of places, and sometimes that forces you to violate it in new code too; 
but otherwise, please follow this one.
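
As a minimal sketch of the convention described above (not code from the patch; the class and method names are hypothetical), variables, parameters and return values are declared with the Map and List interfaces, and the concrete classes appear only where the objects are created:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InterfaceTypedCollections {

  // Parameter and return types are interfaces; LinkedHashMap and ArrayList
  // are named only at the point of instantiation. Names must be non-empty.
  static Map<String, List<String>> groupByFirstLetter(List<String> names) {
    Map<String, List<String>> groups = new LinkedHashMap<String, List<String>>();
    for (String name : names) {
      String key = name.substring(0, 1);
      List<String> bucket = groups.get(key);
      if (bucket == null) {
        bucket = new ArrayList<String>();
        groups.put(key, bucket);
      }
      bucket.add(name);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<String> names = new ArrayList<String>();
    names.add("key");
    names.add("count");
    names.add("kind");
    System.out.println(groupByFirstLetter(names));
  }
}
```

Declaring the types this way lets the concrete class be swapped (for example HashMap for LinkedHashMap) by editing a single line, and it is the pattern the checkstyle rule quoted above is enforcing.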


bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, 
line 603
bq.   https://reviews.apache.org/r/1194/diff/1/?file=27062#file27062line603
bq.  
bq.   Not sure why this new constructor is needed...after using it, all 
you do is get the table out of it.
bq.  
bq.  Prajakta Kalmegh wrote:
bq.  The only other constructor option for tableSpec needs the ASTNode as 
one of its parameters. Since we need to construct a new tableSpec using only 
the index table name, and we do not have an ASTNode for this, I need this 
constructor. If you have another approach in mind, please let me know. That 
would be helpful.

I'm asking why you even need to construct a new tableSpec instance.  All you do 
with it is reference ts.tableHandle.  And to create that tableHandle, you can 
just do db.getTable(tableName).  So I don't see the purpose of the tableSpec 
instance.


- John


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1212
---





 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Prajakta Kalmegh
 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
 HIVE-1694.3.patch.txt, 

[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-07-28 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072550#comment-13072550
 ] 

jirapos...@reviews.apache.org commented on HIVE-1694:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1212
---



ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2711

Please run ant checkstyle and fix all the formatting discrepancies it 
reports for your new files.




ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
https://reviews.apache.org/r/1194/#comment2695

Don't you need to reuse the compact implementation here so that the index 
can be used for WHERE (not just GROUP BY)?




ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
https://reviews.apache.org/r/1194/#comment2696

This method is redundant now.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java
https://reviews.apache.org/r/1194/#comment2698

I can't think of a case where it would be worse.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
https://reviews.apache.org/r/1194/#comment2699

Actually group-by is now preserved in all cases.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
https://reviews.apache.org/r/1194/#comment2700

Please use HTML bullet syntax for Javadoc (otherwise it all gets run 
together into one line when rendered).
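
A small sketch of what HTML bullet syntax in a Javadoc comment looks like. The class below and the conditions it lists are illustrative assumptions, not the actual Javadoc of RewriteGBUsingIndex:

```java
/**
 * Hypothetical rewrite rule, shown only to illustrate Javadoc list markup.
 *
 * <p>The rewrite is skipped when any of the following holds:
 * <ul>
 *   <li>the optimization is disabled in the configuration;</li>
 *   <li>the query references more than one table;</li>
 *   <li>the index partitions do not cover the table partitions.</li>
 * </ul>
 */
class JavadocBulletExample {
  /** Returns a fixed label; present only so the class has a documented member. */
  static String name() {
    return "JavadocBulletExample";
  }
}
```

Plain "-" or "*" bullets inside the comment would be collapsed into a single paragraph by the Javadoc tool, whereas the ul/li elements survive rendering.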




ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
https://reviews.apache.org/r/1194/#comment2701

indentation



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
https://reviews.apache.org/r/1194/#comment2703

Shouldn't this be BIGINT?

Also, I think you're supposed to use a TypeInfoFactory for this purpose.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
https://reviews.apache.org/r/1194/#comment2702

indentation



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
https://reviews.apache.org/r/1194/#comment2704

typo:  Repace



ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
https://reviews.apache.org/r/1194/#comment2707

Not sure why this new constructor is needed...after using it, all you do is 
get the table out of it.



ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
https://reviews.apache.org/r/1194/#comment2709

This should *not* be using the index, since the index is built on 
count(l_shipdate), and l_shipdate may contain nulls, whereas the query is 
referencing count(1), which is insensitive to nulls.
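
The difference can be sketched in plain Java, outside Hive. The list below stands in for an l_shipdate column with one NULL value; the class and method names are made up for the illustration:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: models SQL counting semantics on an in-memory column.
class CountNullSemantics {
  // Mirrors SQL count(column): rows where the column is NULL are not counted.
  static long countColumn(List<String> values) {
    long n = 0;
    for (String v : values) {
      if (v != null) {
        n++;
      }
    }
    return n;
  }

  // Mirrors SQL count(1) / count(*): every row is counted.
  static long countRows(List<String> values) {
    return values.size();
  }

  public static void main(String[] args) {
    List<String> shipdates = Arrays.asList("1996-03-13", null, "1996-04-12");
    System.out.println("count(l_shipdate) = " + countColumn(shipdates));
    System.out.println("count(1)          = " + countRows(shipdates));
  }
}
```

Because the two results differ whenever the column contains a NULL, an index that stores count(l_shipdate) cannot safely answer a query that asks for count(1).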



ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
https://reviews.apache.org/r/1194/#comment2710

Need additional tests to verify all the cases where the optimization should 
*not* be used:

* when configuration disables it
* when index partitions do not cover table partitions (I still don't see 
the code for this case)
* ... all the other conditions checked for in the code ...
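
For the partition-coverage condition in the list above, a rough sketch of the check might look like the following. It assumes partitions can be compared by name (for example "ds=2011-07-01"); the class, method and partition names are hypothetical and say nothing about how the patch or the Hive metastore actually represents them:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; partition identifiers are plain strings by assumption.
class PartitionCoverage {
  // The rewrite is only safe if every table partition the query reads
  // also has a built index partition.
  static boolean indexCovers(Set<String> tablePartitions, Set<String> indexPartitions) {
    return indexPartitions.containsAll(tablePartitions);
  }

  public static void main(String[] args) {
    Set<String> table = new HashSet<String>(Arrays.asList("ds=2011-07-01", "ds=2011-07-02"));
    Set<String> index = new HashSet<String>(Arrays.asList("ds=2011-07-01"));
    System.out.println(indexCovers(table, index));
  }
}
```

A negative test for this case would query a partition that has no corresponding index partition and verify that the plan still scans the base table.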



- John



[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-07-26 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071130#comment-13071130
 ] 

jirapos...@reviews.apache.org commented on HIVE-1694:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/
---

Review request for hive and John Sichi.


Summary
---

This patch has defined a new AggregateIndexHandler which is used to optimize 
the query plan for groupby queries. 


This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f 
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1194/diff


Testing
---


Thanks,

Prajakta



 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Prajakta Kalmegh
 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
 HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql


 The index building patch (HIVE-417) is checked into trunk; this JIRA issue 
 tracks supporting indexes in Hive compiler & execution engine for SELECT 
 queries.
 This is in ref. to John's comment at
 https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
 on creating a separate JIRA issue for tracking index usage in optimizer & query 
 execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 certain classes of queries). E.g.
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index based joins)
 - Group By, Order By and other misc cases
 The proposal is multi-step:
 1. Building index based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
 between index scans, full table scans etc.)
 This JIRA initially focuses on the first step. This JIRA is expected to hold 
 the information about index based plans & operator implementations for above 
 mentioned cases. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-07-26 Thread Prajakta Kalmegh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071135#comment-13071135
 ] 

Prajakta Kalmegh commented on HIVE-1694:


Hi John

Please find attached the latest patch (HIVE-1694.4.patch).
The patch contains:
1. Support for multiple aggregates in index creation using the 
AggregateIndexHandler. The column names for the index schema are constructed 
dynamically depending on the aggregates. 
For 'aggregateFunction(columnName)', the column name in the index will be 
`_aggregateFunction_of_columnName`. 
For example, for count(l_shipdate), the column name will be 
`_count_of_l_shipdate`.
For the 'count(*)' function, the column name will be `_count_of_all`.

2. Fixed the bug for duplicates in Group-by removal cases. We no longer remove 
the group-by in any case. This has made the logic for query rewrites much 
simpler than before. 
We removed 4 classes (RewriteIndexSubqueryCtx.java, 
RewriteIndexSubqueryProcFactory.java, RewriteRemoveGroupbyCtx.java, 
RewriteRemoveGroupbyProcFactory.java) from the previous patch and added two 
new simpler classes instead (RewriteQueryUsingAggregateIndex.java, 
RewriteQueryUsingAggregateIndexCtx.java). 

3. Added a new query (with 'UNION ALL') in the same ql_rewrite_gbtoidx.q file 
to demonstrate your requirement in the last post. Please note that the query is 
not a valid real-world use case, but it still serves our purpose of checking 
that a rewrite in one branch does not corrupt the other branch.

4. Rewrite Optimization now happens after the PredicatePushdown, 
PartitionPruner and PartitionConditionRemover.
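
The naming rule from item 1 above can be sketched as a small helper. This is an illustration of the stated convention only; the class and method names are assumptions, and the real logic lives in AggregateIndexHandler:

```java
// Hypothetical helper that reproduces the naming convention described in item 1.
class AggregateIndexColumnNames {
  // aggregateFunction(columnName) -> _aggregateFunction_of_columnName,
  // with "*" mapped to "all" so that count(*) becomes _count_of_all.
  static String columnNameFor(String aggregateFunction, String argument) {
    String target = "*".equals(argument) ? "all" : argument;
    return "_" + aggregateFunction + "_of_" + target;
  }

  public static void main(String[] args) {
    System.out.println(columnNameFor("count", "l_shipdate"));
    System.out.println(columnNameFor("count", "*"));
  }
}
```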

This patch does not contain:
1. Optimization for cases with multiple aggregates in selection
2. Optimization for any aggregate function other than count
3. Optimization for queries involving multiple tables (even if they are in a 
different branch). Since we are not optimizing for joins, the 
constraint also filters out queries which have different tables in union 
queries.
4. Optimizations for an index with multiple columns in its key

Here is the review board link for the patch 
https://reviews.apache.org/r/1194/.

Please let me know if you have any questions.


 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Prajakta Kalmegh
 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
 HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694_2010-10-28.diff, 
 demo_q1.hql, demo_q2.hql






[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-05-24 Thread Prajakta Kalmegh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038958#comment-13038958
 ] 

Prajakta Kalmegh commented on HIVE-1694:


Thanks John. Our team here is working on it. I will let you know once the new 
patch is ready.



[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-05-23 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038104#comment-13038104
 ] 

John Sichi commented on HIVE-1694:
--

I collected comments from last week's review meeting below.

* The rewrite needs to check to make sure that the index partitions are 
available (matching the referenced table partitions).  You can take a look at 
the way the Harvey Mudd team handles this, and maybe reuse their code.  This 
implies that predicate pushdown and partition pruning need to happen BEFORE the 
rewrite is applied (currently the rewrite happens before them).

* Isn't it a bug that the GROUP BY is removed in some cases?  The index may 
store multiple rows for the same base table key (since FILENAME is part of the 
index table key), so it seems like a GROUP BY should always be required for 
removing those duplicates.

* Where is _countall used instead of _countkey?  Also, what happens if the 
index is compound (multiple columns in its key)?

* Add a test case for a query in which a table scan is reused in a directed 
acyclic graph, e.g. a UNION where one branch of the union does a rewritable 
GROUP BY on the table and the other branch just reads the table directly.  We 
want to make sure that in this case, the rewrite's replacement of the base 
table in one branch does not corrupt the other branch in any way.
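
To illustrate the second point above (why the GROUP BY has to stay), here is a sketch with a made-up in-memory model of index rows. The field names and sample values are assumptions; the only thing taken from the discussion is that the file name is part of the index key, so one base-table key can appear in several index rows:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of aggregate-index rows; not the actual index table schema.
class IndexRollup {
  static final class IndexRow {
    final String key;       // indexed base-table value, e.g. an l_shipdate
    final String fileName;  // data file the partial count came from
    final long count;       // pre-computed count for this (key, file) pair

    IndexRow(String key, String fileName, long count) {
      this.key = key;
      this.fileName = fileName;
      this.count = count;
    }
  }

  // Equivalent in spirit to: SELECT key, SUM(count) FROM index_table GROUP BY key
  static Map<String, Long> countsByKey(List<IndexRow> rows) {
    Map<String, Long> totals = new LinkedHashMap<String, Long>();
    for (IndexRow row : rows) {
      Long soFar = totals.get(row.key);
      totals.put(row.key, (soFar == null ? 0L : soFar) + row.count);
    }
    return totals;
  }

  public static void main(String[] args) {
    List<IndexRow> rows = Arrays.asList(
        new IndexRow("1996-03-13", "file_a", 2),
        new IndexRow("1996-03-13", "file_b", 3),
        new IndexRow("1996-04-12", "file_a", 1));
    System.out.println(countsByKey(rows));
  }
}
```

Reading the index rows without the roll-up would return the key 1996-03-13 twice, once per file, which is the duplicate problem described above.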

After these have been addressed (along with the existing review board comments) 
and you've had a chance to rebase the patch, we'll do another pass.

Thanks again!




[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-05-16 Thread Prajakta Kalmegh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033906#comment-13033906
 ] 

Prajakta Kalmegh commented on HIVE-1694:


Thanks John. Please let us know how to proceed on this. We are taking a look at 
the HIVE-1803 changes in the meantime.



[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-05-16 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034181#comment-13034181
 ] 

John Sichi commented on HIVE-1694:
--

For the rebasing, you'll need to make your new handlers work with the 
refactored base classes.  HIVE-1803 copied some of your refactoring and took it 
further.

I'm going to ping Yongqiang again.




[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-05-16 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034207#comment-13034207
 ] 

John Sichi commented on HIVE-1694:
--

Scheduled a meeting this Friday to take a look at this with some other FB folks 
and get you more feedback.



[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-05-16 Thread Prajakta Kalmegh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034533#comment-13034533
 ] 

Prajakta Kalmegh commented on HIVE-1694:


That would be great. Thanks.



[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-05-13 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033254#comment-13033254
 ] 

John Sichi commented on HIVE-1694:
--

Need a rebase now that the refactoring from HIVE-1803 has been committed.




[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-03-17 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008235#comment-13008235
 ] 

John Sichi commented on HIVE-1694:
--

I added a few review board comments; there are a lot of places where the 
exception handling is still wrong; I didn't comment on all of those but they 
need to be fixed.

We still need to reconcile with HIVE-1803, but I'll ask Namit and Yongqiang to 
take a look now to get their comments on the rewrite implementation.



[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-03-13 Thread Prajakta Kalmegh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006321#comment-13006321
 ] 

Prajakta Kalmegh commented on HIVE-1694:


Hi John

Thanks for the link. We have created a new Review Board entry: 
https://reviews.apache.org/r/505/


[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-03-10 Thread Prajakta Kalmegh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005045#comment-13005045
 ] 

Prajakta Kalmegh commented on HIVE-1694:


Hi John

Please find attached the patch with new index type support. We have also 
changed our optimizer code to use the count of indexed columns from this new 
index type (instead of computing size(_offsets)). Can you please upload it 
for review on Review Board?
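
As a rough illustration of the two approaches mentioned above (not the patch's code), the Python sketch below compares a row count derived by measuring an offsets column, as size(_offsets) would, with a count stored directly in the index entry. The rows, the dictionary layout, and the function names are all made up for the example. It also shows one assumption-dependent pitfall: if offsets are kept as a set and several rows share an offset, the measured size undercounts, while a stored count does not.

```python
# Made-up rows: (indexed key, block offset of the row). Two rows of the first
# key share an offset, as can happen when offsets are per block, not per row.
rows = [("1996-03-13", 0), ("1996-03-13", 0), ("1997-01-28", 128)]

# Hypothetical index layout: key -> offsets (as list and as set) plus a count.
index = {}
for key, offset in rows:
    entry = index.setdefault(
        key, {"offsets_list": [], "offsets_set": set(), "count": 0})
    entry["offsets_list"].append(offset)  # one entry per row
    entry["offsets_set"].add(offset)      # duplicates collapse
    entry["count"] += 1                   # count stored at index build time


def count_via_size(key, column):
    """Row count derived by measuring an offsets column, like size(_offsets)."""
    return len(index[key][column])


def count_via_stored(key):
    """Row count read directly from the stored count."""
    return index[key]["count"]
```

With one offset per row the two approaches agree; with deduplicated offsets only the stored count is reliable, and in either case the stored count avoids touching the offsets at query time.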

Thanks.


[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-03-10 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005461#comment-13005461
 ] 

John Sichi commented on HIVE-1694:
--

Hi Prajakta,

Review Board is self-service...you can create yourself an account and then 
follow the steps here:

http://wiki.apache.org/hadoop/Hive/HowToContribute#Review_Process



[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-03-03 Thread Prajakta Kalmegh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002017#comment-13002017
 ] 

Prajakta Kalmegh commented on HIVE-1694:


Hi John

We have made all the changes as suggested by you except for making the code 
pluggable (so that the rewrite expression changes depending on which index 
handler is used). We will submit this change along with the patch for new index 
type. 

We have started working on the new index type creation as per your suggestion 
and will let you know once that is complete. 

 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Prajakta Kalmegh
 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
 HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql




[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-02-28 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000490#comment-13000490
 ] 

John Sichi commented on HIVE-1694:
--

I'd like to propose a fourth option instead:  create a new handler type which 
stores both the count and the offsets together, so that it can be used for both 
aggregation and filtering.  The index build can still be done with a single 
GROUP BY, but now with three aggregate expressions in the SELECT list:  
collect_set(BLOCKOFFSETINSIDEFILE), COUNT(`l_shipdate`), COUNT(*).  For a 
column known to be NOT NULL, just COUNT(*) is good enough, but Hive doesn't 
currently have that metadata.  You could also use IDXPROPERTIES to allow for 
additional expressions (SUM/MAX/MIN, complex expressions, etc), making these 
start to look more like materialized aggregate views.
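
The build-and-rewrite idea above can be sketched with Python's built-in sqlite3 module standing in for Hive. Everything here is an assumption made for illustration: the `lineitem` rows are made up, the `block_offset` column stands in for the BLOCKOFFSETINSIDEFILE value, `group_concat(DISTINCT ...)` stands in for `collect_set`, and the index table and column names (`lineitem_agg_idx`, `offsets`, `count_of_l_shipdate`, `count_of_all`) are hypothetical, not what the patch uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lineitem (l_shipdate TEXT, block_offset INTEGER);
    INSERT INTO lineitem VALUES
        ('1996-03-13', 0), ('1996-03-13', 0), ('1996-03-13', 128),
        ('1997-01-28', 128), (NULL, 256);
""")

# Index build: a single GROUP BY with three aggregate expressions.
# The offsets serve filtering; the two counts serve aggregation.
conn.execute("""
    CREATE TABLE lineitem_agg_idx AS
    SELECT l_shipdate,
           group_concat(DISTINCT block_offset) AS offsets,
           COUNT(l_shipdate)                   AS count_of_l_shipdate,
           COUNT(*)                            AS count_of_all
    FROM lineitem
    GROUP BY l_shipdate
""")

# Original query: scans the base table.
original = conn.execute(
    "SELECT l_shipdate, COUNT(l_shipdate) FROM lineitem "
    "GROUP BY l_shipdate ORDER BY l_shipdate").fetchall()

# Rewritten query: reads only the (much smaller) index table.
rewritten = conn.execute(
    "SELECT l_shipdate, SUM(count_of_l_shipdate) FROM lineitem_agg_idx "
    "GROUP BY l_shipdate ORDER BY l_shipdate").fetchall()
```

In this sketch both queries return the same rows. The NULL key shows why the two counts are kept separately: its COUNT(`l_shipdate`) is 0 while its COUNT(*) is 1, so COUNT(*) alone is only a safe substitute when the column is known to be NOT NULL.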

In HIVE-1803, they are working on factoring out some of the generic parts of 
compact index handler for reuse; we should depend on that for the aggregate 
index handler to avoid duplicating code.


 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Nikhil Deshpande
 Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff, 
 demo_q1.hql, demo_q2.hql




[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-02-27 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1362#comment-1362
 ] 

John Sichi commented on HIVE-1694:
--

The first round of review comments is here:

https://reviews.apache.org/r/392/

After those are resolved and the patch is rebased, I'll ask Namit and Yongqiang 
to take a look to see if they can find ways to simplify any of the rewrite logic.




[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-02-27 Thread Prajakta Kalmegh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000144#comment-13000144
 ] 

Prajakta Kalmegh commented on HIVE-1694:


Hi John, 

Thanks for the review comments. I have posted my replies to some of your 
questions. For the others, we will make the required changes in the code and 
upload a new patch. 



[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-02-23 Thread Prajakta Kalmegh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998353#comment-12998353
 ] 

Prajakta Kalmegh commented on HIVE-1694:


Hi John,

Can you please let us know the status of the review of this patch? 



[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-02-23 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998619#comment-12998619
 ] 

John Sichi commented on HIVE-1694:
--

Hi Prajakta,

I got caught up with some other work; I'll try to publish my comments before 
next week.




[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-02-23 Thread Prajakta Kalmegh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998659#comment-12998659
 ] 

Prajakta Kalmegh commented on HIVE-1694:


Thanks John.



[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-02-16 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995613#comment-12995613
 ] 

John Sichi commented on HIVE-1694:
--

Note:  I pointed the Harvey Mudd team over to your branch, so they're copying 
bits and pieces of necessary support into their patch.  Once they're a little 
further along, we can figure out how to reconcile the two before commit.




[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-02-11 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993742#comment-12993742
 ] 

John Sichi commented on HIVE-1694:
--

Taking a closer look at this one now.




[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-02-11 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993803#comment-12993803
 ] 

John Sichi commented on HIVE-1694:
--

I'm buffering up a bunch of comments in review board, but won't publish for a 
while since it'll take me some time to go through all the code.  Looks good so 
far.





[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-02-03 Thread Prajakta Kalmegh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990466#comment-12990466
 ] 

Prajakta Kalmegh commented on HIVE-1694:


Thanks John. We will ensure that henceforth. 
