[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103427#comment-13103427 ] Hudson commented on HIVE-1694: -- Integrated in Hive-trunk-h0.21 #949 (See [https://builds.apache.org/job/Hive-trunk-h0.21/949/]) HIVE-1694. Accelerate GROUP BY execution using indexes (Prajakta Kalmegh via jvs) jvs : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1170007 Files : * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/data/files/lineitem.txt * /hive/trunk/data/files/tbl.txt * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/index * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java * /hive/trunk/ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q * /hive/trunk/ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Fix For: 0.9.0 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, HIVE-1694.6.patch, HIVE-1694.7.patch, HIVE-1694.7.patch, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102988#comment-13102988 ] John Sichi commented on HIVE-1694: -- Prajakta, can you re-attach your latest patch granting rights to ASF (so the feather shows up next to the attachment), and then click the Submit Patch button? Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, HIVE-1694.6.patch, HIVE-1694.7.patch, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103013#comment-13103013 ] John Sichi commented on HIVE-1694: -- +1. Will commit when tests pass. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Fix For: 0.8.0 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, HIVE-1694.6.patch, HIVE-1694.7.patch, HIVE-1694.7.patch, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102149#comment-13102149 ] jirapos...@reviews.apache.org commented on HIVE-1694: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1194/ --- (Updated 2011-09-10 21:10:06.178279) Review request for hive and John Sichi. Changes --- Added order-by to queries for test determinism. Summary --- This patch has defined a new AggregateIndexHandler which is used to optimize the query plan for groupby queries. This addresses bug HIVE-1694. https://issues.apache.org/jira/browse/HIVE-1694 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 66ee0be data/files/lineitem.txt PRE-CREATION data/files/tbl.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 5053576 ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7a00c00 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java bec8787 ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dcdfb9e ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java 699519b ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION Diff: https://reviews.apache.org/r/1194/diff Testing --- Thanks, Prajakta Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, HIVE-1694.6.patch, HIVE-1694.7.patch, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101606#comment-13101606 ] John Sichi commented on HIVE-1694: -- Looks great. One last change: for all the SELECT queries in the .q file, can you add an ORDER BY on a full key for test determinism. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, HIVE-1694.6.patch, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100839#comment-13100839 ] jirapos...@reviews.apache.org commented on HIVE-1694: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1194/ --- (Updated 2011-09-09 01:14:16.218940) Review request for hive and John Sichi. Summary --- This patch has defined a new AggregateIndexHandler which is used to optimize the query plan for groupby queries. This addresses bug HIVE-1694. https://issues.apache.org/jira/browse/HIVE-1694 Diffs (updated) - ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java 699519b ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dcdfb9e ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 66ee0be data/files/lineitem.txt PRE-CREATION data/files/tbl.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 5053576 ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7a00c00 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java bec8787 ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION Diff: https://reviews.apache.org/r/1194/diff Testing --- Thanks, Prajakta Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098986#comment-13098986 ] jirapos...@reviews.apache.org commented on HIVE-1694: - bq. On 2011-08-05 21:20:21, John Sichi wrote: bq. ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, line 172 bq. https://reviews.apache.org/r/1194/diff/2/?file=30443#file30443line172 bq. bq. See recent changes in corresponding CompactIndexHandler code for HIVEOPTINDEXFILTER; need the same here (or better, factor out common code here and elsewhere). bq. bq. On a related note, you may be able to use the same technique instead of isQueryInsertToTable; this would be preferable since it's nice to be able to use the index rewrite in cases where it's a normal INSERT table with index being used for GROUP BY on SELECT from some other table. bq. I have factored out the common code in all Index handler classes and placed it in IndexUtils file. I also removed the code for isQueryInsertToTable and am setting the HIVEOPTINDEXFILTER to false instead. bq. On 2011-08-05 21:20:21, John Sichi wrote: bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java, line 153 bq. https://reviews.apache.org/r/1194/diff/2/?file=30449#file30449line153 bq. bq. Shouldn't this be the same as COUNT(*)? bq. Yes it is. I missed to change this part from the previous code. - Prajakta --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1194/#review1303 --- On 2011-08-03 10:31:42, Prajakta Kalmegh wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1194/ bq. --- bq. bq. (Updated 2011-08-03 10:31:42) bq. bq. bq. Review request for hive and John Sichi. bq. bq. bq. Summary bq. --- bq. bq. This patch has defined a new AggregateIndexHandler which is used to optimize the query plan for groupby queries. bq. bq. bq. This addresses bug HIVE-1694. bq. https://issues.apache.org/jira/browse/HIVE-1694 bq. bq. bq. Diffs bq. - bq. bq.common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f bq.ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION bq.ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff bq.ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java a57f9cf bq.ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java PRE-CREATION bq.ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java 8295687 bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java 699519b bq.ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION bq.ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/1194/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Prajakta bq. bq. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080235#comment-13080235 ] jirapos...@reviews.apache.org commented on HIVE-1694: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1194/#review1303 --- ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java https://reviews.apache.org/r/1194/#comment2955 Can't you just look up AGGREGATES in the map? ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java https://reviews.apache.org/r/1194/#comment2953 Add a helper method to avoid duplicating the code in the else block below. ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java https://reviews.apache.org/r/1194/#comment2954 Can't you just look up AGGREGATES in the map? ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java https://reviews.apache.org/r/1194/#comment2956 See recent changes in corresponding CompactIndexHandler code for HIVEOPTINDEXFILTER; need the same here (or better, factor out common code here and elsewhere). On a related note, you may be able to use the same technique instead of isQueryInsertToTable; this would be preferable since it's nice to be able to use the index rewrite in cases where it's a normal INSERT table with index being used for GROUP BY on SELECT from some other table. ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java https://reviews.apache.org/r/1194/#comment2957 @params here don't match actual params ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java https://reviews.apache.org/r/1194/#comment2958 Shouldn't this be the same as COUNT(*)? ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q https://reviews.apache.org/r/1194/#comment2980 Besides EXPLAIN, you should include a few queries against a non-empty table verifying that you get the correct results both with and without the optimization applied. Remember to include an ORDER BY for test determinism. ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q https://reviews.apache.org/r/1194/#comment2978 Isn't this set redundant? - John On 2011-08-03 10:31:42, Prajakta Kalmegh wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1194/ bq. --- bq. bq. (Updated 2011-08-03 10:31:42) bq. bq. bq. Review request for hive and John Sichi. bq. bq. bq. Summary bq. --- bq. bq. This patch has defined a new AggregateIndexHandler which is used to optimize the query plan for groupby queries. bq. bq. bq. This addresses bug HIVE-1694. bq. https://issues.apache.org/jira/browse/HIVE-1694 bq. bq. bq. Diffs bq. - bq. bq.common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f bq.ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION bq.ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff bq.ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java a57f9cf bq.ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java PRE-CREATION bq.ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java 8295687 bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java 699519b bq.ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION bq.ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/1194/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Prajakta bq. bq. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL:
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080236#comment-13080236 ] John Sichi commented on HIVE-1694: -- Added some comments on Review Board. Thanks for carrying this one all the way through such a long review process; let's see if we can get this one committed before it turns one year old :) Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13075964#comment-13075964 ] jirapos...@reviews.apache.org commented on HIVE-1694: - bq. On 2011-07-28 21:40:30, John Sichi wrote: bq. ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, line 61 bq. https://reviews.apache.org/r/1194/diff/1/?file=27052#file27052line61 bq. bq. Please run ant checkstyle and fix all the formatting discrepancies it reports for your new files. bq. bq. bq. Prajakta Kalmegh wrote: bq. Done! The code is still having checkstyle formatting errors only for places where we have used LinkedHashMap, HashMap and ArrayList. The error states Declaring variables, return values or parameters of type 'HashMap' is not allowed. Best practice is to only use interfaces (Map/List) except at the point of instantiation where you select a concrete class. Hive violates this in a number of places, and sometimes that forces you to violate it in new code too; but otherwise, please follow this one. bq. On 2011-07-28 21:40:30, John Sichi wrote: bq. ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line 603 bq. https://reviews.apache.org/r/1194/diff/1/?file=27062#file27062line603 bq. bq. Not sure why this new constructor is needed...after using it, all you do is get the table out of it. bq. bq. Prajakta Kalmegh wrote: bq. The only other constructor option for tableSpec needs the ASTNode as one of its parameters. Since we need to construct a new tableSpec using only the index table name, and we do not have a ASTNode for this, I need this constructor. If you have any other way in mind, please let me know. That would be helpful. I'm asking why you even need to construct a new tableSpec instance. All you do with it is reference ts.tableHandle. And to create that tableHandle, you can just do db.getTable(tableName). So I don't see the purpose of the tableSpec instance. - John --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1194/#review1212 --- On 2011-07-26 14:44:01, Prajakta Kalmegh wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1194/ bq. --- bq. bq. (Updated 2011-07-26 14:44:01) bq. bq. bq. Review request for hive and John Sichi. bq. bq. bq. Summary bq. --- bq. bq. This patch has defined a new AggregateIndexHandler which is used to optimize the query plan for groupby queries. bq. bq. bq. This addresses bug HIVE-1694. bq. https://issues.apache.org/jira/browse/HIVE-1694 bq. bq. bq. Diffs bq. - bq. bq.common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f bq.ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION bq.ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff bq.ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3 bq.ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java PRE-CREATION bq.ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 bq.ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION bq.ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/1194/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Prajakta bq. bq. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt,
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072550#comment-13072550 ] jirapos...@reviews.apache.org commented on HIVE-1694: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1194/#review1212 --- ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java https://reviews.apache.org/r/1194/#comment2711 Please run ant checkstyle and fix all the formatting discrepancies it reports for your new files. ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java https://reviews.apache.org/r/1194/#comment2695 Don't you need to reuse the compact implementation here so that the index can be used for WHERE (not just GROUP BY)? ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java https://reviews.apache.org/r/1194/#comment2696 This method is redundant now. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java https://reviews.apache.org/r/1194/#comment2698 I can't think of a case where it would be worse. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java https://reviews.apache.org/r/1194/#comment2699 Actually group-by is now preserved in all cases. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java https://reviews.apache.org/r/1194/#comment2700 Please use HTML bullet syntax for Javadoc (otherwise it all gets run together into one line when rendered). ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java https://reviews.apache.org/r/1194/#comment2701 indentation ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java https://reviews.apache.org/r/1194/#comment2703 Shouldn't this be BIGINT? Also, I think you're supposed to use a TypeInfoFactory for this purpose. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java https://reviews.apache.org/r/1194/#comment2702 indentation ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java https://reviews.apache.org/r/1194/#comment2704 typo: Repace ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java https://reviews.apache.org/r/1194/#comment2707 Not sure why this new constructor is needed...after using it, all you do is get the table out of it. ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q https://reviews.apache.org/r/1194/#comment2709 This should *not* be using the index, since the index is built on count(l_shipdate), and l_shipdate may contain nulls, whereas the query is referencing count(1), which is insensitive to nulls. ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q https://reviews.apache.org/r/1194/#comment2710 Need additional tests to verify all the cases where the optimization should *not* be used: * when configuration disables it * when index partitions do not cover table partitions (I still don't see the code for this case) * ... all the other conditions checked for in the code ... - John On 2011-07-26 14:44:01, Prajakta Kalmegh wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1194/ bq. --- bq. bq. (Updated 2011-07-26 14:44:01) bq. bq. bq. Review request for hive and John Sichi. bq. bq. bq. Summary bq. --- bq. bq. This patch has defined a new AggregateIndexHandler which is used to optimize the query plan for groupby queries. bq. bq. bq. This addresses bug HIVE-1694. bq. https://issues.apache.org/jira/browse/HIVE-1694 bq. bq. bq. Diffs bq. - bq. bq.common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f bq.ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION bq.ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff bq.ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3 bq.ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION bq.
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071130#comment-13071130 ] jirapos...@reviews.apache.org commented on HIVE-1694: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1194/ --- Review request for hive and John Sichi. Summary --- This patch has defined a new AggregateIndexHandler which is used to optimize the query plan for groupby queries. This addresses bug HIVE-1694. https://issues.apache.org/jira/browse/HIVE-1694 Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3 ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION Diff: https://reviews.apache.org/r/1194/diff Testing --- Thanks, Prajakta Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071135#comment-13071135 ] Prajakta Kalmegh commented on HIVE-1694: Hi John Please find attached the latest patch (HIVE-1694.4.patch): The patch contains: 1. Support for multiple aggregates in index creation using the AggregateIndexHandler. The column names for the index schema are constructed dynamically depending on the aggregates. For 'aggregateFunction(columnName)', the column name in index will be `_aggregateFunction_of_columnName`. For example, for count(l_shipdate), the column name will be `_count_of_l_shipdate)`. For 'count(*)' function, the column name will be `_count_of_all`. 2. Fixed the bug for duplicates in Group-by removal cases. We are not removing group-by in any case now. This has made the logic for query rewrites quite simpler than before. We removed 4 classes (RewriteIndexSubqueryCtx.java, RewriteIndexSubqueryProcFactory.java, RewriteRemoveGroupbyCtx.java, RewriteRemoveGroupbyProcFactory.java) from the previous patch and added two new simpler classes instead (RewriteQueryUsingAggregateIndex.java, RewriteQueryUsingAggregateIndexCtx.java). 3. Added a new query (with 'UNION ALL') in the same ql_rewrite_gbtoidx.q file to demonstrate your requirement in last post. Please note that the query is not a valid real-work use case scenario; but still suffices our purpose to see that one branch rewrite does not corrupt the other branch. 4. Rewrite Optimization now happens after the PredicatePushdown, PartitionPruner and PartitionConditionRemover. This patch does not contain: 1. Optimization for cases with mulitple aggregates in selection 2. Optimization for any other aggregate function apart from count 3. Optimization for queries involving multiple tables (even if they are in a different branch). Since we are not optimizing for case of joins, the constraint also filters out queries which have different tables in union queries. 4. Optimizations for index with multiple columns in its key Here is the review board link for the patch https://reviews.apache.org/r/1194/. Please let me know if you have any questions. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038958#comment-13038958 ] Prajakta Kalmegh commented on HIVE-1694: Thanks John. Our team here is working on it. I will let you know once the new patch is ready. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038104#comment-13038104 ] John Sichi commented on HIVE-1694: -- I collected comments from last week's review meeting below. * The rewrite needs to check to make sure that the index partitions are available (matching the referenced table partitions). You can take a look at the way the Harvey Mudd team handles this, and maybe reuse their code. This implies that predicate pushdown and partition pruning need to happen BEFORE the rewrite is applied (currently the rewrite happens before them). * Isn't it a bug that the GROUP BY is removed in some cases? The index may store multiple rows for the same base table key (since FILENAME is part of the index table key), so it seems like a GROUP BY should always be required for removing those duplicates. * Where is _countall used instead of _countkey? Also, what happens if the index is compound (multiple columns in its key)? * Add a test case for a query in which a table scan is reused in a directed acyclic graph, e.g. a UNION where one branch of the union does a rewritable GROUP BY on the table and the other branch just reads the table directly. We want to make sure that in this case, the rewrite's replacement of the base table in one branch does not corrupt the other branch in any way. After these have been addressed (along with the existing review board comments) and you've had a chance to rebase the patch, we'll do another pass. Thanks again! Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033906#comment-13033906 ] Prajakta Kalmegh commented on HIVE-1694: Thanks John. Please let us know how to proceed on this. We are taking a look at the HIVE-1803 changes in the meanwhile. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034181#comment-13034181 ] John Sichi commented on HIVE-1694: -- For the rebasing, you'll need to make your new handlers work with the refactored base classes. HIVE-1803 copied some of your refactoring and took it further. I'm going to ping Yongqiang again. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034207#comment-13034207 ] John Sichi commented on HIVE-1694: -- Scheduled a meeting this Friday to take a look at this with some other FB folks and get you more feedback. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034533#comment-13034533 ] Prajakta Kalmegh commented on HIVE-1694: That would be great. Thanks. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033254#comment-13033254 ] John Sichi commented on HIVE-1694: -- Need a rebase now that the refactoring from HIVE-1803 has been committed. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008235#comment-13008235 ] John Sichi commented on HIVE-1694: -- I added a few review board comments; there are a lot of places where the exception handling is still wrong; I didn't comment on all of those but they need to be fixed. We still need to reconcile with HIVE-1803, but I'll ask Namit and Yongqiang to take a look now to get their comments on the rewrite implementation. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13006321#comment-13006321 ] Prajakta Kalmegh commented on HIVE-1694: Hi John Thanks for the link. We have created a new Review Board entry: https://reviews.apache.org/r/505/ Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13005045#comment-13005045 ] Prajakta Kalmegh commented on HIVE-1694: Hi John Please find attached the patch with new index type support. We have also made changes to the our optimizer code to use count of indexed columns from this new index type (instead of computing the size(_offsets)). Can you please upload it for review on ReviewBoard? Thanks. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13005461#comment-13005461 ] John Sichi commented on HIVE-1694: -- Hi Prajakta, Review Board is self-service...you can create yourself an account and then follow the steps here: http://wiki.apache.org/hadoop/Hive/HowToContribute#Review_Process Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13002017#comment-13002017 ] Prajakta Kalmegh commented on HIVE-1694: Hi John We have made all the changes as suggested by you except for making the code pluggable (so that the rewrite expression changes depending on which index handler is used). We will submit this change along with the patch for new index type. We have started working on the new index type creation as per your suggestion and will let you know once that is complete. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13000490#comment-13000490 ] John Sichi commented on HIVE-1694: -- I'd like to propose a fourth option instead: create a new handler type which stores both the count and the offsets together, so that it can be used for both aggregation and filtering. The index build can still be done with a single GROUP BY, but now with three aggregate expressions in the SELECT list: collect_set (BLOCKOFFSETINSIDEFILE), COUNT(`l_shipdate`), COUNT(*). For a column known to be NOT NULL, just COUNT(*) is good enough, but Hive doesn't currently have that metadata. You could also use IDXPROPERTIES to allow for additional expressions (SUM/MAX/MIN, complex expressions, etc), making these start to look more like materialized aggregate views. In HIVE-1803, they are working on factoring out some of the generic parts of compact index handler for reuse; we should depend on that for the aggregate index handler to avoid duplicating code. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Nikhil Deshpande Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1362#comment-1362 ] John Sichi commented on HIVE-1694: -- First round of review comments are here: https://reviews.apache.org/r/392/ After those are resolved, and patch is rebased, I'll ask Namit and Yongqiang to take a look to see if they can find ways to simplify any of the rewrite logic. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Nikhil Deshpande Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13000144#comment-13000144 ] Prajakta Kalmegh commented on HIVE-1694: Hi John, Thanks for the review comments. I have posted my replies on some of your questions. For the others, we will make the required changes in the code and upload a new patch. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Nikhil Deshpande Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12998353#comment-12998353 ] Prajakta Kalmegh commented on HIVE-1694: Hi John, Can you please let us know the status of the review of this patch? Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Nikhil Deshpande Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12998619#comment-12998619 ] John Sichi commented on HIVE-1694: -- Hi Prajakta, I got caught up with some other work; I'll try to publish my comments before next week. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Nikhil Deshpande Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12998659#comment-12998659 ] Prajakta Kalmegh commented on HIVE-1694: Thanks John. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Nikhil Deshpande Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995613#comment-12995613 ] John Sichi commented on HIVE-1694: -- Note: I pointed the Harvey Mudd team over to your branch, so they're copying bits and pieces of necessary support into their patch. Once they're a little further along, we can figure out how to reconcile the two before commit. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Nikhil Deshpande Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12993742#comment-12993742 ] John Sichi commented on HIVE-1694: -- Taking a closer look at this one now. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Nikhil Deshpande Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12993803#comment-12993803 ] John Sichi commented on HIVE-1694: -- I'm buffering up a bunch of comments in review board, but won't publish for a while since it'll take me some time to go through all the code. Looks good so far. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Nikhil Deshpande Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12990466#comment-12990466 ] Prajakta Kalmegh commented on HIVE-1694: Thanks John. We will ensure that henceforth. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Nikhil Deshpande Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira