[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables
[ https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070943#comment-13070943 ]

Hudson commented on HIVE-2128:
------------------------------

Integrated in Hive-trunk-h0.21 #848 (See [https://builds.apache.org/job/Hive-trunk-h0.21/848/])
HIVE-2128. Automatic Indexing with multiple tables. (Syed Albiz via jvs)

jvs : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1150962
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
* /hive/trunk/ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out
* /hive/trunk/ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java
* /hive/trunk/ql/src/test/queries/clientpositive/index_auto_self_join.q
* /hive/trunk/ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
* /hive/trunk/ql/src/test/results/clientpositive/index_auto_self_join.q.out
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
* /hive/trunk/ql/src/test/results/clientpositive/index_auto_mult_tables.q.out
* /hive/trunk/ql/src/test/queries/clientpositive/index_auto_mult_tables.q

Automatic Indexing with multiple tables
---------------------------------------

Key: HIVE-2128
URL: https://issues.apache.org/jira/browse/HIVE-2128
Project: Hive
Issue Type: Improvement
Components: Indexing
Affects Versions: 0.8.0
Reporter: Russell Melick
Assignee: Syed S. Albiz
Fix For: 0.8.0
Attachments: HIVE-2128.1.patch, HIVE-2128.1.patch, HIVE-2128.2.patch, HIVE-2128.4.patch, HIVE-2128.5.patch, HIVE-2128.6.patch, HIVE-2128.7.patch, HIVE-2128.8.patch

Make automatic indexing work with jobs which access multiple tables. We'll probably need to modify the way that the index input format works in order to associate index formats/files with specific tables.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables
[ https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069851#comment-13069851 ]

John Sichi commented on HIVE-2128:
----------------------------------

+1. Will commit when tests pass.

[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables
[ https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069299#comment-13069299 ]

jirapos...@reviews.apache.org commented on HIVE-2128:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1010/
-----------------------------------------------------------

(Updated 2011-07-21 23:52:23.929900)

Review request for hive and John Sichi.

Changes
-------

Added order by to testcases. This revealed an existing bug where we would walk the entire operator tree for each task in the task tree in IndexWhereTaskDispatcher. I amended this to only walk the subset of the operator tree in the current task.

Summary
-------

Grab the indexed tables during optimized query generation, grab the associated path URIs, and keep those around in the Configuration object. When the job is passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat to decide whether to use the index file or delegate to the parent (HiveInputFormat) class. Not sure if this is robust.

This addresses bug HIVE-2128.
https://issues.apache.org/jira/browse/HIVE-2128

Diffs (updated)
-----

ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out 4c9efd1
ql/src/test/results/clientpositive/index_auto_self_join.q.out PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java b9b586e
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d
ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5
ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java da084f6
ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6
ql/src/test/queries/clientpositive/index_auto_mult_tables.q PRE-CREATION
ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q PRE-CREATION
ql/src/test/queries/clientpositive/index_auto_self_join.q PRE-CREATION
ql/src/test/results/clientpositive/index_auto_mult_tables.q.out PRE-CREATION
ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/1010/diff

Testing
-------

added new testcase index_auto_mult_tables.q

Thanks,

Syed

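The fix described under "Changes" above (walk only the operators belonging to the current task, not the whole operator tree once per task) can be sketched as follows. This is a simplified illustration, not the code from the patch: `Op`, `TaskScopedWalk`, and `operatorsOf` are hypothetical stand-ins and do not correspond to Hive's own operator, task, or graph-walker classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an operator node; not Hive's Operator class.
class Op {
    final String name;
    final List<Op> children = new ArrayList<>();

    Op(String name) { this.name = name; }

    // Attach a child and return this node so chains can be built inline.
    Op child(Op c) { children.add(c); return this; }
}

// Hypothetical helper; not part of Hive.
class TaskScopedWalk {
    // Visit only the operators reachable from one task's own root operators,
    // so operators owned by other tasks are never touched.
    static List<String> operatorsOf(List<Op> taskRoots) {
        Deque<Op> stack = new ArrayDeque<>(taskRoots);
        Set<Op> seen = new HashSet<>();   // identity-based: Op does not override equals
        List<String> names = new ArrayList<>();
        while (!stack.isEmpty()) {
            Op op = stack.pop();
            if (!seen.add(op)) {
                continue;                 // already visited via another parent
            }
            names.add(op.name);
            for (Op c : op.children) {
                stack.push(c);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Op task1Root = new Op("TS1").child(new Op("FIL1").child(new Op("SEL1")));
        Op task2Root = new Op("TS2").child(new Op("FIL2"));
        List<Op> roots1 = new ArrayList<>();
        roots1.add(task1Root);
        List<Op> roots2 = new ArrayList<>();
        roots2.add(task2Root);
        System.out.println(operatorsOf(roots1));
        System.out.println(operatorsOf(roots2));
    }
}
```

With a per-task walk, the cost is proportional to each task's own operators, where the behaviour described as the bug would repeat the full-tree walk for every task.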
[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables
[ https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067878#comment-13067878 ]

jirapos...@reviews.apache.org commented on HIVE-2128:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1010/#review1112
-----------------------------------------------------------

ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java
https://reviews.apache.org/r/1010/#comment2271

    Why was this comment truncated?

ql/src/test/queries/clientpositive/index_auto_mult_tables.q
https://reviews.apache.org/r/1010/#comment2273

    All of these SELECT statements need ORDER BY for determinism.

- John

On 2011-07-19 03:15:17, Syed Albiz wrote:
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1010/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-07-19 03:15:17)
bq.
bq. Review request for hive and John Sichi.
bq.
bq. Summary
bq. -------
bq.
bq. Grab the indexed tables during optimized query generation, grab the associated path URIs, and keep those around in the Configuration object. When the job is passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat to decide whether to use the index file or delegate to the parent (HiveInputFormat) class. Not sure if this is robust.
bq.
bq. This addresses bug HIVE-2128.
bq. https://issues.apache.org/jira/browse/HIVE-2128
bq.
bq. Diffs
bq. -----
bq.
bq. ql/src/test/results/clientpositive/index_auto_self_join.q.out PRE-CREATION
bq. ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out PRE-CREATION
bq. ql/src/test/queries/clientpositive/index_auto_self_join.q PRE-CREATION
bq. ql/src/test/results/clientpositive/index_auto_mult_tables.q.out PRE-CREATION
bq. ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6
bq. ql/src/test/queries/clientpositive/index_auto_mult_tables.q PRE-CREATION
bq. ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q PRE-CREATION
bq. ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java b9b586e
bq. ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d
bq. ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5
bq. ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946
bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f
bq. ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java 617723e
bq.
bq. Diff: https://reviews.apache.org/r/1010/diff
bq.
bq. Testing
bq. -------
bq.
bq. added new testcase index_auto_mult_tables.q
bq.
bq. Thanks,
bq.
bq. Syed

[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables
[ https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067398#comment-13067398 ]

John Sichi commented on HIVE-2128:
----------------------------------

Could you make sure the latest patch is uploaded here and matching Review Board, and then click Submit Patch?

Also make sure all spurious changes (like extra imports) are gone; I'm seeing some of those in Review Board.

[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables
[ https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064253#comment-13064253 ]

jirapos...@reviews.apache.org commented on HIVE-2128:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1010/
-----------------------------------------------------------

(Updated 2011-07-13 00:29:56.738368)

Review request for hive and John Sichi.

Changes
-------

Revamped approach. We already uniquely assign filenames to each index query result, so instead of throwing those away, keep them in the indexIntermediateFile variable, and take the union of those input paths to generate the next set of input splits.

Summary
-------

Grab the indexed tables during optimized query generation, grab the associated path URIs, and keep those around in the Configuration object. When the job is passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat to decide whether to use the index file or delegate to the parent (HiveInputFormat) class. Not sure if this is robust.

This addresses bug HIVE-2128.
https://issues.apache.org/jira/browse/HIVE-2128

Diffs (updated)
-----

ql/src/test/results/clientpositive/index_auto_self_join.q.out PRE-CREATION
ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/Driver.java b278ffe
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java 617723e
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java b9b586e
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d
ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java 02ab78c
ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5
ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f
ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6
ql/src/test/queries/clientpositive/index_auto_mult_tables.q PRE-CREATION
ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q PRE-CREATION
ql/src/test/queries/clientpositive/index_auto_self_join.q PRE-CREATION
ql/src/test/results/clientpositive/index_auto_mult_tables.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/1010/diff

Testing
-------

added new testcase index_auto_mult_tables.q

Thanks,

Syed

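A minimal sketch of the "revamped approach" in the message above: each index query result file is kept, not discarded, and the union of those files becomes the input for generating the next set of splits. The thread names only the indexIntermediateFile variable; the class `IndexResultFiles`, its methods, the example file names, and the comma-separated encoding are assumptions made here for illustration and are not taken from the patch.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only: IndexResultFiles is not a Hive class.
class IndexResultFiles {
    // Insertion-ordered and duplicate-free, so adding the same result file twice lists it once.
    private final Set<String> files = new LinkedHashSet<>();

    // Record the uniquely named result file produced by one index query.
    void add(String resultFile) {
        files.add(resultFile);
    }

    // The union of all recorded files as one comma-separated value
    // (assumed encoding, chosen here because it fits in a single string property).
    String asPathList() {
        return String.join(",", files);
    }

    public static void main(String[] args) {
        IndexResultFiles results = new IndexResultFiles();
        results.add("/tmp/index_result_src");      // hypothetical file names
        results.add("/tmp/index_result_srcpart");
        System.out.println(results.asPathList());
    }
}
```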
[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables
[ https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060888#comment-13060888 ]

John Sichi commented on HIVE-2128:
----------------------------------

I was thinking of the case of compact indexes (one on each table). Your test case is similar, but for bitmap indexes. We certainly should not be trying to combine the indexes in this case since they are on different tables! The plan looks strange already because it is applying the srcpart predicate twice, and the src index not at all. (It's hard to tell what's going on since the same predicate is applied on both tables; use a different predicate to see if it's two copies of the same vs one of each.)

Regardless of index type, I think we *should* be able to use indexes on different tables at once in the same query.

[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables
[ https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060075#comment-13060075 ]

jirapos...@reviews.apache.org commented on HIVE-2128:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1010/
-----------------------------------------------------------

Review request for hive and John Sichi.

Summary
-------

Grab the indexed tables during optimized query generation, grab the associated path URIs, and keep those around in the Configuration object. When the job is passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat to decide whether to use the index file or delegate to the parent (HiveInputFormat) class. Not sure if this is robust.

This addresses bug HIVE-2128.
https://issues.apache.org/jira/browse/HIVE-2128

Diffs
-----

ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 090ecfc
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java 617723e
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d
ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5
ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f
ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6

Diff: https://reviews.apache.org/r/1010/diff

Testing
-------

added new testcase index_auto_mult_tables.q

Thanks,

Syed

[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables
[ https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049555#comment-13049555 ]

John Sichi commented on HIVE-2128:
----------------------------------

HiveInputFormat already keeps track of the mapping from path to input format. So the idea here is that instead of setting HiveIndexedInputFormat globally for the entire job, we need to be associating it only with the paths that are supposed to have index filtering applied.
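
The per-path idea in the comment above can be sketched in a few lines: the index-aware format is registered only for the directories that should be index-filtered, and every other path falls back to the default format, with no single job-wide setting. Everything here is hypothetical: `PathFormatRegistry`, `useIndexFor`, and `formatFor` are not Hive APIs, the warehouse paths are invented, and the format names are plain strings standing in for the real classes.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only: PathFormatRegistry is not a Hive class.
class PathFormatRegistry {
    static final String DEFAULT_FORMAT = "HiveInputFormat";
    static final String INDEXED_FORMAT = "HiveIndexedInputFormat";

    // Directories whose files should be read through the index-aware format.
    private final Set<String> indexedDirs = new LinkedHashSet<>();

    // Associate the index-aware format with one input directory only.
    void useIndexFor(String dir) {
        indexedDirs.add(stripTrailingSlashes(dir));
    }

    // Choose the format for a file: indexed only if it lies under a registered directory.
    String formatFor(String file) {
        String f = stripTrailingSlashes(file);
        for (String dir : indexedDirs) {
            if (f.equals(dir) || f.startsWith(dir + "/")) {
                return INDEXED_FORMAT;
            }
        }
        return DEFAULT_FORMAT;
    }

    private static String stripTrailingSlashes(String p) {
        int end = p.length();
        while (end > 1 && p.charAt(end - 1) == '/') {
            end--;
        }
        return p.substring(0, end);
    }

    public static void main(String[] args) {
        PathFormatRegistry registry = new PathFormatRegistry();
        registry.useIndexFor("/warehouse/srcpart");   // hypothetical table location
        System.out.println(registry.formatFor("/warehouse/srcpart/ds=1/000000_0"));
        System.out.println(registry.formatFor("/warehouse/src/000000_0"));
    }
}
```

Matching on a directory boundary (the directory itself or the directory followed by "/") keeps a table such as src from being mistaken for srcpart when both live under the same parent.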