[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables

2011-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070943#comment-13070943
 ] 

Hudson commented on HIVE-2128:
--

Integrated in Hive-trunk-h0.21 #848 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/848/])
HIVE-2128. Automatic Indexing with multiple tables.
(Syed Albiz via jvs)

jvs : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1150962
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
* /hive/trunk/ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out
* /hive/trunk/ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java
* /hive/trunk/ql/src/test/queries/clientpositive/index_auto_self_join.q
* /hive/trunk/ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
* /hive/trunk/ql/src/test/results/clientpositive/index_auto_self_join.q.out
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
* /hive/trunk/ql/src/test/results/clientpositive/index_auto_mult_tables.q.out
* /hive/trunk/ql/src/test/queries/clientpositive/index_auto_mult_tables.q


 Automatic Indexing with multiple tables
 ---

 Key: HIVE-2128
 URL: https://issues.apache.org/jira/browse/HIVE-2128
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: Russell Melick
Assignee: Syed S. Albiz
 Fix For: 0.8.0

 Attachments: HIVE-2128.1.patch, HIVE-2128.1.patch, HIVE-2128.2.patch, 
 HIVE-2128.4.patch, HIVE-2128.5.patch, HIVE-2128.6.patch, HIVE-2128.7.patch, 
 HIVE-2128.8.patch


 Make automatic indexing work with jobs which access multiple tables.  We'll 
 probably need to modify the way that the index input format works in order to 
 associate index formats/files with specific tables.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables

2011-07-22 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069851#comment-13069851
 ] 

John Sichi commented on HIVE-2128:
--

+1.  Will commit when tests pass.


 Automatic Indexing with multiple tables
 ---

 Key: HIVE-2128
 URL: https://issues.apache.org/jira/browse/HIVE-2128
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: Russell Melick
Assignee: Syed S. Albiz
 Attachments: HIVE-2128.1.patch, HIVE-2128.1.patch, HIVE-2128.2.patch, 
 HIVE-2128.4.patch, HIVE-2128.5.patch, HIVE-2128.6.patch


 Make automatic indexing work with jobs which access multiple tables.  We'll 
 probably need to modify the way that the index input format works in order to 
 associate index formats/files with specific tables.





[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables

2011-07-21 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069299#comment-13069299
 ] 

jirapos...@reviews.apache.org commented on HIVE-2128:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1010/
---

(Updated 2011-07-21 23:52:23.929900)


Review request for hive and John Sichi.


Changes
---

Added ORDER BY to the test cases. This revealed an existing bug in 
IndexWhereTaskDispatcher, which walked the entire operator tree for each task 
in the task tree. I amended it to walk only the subset of the operator tree 
that belongs to the current task.
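
As a rough illustration of the fix described above, the sketch below walks only the operators reachable from one task's own roots instead of the whole plan. The `Op` and `Task` classes are hypothetical stand-ins invented for this example, not Hive's operator or task classes, and the traversal is an assumption about the shape of the change rather than the patch itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: these types do not come from the Hive code base.
class TaskScopedWalk {
    // Stand-in for an operator with child operators.
    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        Op(String name) { this.name = name; }
    }

    // Stand-in for a task that owns a set of root operators.
    static class Task {
        final List<Op> roots = new ArrayList<>();
    }

    // Visit only the operators reachable from this task's roots,
    // in depth-first pre-order.
    static List<String> operatorsOf(Task task) {
        List<String> visited = new ArrayList<>();
        for (Op root : task.roots) {
            collect(root, visited);
        }
        return visited;
    }

    private static void collect(Op op, List<String> out) {
        out.add(op.name);
        for (Op child : op.children) {
            collect(child, out);
        }
    }
}
```

With two tasks that each own one table scan, each call returns only that task's operators, so per-task processing never touches another task's part of the tree.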


Summary
---

Grab the indexed tables during optimized query generation, grab the associated 
path URIs, and keep those around in the Configuration object. When the job is 
passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat 
to decide whether to use the index file or delegate to the parent 
(HiveInputFormat) class. Not sure if this is robust. 


This addresses bug HIVE-2128.
https://issues.apache.org/jira/browse/HIVE-2128


Diffs (updated)
-

  ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out 4c9efd1 
  ql/src/test/results/clientpositive/index_auto_self_join.q.out PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java b9b586e 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d 
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java da084f6 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6 
  ql/src/test/queries/clientpositive/index_auto_mult_tables.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_auto_self_join.q PRE-CREATION 
  ql/src/test/results/clientpositive/index_auto_mult_tables.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1010/diff


Testing
---

added new testcase index_auto_mult_tables.q


Thanks,

Syed



 Automatic Indexing with multiple tables
 ---

 Key: HIVE-2128
 URL: https://issues.apache.org/jira/browse/HIVE-2128
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: Russell Melick
Assignee: Syed S. Albiz
 Attachments: HIVE-2128.1.patch, HIVE-2128.1.patch, HIVE-2128.2.patch, 
 HIVE-2128.4.patch, HIVE-2128.5.patch, HIVE-2128.6.patch


 Make automatic indexing work with jobs which access multiple tables.  We'll 
 probably need to modify the way that the index input format works in order to 
 associate index formats/files with specific tables.





[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables

2011-07-19 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067878#comment-13067878
 ] 

jirapos...@reviews.apache.org commented on HIVE-2128:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1010/#review1112
---



ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java
https://reviews.apache.org/r/1010/#comment2271

Why was this comment truncated?



ql/src/test/queries/clientpositive/index_auto_mult_tables.q
https://reviews.apache.org/r/1010/#comment2273

All of these SELECT statements need ORDER BY for determinism.


- John


On 2011-07-19 03:15:17, Syed Albiz wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1010/
bq.  ---
bq.  
bq.  (Updated 2011-07-19 03:15:17)
bq.  
bq.  
bq.  Review request for hive and John Sichi.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Grab the indexed tables during optimized query generation, grab the 
bq.  associated path URIs, and keep those around in the Configuration object. When 
bq.  the job is passed to ExecDriver, this data is extracted and used in 
bq.  HiveIndexedInputFormat to decide whether to use the index file or delegate to 
bq.  the parent (HiveInputFormat) class. Not sure if this is robust. 
bq.  
bq.  
bq.  This addresses bug HIVE-2128.
bq.  https://issues.apache.org/jira/browse/HIVE-2128
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    ql/src/test/results/clientpositive/index_auto_self_join.q.out PRE-CREATION 
bq.    ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out PRE-CREATION 
bq.    ql/src/test/queries/clientpositive/index_auto_self_join.q PRE-CREATION 
bq.    ql/src/test/results/clientpositive/index_auto_mult_tables.q.out PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6 
bq.    ql/src/test/queries/clientpositive/index_auto_mult_tables.q PRE-CREATION 
bq.    ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q PRE-CREATION 
bq.    ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java b9b586e 
bq.    ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d 
bq.    ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5 
bq.    ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946 
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f 
bq.    ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java 617723e 
bq.  
bq.  Diff: https://reviews.apache.org/r/1010/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  added new testcase index_auto_mult_tables.q
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Syed
bq.  
bq.



 Automatic Indexing with multiple tables
 ---

 Key: HIVE-2128
 URL: https://issues.apache.org/jira/browse/HIVE-2128
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: Russell Melick
Assignee: Syed S. Albiz
 Attachments: HIVE-2128.1.patch, HIVE-2128.1.patch, HIVE-2128.2.patch, 
 HIVE-2128.4.patch, HIVE-2128.5.patch


 Make automatic indexing work with jobs which access multiple tables.  We'll 
 probably need to modify the way that the index input format works in order to 
 associate index formats/files with specific tables.





[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables

2011-07-18 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067398#comment-13067398
 ] 

John Sichi commented on HIVE-2128:
--

Could you make sure the latest patch is uploaded here and matching Review 
Board, and then click Submit Patch?  Also make sure all spurious changes (like 
extra imports) are gone; I'm seeing some of those in Review Board.

 Automatic Indexing with multiple tables
 ---

 Key: HIVE-2128
 URL: https://issues.apache.org/jira/browse/HIVE-2128
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: Russell Melick
Assignee: Syed S. Albiz
 Attachments: HIVE-2128.1.patch, HIVE-2128.1.patch, HIVE-2128.2.patch


 Make automatic indexing work with jobs which access multiple tables.  We'll 
 probably need to modify the way that the index input format works in order to 
 associate index formats/files with specific tables.





[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables

2011-07-12 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064253#comment-13064253
 ] 

jirapos...@reviews.apache.org commented on HIVE-2128:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1010/
---

(Updated 2011-07-13 00:29:56.738368)


Review request for hive and John Sichi.


Changes
---

Revamped approach. We already uniquely assign filenames to each index query 
result, so instead of throwing those away, keep them in the 
indexIntermediateFile variable, and take the union of those input paths to 
generate the next set of input splits.
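
The union step described above might look roughly like the following sketch. It assumes the intermediate files are tracked as a single comma-separated string; that representation, the class name `IndexIntermediateFiles`, and the method `addFile` are all hypothetical choices made for illustration and are not taken from the patch.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration: not the representation used by the actual patch.
class IndexIntermediateFiles {
    // Add one index query result file to the set tracked so far.
    // The set is kept as a comma-separated string (an assumption),
    // preserves insertion order, and ignores duplicates.
    static String addFile(String current, String newFile) {
        Set<String> files = new LinkedHashSet<>();
        if (current != null && !current.isEmpty()) {
            files.addAll(Arrays.asList(current.split(",")));
        }
        files.add(newFile);
        return String.join(",", files);
    }
}
```

Each index query contributes its own uniquely named result file, and the accumulated value stands for the union of input paths from which the next set of splits would be generated.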


Summary
---

Grab the indexed tables during optimized query generation, grab the associated 
path URIs, and keep those around in the Configuration object. When the job is 
passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat 
to decide whether to use the index file or delegate to the parent 
(HiveInputFormat) class. Not sure if this is robust. 


This addresses bug HIVE-2128.
https://issues.apache.org/jira/browse/HIVE-2128


Diffs (updated)
-

  ql/src/test/results/clientpositive/index_auto_self_join.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java b278ffe 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java 617723e 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java b9b586e 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d 
  ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java 02ab78c 
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6 
  ql/src/test/queries/clientpositive/index_auto_mult_tables.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_auto_self_join.q PRE-CREATION 
  ql/src/test/results/clientpositive/index_auto_mult_tables.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1010/diff


Testing
---

added new testcase index_auto_mult_tables.q


Thanks,

Syed



 Automatic Indexing with multiple tables
 ---

 Key: HIVE-2128
 URL: https://issues.apache.org/jira/browse/HIVE-2128
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: Russell Melick
Assignee: Syed S. Albiz
 Attachments: HIVE-2128.1.patch, HIVE-2128.1.patch, HIVE-2128.2.patch


 Make automatic indexing work with jobs which access multiple tables.  We'll 
 probably need to modify the way that the index input format works in order to 
 associate index formats/files with specific tables.





[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables

2011-07-06 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060888#comment-13060888
 ] 

John Sichi commented on HIVE-2128:
--

I was thinking of the case of compact indexes (one on each table).

Your test case is similar, but for bitmap indexes.  We certainly should not be 
trying to combine the indexes in this case since they are on different tables!  
The plan looks strange already because it is applying the srcpart predicate 
twice, and the src index not at all.  (It's hard to tell what's going on since 
the same predicate is applied on both tables; use a different predicate to see 
if it's two copies of the same vs one of each.)

Regardless of index type, I think we *should* be able to use indexes on 
different tables at once in the same query.


 Automatic Indexing with multiple tables
 ---

 Key: HIVE-2128
 URL: https://issues.apache.org/jira/browse/HIVE-2128
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: Russell Melick
Assignee: Syed S. Albiz
 Attachments: HIVE-2128.1.patch, HIVE-2128.1.patch, HIVE-2128.2.patch


 Make automatic indexing work with jobs which access multiple tables.  We'll 
 probably need to modify the way that the index input format works in order to 
 associate index formats/files with specific tables.





[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables

2011-07-05 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060075#comment-13060075
 ] 

jirapos...@reviews.apache.org commented on HIVE-2128:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1010/
---

Review request for hive and John Sichi.


Summary
---

Grab the indexed tables during optimized query generation, grab the associated 
path URIs, and keep those around in the Configuration object. When the job is 
passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat 
to decide whether to use the index file or delegate to the parent 
(HiveInputFormat) class. Not sure if this is robust. 
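
A minimal sketch of the flow in this summary, under stated assumptions: `java.util.Properties` stands in for Hadoop's Configuration object so the example is self-contained, and the property key `example.indexed.paths`, the class `IndexedPathCheck`, and both method names are hypothetical. The thread does not give the real property name or code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical illustration: Properties stands in for the job Configuration.
class IndexedPathCheck {
    // Made-up key; the real property name is not given in this thread.
    static final String INDEXED_PATHS_KEY = "example.indexed.paths";

    // Planner side: remember which table paths have an index result.
    static void recordIndexedPaths(Properties conf, Set<String> paths) {
        conf.setProperty(INDEXED_PATHS_KEY, String.join(",", paths));
    }

    // Input-format side: true means "read via the index file",
    // false means "delegate to the parent input format".
    static boolean useIndexFor(Properties conf, String path) {
        String raw = conf.getProperty(INDEXED_PATHS_KEY, "");
        Set<String> indexed = new HashSet<>(Arrays.asList(raw.split(",")));
        return indexed.contains(path);
    }
}
```

A path that was never recorded falls through to the parent format, which is the delegation behaviour the summary describes.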


This addresses bug HIVE-2128.
https://issues.apache.org/jira/browse/HIVE-2128


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 090ecfc 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java 617723e 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d 
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6 

Diff: https://reviews.apache.org/r/1010/diff


Testing
---

added new testcase index_auto_mult_tables.q


Thanks,

Syed



 Automatic Indexing with multiple tables
 ---

 Key: HIVE-2128
 URL: https://issues.apache.org/jira/browse/HIVE-2128
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: Russell Melick
 Attachments: HIVE-2128.1.patch


 Make automatic indexing work with jobs which access multiple tables.  We'll 
 probably need to modify the way that the index input format works in order to 
 associate index formats/files with specific tables.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables

2011-06-14 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049555#comment-13049555
 ] 

John Sichi commented on HIVE-2128:
--

HiveInputFormat already keeps track of the mapping from path to input format.  
So the idea here is that instead of setting HiveIndexedInputFormat globally for 
the entire job, we need to be associating it only with the paths that are 
supposed to have index filtering applied.
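
The per-path association suggested here could be sketched as below. This is only an illustration of the idea: `PathFormatRegistry` and its methods are hypothetical, and input formats are represented by plain name strings rather than Hive classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: none of these names come from the Hive code base.
class PathFormatRegistry {
    private final String defaultFormat;
    private final Map<String, String> formatByPath = new HashMap<>();

    PathFormatRegistry(String defaultFormat) {
        this.defaultFormat = defaultFormat;
    }

    // Mark a single input path as needing index filtering.
    void useIndexedFormat(String path, String indexedFormat) {
        formatByPath.put(path, indexedFormat);
    }

    // Paths that were never registered fall back to the default format.
    String formatFor(String path) {
        return formatByPath.getOrDefault(path, defaultFormat);
    }
}
```

Registering the indexed format for one table's path leaves every other path in the same job on the default format, instead of switching the whole job at once.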


 Automatic Indexing with multiple tables
 ---

 Key: HIVE-2128
 URL: https://issues.apache.org/jira/browse/HIVE-2128
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: Russell Melick

 Make automatic indexing work with jobs which access multiple tables.  We'll 
 probably need to modify the way that the index input format works in order to 
 associate index formats/files with specific tables.
